
Web Scraping TikTok with Python




This tutorial demonstrates web scraping with Python. It is divided into two parts: one for scraping TikTok video URLs and the other for scraping hashtags.

Table of Contents


Introduction to Web Scraping
Web Scraping vs. Web Crawling
Installing BeautifulSoup
Part 1: Scraping for TikTok Video Links
Part 2: Scraping for TikTok Hashtags

Introduction to Web Scraping

Web scraping is the process of extracting data from websites. It
involves sending HTTP requests to a website’s server, receiving the response,
and parsing the HTML content to extract the desired information.

Web scraping can be done manually by a human user, but it is usually
automated using software tools such as web crawlers or bots. These tools can
send requests and parse responses much faster than a human user, allowing large
amounts of data to be extracted in a short amount of time.
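The request, response, and parse steps described above can be sketched with the standard library alone. This is a minimal illustration, not the code used later in this tutorial: the sample page and the helper names fetch and extract_title are made up for the example, a data: URL stands in for a real website so that the sketch runs without network access, and a regular expression stands in for a proper HTML parser.

```python
import re
import urllib.request
from urllib.parse import quote


def fetch(url):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_title(html):
    """Parse step: pull the text of the <title> tag out of the HTML."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical page content, embedded in a data: URL instead of a live site.
SAMPLE = "<html><head><title>Nail art ideas</title></head><body></body></html>"
DEMO_URL = "data:text/html," + quote(SAMPLE)

title = extract_title(fetch(DEMO_URL))
print(title)
```

With a real site, only DEMO_URL would change; the fetch and parse steps keep the same shape.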

Web Scraping vs. Web Crawling

Web scraping and web crawling are related but distinct concepts. Web
crawling refers to the process of automatically navigating through a website by
following links, usually for the purpose of indexing its content. Web scraping,
on the other hand, refers to the extraction of specific data from a website.

In other words, web crawling is about discovering and navigating through
web pages, while web scraping is about extracting data from those pages.
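To make the difference concrete, here is a small sketch that does both on a made-up three-page site kept in memory. The SITE dictionary, the PageParser class, and the function names are all hypothetical; the sketch uses the standard library's html.parser so it runs before anything is installed.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical site held in memory: path -> HTML of that page.
SITE = {
    "/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "/a": '<title>Page A</title><a href="/b">B</a>',
    "/b": '<title>Page B</title><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(html):
    parser = PageParser()
    parser.feed(html)
    return parser


def crawl(start):
    """Crawling: discover pages by following links, breadth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        path = queue.popleft()
        order.append(path)
        for link in parse(SITE[path]).links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def scrape_title(path):
    """Scraping: extract one specific piece of data from a page."""
    return parse(SITE[path]).title


pages = crawl("/")
titles = {path: scrape_title(path) for path in pages}
print(pages)
print(titles)
```

The crawl function only decides which pages to visit; scrape_title only decides what to take from a page. The two scripts in this tutorial are both scrapers, since the URLs to visit are given up front.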

Installing BeautifulSoup

BeautifulSoup is a Python library that makes it easy to parse HTML and
XML documents. It provides methods to search, navigate, and modify the parse
tree.

To install BeautifulSoup, use the pip command:

pip install beautifulsoup4

 

This command installs the latest version of BeautifulSoup from the
Python Package Index (PyPI). You can find more information on how to install
BeautifulSoup in the official documentation.

After installing BeautifulSoup, you can import it in your Python code
using the following statement:

from bs4 import BeautifulSoup
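As a quick check that the install worked, the sketch below parses a small HTML string and then searches, navigates, and modifies the parse tree. The HTML and the example.com links are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this example.
html = """
<html><body>
  <h1 id="top">Nail art</h1>
  <ul>
    <li><a href="https://example.com/1">First</a></li>
    <li><a href="https://example.com/2">Second</a></li>
  </ul>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Search: collect the href of every <a> tag.
hrefs = [a["href"] for a in soup.find_all("a")]

# Navigate: move from the first link up to its parent tag.
first_link = soup.find("a")
parent_tag = first_link.parent.name

# Modify: replace the text of the heading.
soup.h1.string = "Nail art ideas"
heading = soup.find(id="top").get_text()

print(hrefs, parent_tag, heading)
```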

 

Part 1: Scraping for TikTok Video Links

In the first part, the code imports urllib.request, requests, re, and time, plus BeautifulSoup from the bs4 library. It then opens a file named tiktoknaillinks.txt in write mode with UTF-8 encoding.

Next, the code reads a file named tiktokmainlinks.txt line by line. It treats each line as a URL and calls urllib.request.urlopen(url).read() to get the HTML content of the page. The HTML content is then parsed using BeautifulSoup with html.parser.

The code then calls the find_all method of the soup object to find every strong tag with a specific class name (tiktok-23vhki-StrongText ejg0rhn2). To find this class name, right-click the video in your browser, inspect the element, and look for the strong tag that has the link of the video as text. If this step is unfamiliar, an introductory YouTube video on web scraping may help.

For each tag found, the code writes its text to the file tiktoknaillinks.txt. After all pages are processed, the file is closed and the message “done” is printed.

Here is a detailed explanation of each line of code in this part:

import urllib.request
import requests
import re
import time
from bs4 import BeautifulSoup

 

These lines import the libraries for web scraping: urllib.request, requests, re, and time, plus BeautifulSoup from the bs4 library. Only urllib.request and BeautifulSoup are actually used in this part.

f = open('tiktoknaillinks.txt', 'w', encoding="utf-8")

 

This line opens a file named tiktoknaillinks.txt in write mode with UTF-8 encoding. The file will be used to store links to TikTok videos.

with open('tiktokmainlinks.txt', 'r') as fa:
    for line in fa:
        url = line.strip()  # drop the trailing newline before using the line as a URL
        html = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(html, 'html.parser')
        links = soup.find_all('strong', class_='tiktok-23vhki-StrongText ejg0rhn2')

 

These lines read the file tiktokmainlinks.txt line by line. Each line is treated as a URL, and urllib.request.urlopen(url).read() gets the HTML content of that page. The HTML content is then parsed using BeautifulSoup with html.parser.
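The read-and-fetch step can be made a little more defensive. The sketch below is one possible variation, not the code this tutorial uses: read_urls and fetch_html are hypothetical helpers, and the User-Agent header, the timeout, and the error handling are assumptions added for illustration. Whether TikTok returns the expected HTML to a script like this is not something the sketch can guarantee.

```python
import urllib.request


def read_urls(lines):
    """Strip whitespace (including the trailing newline) and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_html(url, timeout=10):
    """Return the page HTML as text, or None if the request fails."""
    try:
        # Assumption: some sites reject urllib's default user agent.
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers strings that are not valid URLs.
        print(f"skipping {url!r}: {exc}")
        return None


# Lines as they would come out of a file such as tiktokmainlinks.txt.
sample_lines = ["https://www.tiktok.com/tag/nailart\n", "\n", "  https://example.com/video \n"]
urls = read_urls(sample_lines)
print(urls)
```

Calling fetch_html(url) for each entry in urls would then replace the bare urlopen call, and a None result can be skipped instead of stopping the whole run.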

The code then calls the find_all method of the soup object to find every strong tag with the class name tiktok-23vhki-StrongText ejg0rhn2. As described above, this class name comes from inspecting the video in your browser and finding the strong tag that has the link of the video as text.
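One detail of class matching is worth seeing on a small example. When class_ is given a string containing a space, Beautiful Soup compares it against the exact value of the class attribute, so the same two classes in a different order do not match. The HTML below is made up for illustration; the class names are the ones from this tutorial, and they may well differ on the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes in both orders, plus an unrelated tag.
html = """
<div>
  <strong class="tiktok-23vhki-StrongText ejg0rhn2">https://example.com/video/1</strong>
  <strong class="ejg0rhn2 tiktok-23vhki-StrongText">https://example.com/video/2</strong>
  <strong class="other">ignored</strong>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the whole class attribute: order matters.
exact = [t.text for t in soup.find_all("strong", class_="tiktok-23vhki-StrongText ejg0rhn2")]

# Matching a single class: any tag that carries it, whatever else it has.
single = [t.text for t in soup.find_all("strong", class_="tiktok-23vhki-StrongText")]

# CSS selector requiring both classes, in any order.
css = [t.text for t in soup.select("strong.tiktok-23vhki-StrongText.ejg0rhn2")]

print(exact)
print(single)
print(css)
```

If find_all returns an empty list on a real page, trying a single class or a CSS selector is a reasonable first check before assuming the page has no matching tags.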

        for i in links:
            f.write(i.text)
            f.write('\n')

 

These lines run inside the loop over URLs. They iterate over each tag found and write its text to the file tiktoknaillinks.txt. A newline character is also written after each link.

f.close()
print('done')

 

These lines close the file and print a message “done” to indicate that
the process is complete.
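As an alternative to calling f.close() by hand, the output file can be opened in a with block, which closes it automatically even if an error occurs partway through. The sketch below is a variation on the tutorial's code, not a replacement for it: save_lines is a hypothetical helper, and the file is written to a temporary directory so the example does not touch any real output.

```python
import os
import tempfile


def save_lines(path, lines):
    """Write one item per line; the file is closed when the with block ends."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Write to a temporary directory for the sake of the example.
out_path = os.path.join(tempfile.mkdtemp(), "tiktoknaillinks.txt")
save_lines(out_path, ["first link", "second link"])

with open(out_path, encoding="utf-8") as f:
    saved = f.read().splitlines()
print(saved)
```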

Part 2: Scraping for TikTok Hashtags

In the second part, the code imports the same libraries and opens a file named tiktokmainlinks.txt in write mode with UTF-8 encoding. This is the file that Part 1 reads, so this script has to run first. It then defines the URL of a TikTok hashtag page and calls urllib.request.urlopen(url).read() to get its HTML content.

The HTML content is again parsed using BeautifulSoup with html.parser. The code then calls the find_all method of the soup object to find every div tag with a specific class name (tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0). To find this class name, right-click the video in your browser, inspect the element, and look for the div tag that has the hashtag of the video as text. If this step is unfamiliar, an introductory YouTube video on web scraping may help.

For each div found, the code finds all a tags whose href attribute matches a regular expression for URLs starting with “https:”. For each matching tag, it reads the href attribute value and writes it to the file tiktokmainlinks.txt. After all links are processed, the file is closed and the message “done” is printed.

Here is a detailed explanation of each line of code in this part:

import urllib.request
import requests
import re
import time
from bs4 import BeautifulSoup

 

These lines import the same libraries as in Part 1: urllib.request, requests, re, and time, plus BeautifulSoup from the bs4 library. This time re is used as well, to match the links.

f = open('tiktokmainlinks.txt', 'w', encoding="utf-8")

 

This line opens a file named tiktokmainlinks.txt in write mode with UTF-8 encoding. The file will store the links found on the TikTok hashtag page.

url = "https://www.tiktok.com/tag/nailart"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
link = soup.find_all('div', {'class': 'tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0'})

 

These lines define the URL of a TikTok hashtag page and call urllib.request.urlopen(url).read() to get its HTML content. The HTML content is then parsed using BeautifulSoup with html.parser.

The code then calls the find_all method of the soup object to find every div tag with the class name tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0. As described above, this class name comes from inspecting the video in your browser and finding the div tag that has the hashtag of the video as text.

for i in link:
    # ^ anchors the pattern so only hrefs that start with "https:" match
    iref = i.find_all('a', attrs={'href': re.compile("^https:")})
    for t in iref:
        f.write(t.get('href'))
        f.write('\n')

 

These lines iterate over each div found and find all a tags whose href attribute matches a regular expression for URLs starting with “https:”. For each matching tag, the code reads the href attribute value and writes it to the file tiktokmainlinks.txt. A newline character is also written after each link.
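How the pattern is written affects which links are kept. Beautiful Soup tests a compiled regular expression with its search() method, so an unanchored pattern matches “https:” anywhere in the href, while a pattern that begins with ^ matches only at the start. The sketch below shows the difference on made-up HTML; the example.com links are placeholders, and the class name is the one from this tutorial.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical feed container with three kinds of links.
html = """
<div class="tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0">
  <a href="https://example.com/video/1">one</a>
  <a href="/redirect?to=https://example.com/x">two</a>
  <a href="http://example.com/video/3">three</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
feed = soup.find("div", {"class": "tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0"})

# Unanchored: "https:" may appear anywhere in the href.
loose = [a["href"] for a in feed.find_all("a", href=re.compile("https:"))]

# Anchored with ^: the href has to start with "https:".
strict = [a["href"] for a in feed.find_all("a", href=re.compile(r"^https:"))]

print(loose)
print(strict)
```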

f.close()
print('done')

 

These lines again close the file and print a message “done” to indicate
that the process is complete.

In summary, this code demonstrates how to use web scraping techniques in
Python to extract information from web pages. In this case, it extracts links
to TikTok videos and their associated hashtags.

 

Tiktokscrapmain:

import urllib.request
import requests
import re
import time
from bs4 import BeautifulSoup

f = open('tiktoknaillinks.txt', 'w', encoding="utf-8")
with open('tiktokmainlinks.txt', 'r') as fa:
    for line in fa:
        url = line.strip()  # drop the trailing newline before using the line as a URL
        html = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(html, 'html.parser')
        # To get the class, right-click the video, inspect it, and look for the
        # strong tag that has the link of the video as text. If you are confused,
        # you might have to watch YouTube videos on web scraping.
        links = soup.find_all('strong', class_='tiktok-23vhki-StrongText ejg0rhn2')
        for i in links:
            f.write(i.text)
            f.write('\n')
f.close()
print('done')

 
 

 

           

 

Tiktok scrape video links:

import urllib.request
import requests
import re
import time
from bs4 import BeautifulSoup

f = open('tiktokmainlinks.txt', 'w', encoding="utf-8")
url = "https://www.tiktok.com/tag/nailart"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
# To get the class, right-click the video, inspect it, and look for the
# div tag that has the hashtag of the video as text. If you are confused,
# you might have to watch YouTube videos on web scraping.
link = soup.find_all('div', {'class': 'tiktok-yvmafn-DivVideoFeedV2 ecyq5ls0'})
for i in link:
    # ^ anchors the pattern so only hrefs that start with "https:" match
    iref = i.find_all('a', attrs={'href': re.compile("^https:")})
    for t in iref:
        f.write(t.get('href'))
        f.write('\n')
f.close()
print('done')

 

 

I wrote two web scrapers: one to scrape for the links of the videos and the other to scrape for the hashtags.

 


Hey!

Hello, I’m Maryclare. Dive into the world of technology with my Udemy courses, “Master C++ with Arduino/CNN” and “100 Projects with Web Scraping”. Join over 3000+ students in mastering tech skills that matter. Let’s explore, learn, and create together.
