How to use Python to download videos from a big V (verified influencer) account on Douyin

Preface

The text and pictures in this article come from the Internet and are for learning and exchange only, with no commercial use. If you have any questions, please contact us.

The following article is from Python No. 7, author somenzz

Last time I wrote about how to batch download Zhihu videos with Python. This time I share how to use Python to batch download all the watermark-free videos from a Douyin personal homepage. The focus of this article is not to hand over a ready-made script but to describe how to write one. As the saying goes, giving someone a fish is not as good as teaching them to fish. Crawlers basically all follow this routine.

Ideas

Let me start with the approach. To download videos in batches, first download a single video successfully and confirm that it has no watermark, then write a loop to download the rest.

Difficulty: Downloading one video is simple, but downloading many is a little more involved, because you need to collect the URL of each video. Douyin has anti-crawling measures in place: the video list of a personal homepage is visible only in the mobile app, not on the web page on a computer, so you need to capture the HTTPS traffic of your phone. Here we use BurpSuite to capture it.

 

I have put the BurpSuite 2.1.06 Professional build that I normally use on a network drive; reply "burp" to the public account "Python 7" to get it. After downloading, run start_burp.bat or sh start_burp.sh to start it with one click. No license purchase is needed, which is very convenient.

Crawl a single video

  1. Find a Douyin video, tap share, copy the link, open it in a browser on your computer, then open the developer tools and select the Network tab.
  2. Refresh the page, look through the requests, and find the one whose response contains the playback address:

 

The response contains a play_addr field, which in turn contains a url_list. Copy url_list[0] and open it in the browser; the page redirects to the real playback address, and a download button is shown:

 

I downloaded this video and found that it was watermarked. How can the video be downloaded without the watermark? After searching online, I found the method: change playwm to play in url_list[0] above.

Now we can start writing code: get url_list[0] and download it.

import re

import requests

import downloader  # download helper from the earlier Zhihu article; not shown here


def get(share_url) -> list:
    """
    share_url -> share URL of a Douyin video
    Returns a list like [{'url': '', 'title': '', 'format': ''}, {}]
    """
    data = []
    headers = {
        'accept': 'application/json',
        'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
    }
    api = "https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids={item_id}"

    # Follow the share link; the final URL contains the numeric video id
    rep = requests.get(share_url, headers=headers, timeout=10)
    if rep.ok:
        # item_id
        item_id = re.findall(r'video/(\d+)', rep.url)
        if item_id:
            item_id = item_id[0]
            # video info
            rep = requests.get(api.format(item_id=item_id), headers=headers, timeout=10)
            if rep.ok and rep.json()["status_code"] == 0:
                info = rep.json()["item_list"][0]
                tmp = {}
                tmp["title"] = info["desc"]

                # video link with the watermark removed
                play_url = info["video"]["play_addr"]["url_list"][0].replace('playwm', 'play')
                tmp["url"] = play_url
                tmp["format"] = 'mp4'
                data.append(tmp)

    return data

if __name__ == '__main__':
    videos = get('https://www.iesdouyin.com/share/video/6920538027345415431/?region=&mid=6920538030852885262&u_code=48&titleType=title&did=0&iid=0')
    for video in videos:
        downloader.download(video['url'], video['title'], video['format'], './download')

The downloader.download function here is the same one used in the earlier Zhihu video download article, so its code is not repeated here.
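For readers who do not have that helper at hand, here is a minimal sketch of what a function with the same call shape, download(url, title, fmt, out_dir), might look like. Everything in it is an assumption made for illustration: the function body, the safe_filename helper, and the use of the standard-library urllib are not the author's downloader module.

```python
import os
import re
import shutil
import urllib.request


def safe_filename(title: str, fmt: str, max_len: int = 80) -> str:
    """Turn a video title into a file name that is safe on common file systems."""
    # Replace characters that are illegal in file names, plus runs of whitespace
    name = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return f"{name[:max_len] or 'untitled'}.{fmt}"


def download(url: str, title: str, fmt: str, out_dir: str) -> str:
    """Stream url to out_dir/<title>.<fmt> and return the saved path (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(title, fmt))
    # Send a mobile User-Agent, as the video info request does
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)  # copy in chunks instead of loading the whole video
    return path
```

Cleaning the title first matters because Douyin descriptions often contain hashtags, spaces, and punctuation that are not valid in file names.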

Get the video links from a personal homepage

The steps above achieve watermark-free download of a single Douyin video. What remains is to find a large number of such links and loop over them.

Open a big V's personal homepage, tap share, copy the link, and open it in a browser. No videos are shown there, but they are visible in the Douyin App:
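The loop itself can be sketched as follows. This is only an illustration: download_all, its parameters, and the one-second pause are assumptions, and the two callables stand in for a function that returns video info for a share link and a function that saves one video.

```python
import time
from typing import Callable, Dict, Iterable, List


def download_all(share_urls: Iterable[str],
                 fetch_info: Callable[[str], List[Dict[str, str]]],
                 save: Callable[[str, str, str, str], object],
                 out_dir: str = "./download",
                 pause: float = 1.0) -> List[str]:
    """Fetch and save every share link; return the links that failed."""
    failed = []
    for share_url in share_urls:
        try:
            videos = fetch_info(share_url)
            if not videos:
                raise ValueError("no video info returned")
            for video in videos:
                save(video["url"], video["title"], video["format"], out_dir)
        except Exception as exc:  # keep going so one bad link does not stop the batch
            print(f"failed: {share_url} ({exc})")
            failed.append(share_url)
        time.sleep(pause)  # pause between links to avoid hammering the server
    return failed
```

With the single-video code from the previous section, a call might look like download_all(links, get, downloader.download). Returning the failed links makes it easy to retry only those.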

 

Browser

 

Douyin APP

This shows that Douyin restricts the browser so that the information for multiple videos cannot be seen there. At this point you need to capture packets from the mobile app, see how the HTTP requests are issued on the phone, and then simulate them with a program.

I have been using BurpSuite (hereinafter Burp), and it is very easy to use. Here is a quick walkthrough:

1. Run Burp

After downloading, run start_burp.bat or sh start_burp.sh to start Burp, then open the proxy settings and bind to the IP of the machine running Burp, as shown in the following figure:

 

Be careful not to set the IP to 127.0.0.1. With that setting only requests from the local machine can use the proxy, and the mobile phone cannot connect to it.
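If you are not sure which address to bind, one option is to ask the operating system which local IP it would use for outbound traffic. The sketch below is not part of Burp: lan_ip and reachable_from_phone are hypothetical helper names, and 8.8.8.8 is only an arbitrary routable target (connecting a UDP socket does not send any packets).

```python
import ipaddress
import socket


def lan_ip() -> str:
    """Best-effort guess of this machine's LAN address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))  # picks the outgoing interface, sends nothing
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available; fall back to loopback
    finally:
        s.close()


def reachable_from_phone(bind_ip: str) -> bool:
    """A loopback bind address cannot be reached from another device."""
    return not ipaddress.ip_address(bind_ip).is_loopback


if __name__ == "__main__":
    ip = lan_ip()
    print(ip, "usable as phone proxy address:", reachable_from_phone(ip))
```

On a machine with several network interfaces the guessed address may not be the Wi-Fi one, so compare it with what your system's network settings show.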

2. Set proxy on mobile phone

Connect the mobile phone and the computer to the same Wi-Fi network. On an iPhone, go to Settings -> Wireless LAN, tap the info icon to the right of that network, scroll down, tap Configure Proxy, and enter the same IP and port as in BurpSuite. The settings on Android phones are similar. At this point the HTTP traffic of the phone can be captured in BurpSuite.

3. Download Burp's certificate on the mobile phone and set the trust

  1. Open http://burp in the mobile browser.
  2. Tap CA Certificate to download the certificate.
  3. Settings -> General -> Profile -> tap PortSwigger CA -> Install
  4. Settings -> General -> About -> Certificate Trust Settings, then turn on trust for the PortSwigger CA certificate

After this, the HTTPS traffic from the mobile phone can be captured as well.

4. Turn on intercept in BurpSuite

 

After this is set, requests from the mobile phone are held here. You can forward a request as is, forward it after modifying the packet, or send it to Repeater to replay it later. This is also why requests coming from the front end cannot be trusted.

Now open the Douyin App on your phone. A large number of requests will be intercepted here; forward them, and you will see the data in the Douyin App appear step by step. When the app is about to load the videos on the personal homepage, send that request to Repeater, as shown in the figure below:

 

Then open the Repeater tab of BurpSuite, where you can see the request just sent. Replay it, look at the returned data, and decide which interface we need, as shown in the following figure:

 

It turns out that this interface meets our needs. Here you can see the interface URL and the various header parameters. The User-Agent header is an important identifier for distinguishing whether the client is a browser or the App. With this information you can write code to simulate the request and obtain the links needed for batch download.
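One way to simulate the request is to copy the raw request text from Repeater and replay it from Python. The following is a minimal sketch under assumptions: parse_raw_request and replay are hypothetical helpers, the standard-library urllib is used for the replay, and the real host, path, and headers must come from your own capture.

```python
import urllib.request
from typing import Dict, Tuple


def parse_raw_request(raw: str, scheme: str = "https") -> Tuple[str, str, Dict[str, str]]:
    """Split a raw HTTP request (as shown in Repeater) into method, URL and headers."""
    # Normalize line endings and drop the body, if any
    head = raw.strip().replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = [ln for ln in head.split("\n") if ln.strip()]
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    url = f"{scheme}://{headers['Host']}{path}"
    return method, url, headers


def replay(raw: str, timeout: float = 10.0) -> bytes:
    """Send the captured request again, keeping the App's headers such as User-Agent."""
    method, url, headers = parse_raw_request(raw)
    headers.pop("Accept-Encoding", None)  # avoid compressed bodies we would have to decode
    headers.pop("Content-Length", None)   # let urllib compute it
    req = urllib.request.Request(url, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Keeping the captured headers unchanged, in particular User-Agent, is the point of the exercise: the server then sees the same kind of client it saw when the App made the request.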

The URL has many parameters: some are fixed, and some change with the homepage being viewed. If the script is only for your own use, you can simply extract the video URLs from the captured response with regular expressions and then download them in batches. That's it.

If you want to write a script for others to use, you need to do more work. For example, you need to examine more APIs to determine how the parameters in the URL and headers are obtained or generated, and then write a script that automates this process. In some cases this also involves anti-crawling measures such as encryption and obfuscation, which will not be covered here. Interested readers are invited to explore on their own.
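The regular-expression approach can be sketched like this. It assumes the response body is JSON text in which playback addresses contain "/play" or "/playwm", as in the single-video case earlier; extract_play_urls and URL_RE are hypothetical names, and the pattern may need adjusting to what your capture actually contains.

```python
import re
from typing import List

# Match http(s) URLs up to the next quote, backslash, angle bracket or whitespace
URL_RE = re.compile(r'https?://[^\s"\'\\<>]+')


def extract_play_urls(response_text: str) -> List[str]:
    """Return unique watermark-free candidate URLs found in a captured response body."""
    # JSON may escape "/" as "\/" and "&" as "\u0026"; undo that before matching
    text = response_text.replace("\\/", "/").replace("\\u0026", "&")
    seen, result = set(), []
    for url in URL_RE.findall(text):
        if "/play" not in url:  # assumption: playback addresses contain /play or /playwm
            continue
        url = url.replace("playwm", "play")  # same watermark trick as for a single video
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Each returned URL can then be passed to the same download routine used for a single video.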

Final words

The key to crawling videos is to find the playback address of the video. With the playback address, you can download the video in a browser even without writing code. Finding the playback address is not enough, though: you must also consider whether the watermark can be removed. If you want to download in batches, you need to know how to get more video links. When the browser cannot capture them, consider using BurpSuite to capture the phone's traffic, then extract the interface data or simulate the phone's requests. For anyone who works on crawlers, BurpSuite is a Swiss Army knife and very practical.

If this article is helpful to you, please like it or read it again. Thank you for your support.

Origin blog.csdn.net/m0_48405781/article/details/113696549