Crawler Basics: Batch Downloading Douyin Videos Without Watermarks

I recently took on a small custom software job for a friend: crawl every Douyin video posted by a given user, without watermarks. After accepting the task, I first worked out the approach. It breaks down into three steps:

  • Analyze the user's homepage data to get the video information.
  • The data is usually paginated, so find the paging marker and keep reading video data in a loop until the end marker is reached.
  • From the video data, assemble and parse the video address so the video can be downloaded without a watermark.

With this plan in place, experience told me that reading the video data should not be too hard. The real difficulty is getting the watermark-free address. There are plenty of watermark-free download tools online, so a method must exist, but none of them publish source code we could learn from. Fortunately, I found a method for resolving watermark-free addresses on the Jingyi Forum, as shown in the figure below (source: Jingyi Forum -> https://bbs.125.la/thread-14560713-1-1.html ):

After reading through the methods described there, I think the first one is the simplest and most convenient. With a method in hand, the next step is to test it. Share a link from Douyin Lite (test link: https://v.douyin.com/J1DWHbK/ ). Note that the page must be viewed in mobile mode. Use the Firefox add-on User-Agent Switcher to put the browser into mobile mode, resize the browser window to phone size, and then visit the test link. The result is shown in the figure below:

After visiting the test link, you can see that the URL redirects; we will look at its parameters later. Once the mobile page is displayed, right-click to view the page source and search it for "playwm" (for why this keyword, look carefully at Figure 1). No matching text turns up, as shown in the figure below:

No match. Does that mean the method no longer works? Not quite. Switch back to desktop mode, refresh the page, view the source again and search for the same text. This time it is there, as shown below:

Copy the matching address and open it in the browser. The video plays, but with a watermark, as shown in the figure below:

Since the video plays, the address is valid. Following the tip in the method, change playwm to play and visit the link again. The result is a blank page...

After more testing, it turned out that the UA must be set to mobile mode (read the method's notes carefully!). Visiting the link again in mobile mode returned the watermark-free playback address:
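The following is a rough Python sketch of the same two steps, not the author's Easy Language code. It renames "playwm" to "play" as the forum tip says, then downloads with a mobile UA. The UA string is an arbitrary example, and `to_no_watermark` and `download` are hypothetical helper names.

```python
import shutil
import urllib.request

# Assumption: an arbitrary Android Chrome UA string. The article only says the
# UA has to be a mobile one; this exact string does not come from the article.
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.106 Mobile Safari/537.36"
)


def to_no_watermark(playwm_url: str) -> str:
    """Apply the forum tip: rename the first "playwm" in the address to "play"."""
    return playwm_url.replace("playwm", "play", 1)


def download(url: str, path: str, user_agent: str = MOBILE_UA) -> None:
    """Fetch url with a mobile UA (a desktop UA gave a blank result) and save it."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream in chunks instead of reading it all
```

Typical use would be `download(to_no_watermark(addr), "video.mp4")`, where `addr` is the playwm address copied from the page source.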

OK, method one does produce a watermark-free address. What remains is getting the video data from the user's homepage. Share a user's profile link from Douyin ( https://v.douyin.com/J1UuQ6X/ ), switch the UA to mobile mode, open the developer tools, and monitor the network requests:

On the user's homepage, I found that Douyin fetches the user's video data through an API endpoint. That makes things easy; anyone with some development experience will know how to proceed. After testing, I summarized what each parameter of this endpoint means:

  • sec_uid: the encrypted user ID string; required
  • count: the number of videos returned per request
  • max_cursor: the current page cursor, used to request the next page of data
  • aid: meaning unknown; just pass the value the page sends
  • _signature: a verification signature. It is hard to generate ourselves, so we can simply capture it with a webview
  • dytk: as the name suggests, a Douyin token; testing showed it can be passed or omitted
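To make the parameter list concrete, here is a minimal Python sketch that assembles a request URL from those parameters. The article does not give the endpoint path, so `API_BASE` is a hypothetical placeholder. `build_post_url` is an illustrative name, and the default page size of 20 is an arbitrary choice.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder: copy the real endpoint path from the request you
# captured in the developer tools; the article does not give it.
API_BASE = "https://example.com/aweme/post/"


def build_post_url(sec_uid: str, signature: str, aid: str,
                   max_cursor: int = 0, count: int = 20,
                   dytk: Optional[str] = None) -> str:
    params = {
        "sec_uid": sec_uid,        # encrypted user ID, required
        "count": count,            # videos per request (20 is an arbitrary choice)
        "max_cursor": max_cursor,  # page cursor; 0 for the first page
        "aid": aid,                # meaning unknown; reuse the captured value
        "_signature": signature,   # captured from the webview, not computed here
    }
    if dytk:
        params["dytk"] = dytk      # optional, per the article's tests
    return API_BASE + "?" + urlencode(params)
```

The signature may be bound to the other parameter values. If so, changing `max_cursor` would require a freshly captured `_signature` rather than reusing the first one.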

The data returned by the endpoint is shown in the figure below:

Douyin is, after all, a national-scale app, and its security verification and server-side anti-crawler measures are top-tier. I ran into plenty of pitfalls while testing this endpoint, and they are worth mentioning here. The main issue is the UA: opening the link directly in a private browsing window (no UA override, no cookies) returns empty data:

Using a POST request tool, set the UA and send a request to the same link (only the UA needs to be set):

From this we can conclude that, as long as the parameters are correct, this endpoint only checks the UA. Cookies have no effect, and the UA must be a mobile one; otherwise no data is returned.
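Applied to code, that conclusion amounts to sending a request with only a mobile User-Agent header and no cookie handling. The following minimal Python sketch uses the standard library. The UA string is an arbitrary example, and `make_request` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

# Assumption: any mobile UA should work; this particular string is arbitrary.
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.106 Mobile Safari/537.36"
)


def make_request(url: str, user_agent: str = MOBILE_UA) -> urllib.request.Request:
    # Only the User-Agent header is set: per the article, cookies make no difference.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url: str, user_agent: str = MOBILE_UA,
               timeout: float = 15) -> Optional[dict]:
    with urllib.request.urlopen(make_request(url, user_agent), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    # Treat an empty body as "no data" instead of letting json.loads raise.
    return json.loads(body) if body.strip() else None
```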

At this point the theory behind batch-downloading Douyin videos without watermarks is complete and looks feasible. We know how to read the user's homepage data and the following pages, and how to get the watermark-free download address. What remains is writing the actual code. I implemented it in Easy Language (易语言) and PHP (the reason for PHP is explained below); only the core code is posted here.

First, we need to get the data through the endpoint. As mentioned earlier, I have not fully worked out how the _signature parameter is generated. However, it appears directly in the request URL, so we can capture this URL by monitoring URL changes in the Jingyi web browser component. The code is as follows:
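As a separate illustration (a Python sketch, not the author's Easy Language code), this is one way to split a captured URL into its base address and its individual parameters, which is the decomposition discussed next. `split_signed_url` is a hypothetical name.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit


def split_signed_url(captured_url: str) -> Tuple[str, Dict[str, str]]:
    """Split a captured API URL into its base address and query parameters.

    parse_qsl decodes percent-escapes, so the values must be re-encoded
    (for example with urlencode) when the URL is rebuilt.
    """
    parts = urlsplit(captured_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values keeps empty parameters such as "dytk=".
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params
```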

Careful readers will notice that I split the sec_uid and the other parameters out of the URL. Reading the endpoint data directly with the webpage_access_object of the Jingyi module for Easy Language did not work. I do not know why, and I did not have time to dig into it. Since Easy Language was not cooperating, I tried PHP's curl instead, which is why PHP is used. (For problems like this, after weighing the various factors, I think combining several languages is a good solution; there is no need to stubbornly stick to one.)

We move the part that reads the endpoint data into PHP as a relay. The Easy Language program calls this relay directly and passes in the decomposed parameters. It cannot pass the whole URL, because the URL contains & characters, which would break the parsing. With that, the data comes back. The PHP code is easy to follow and needs only a little knowledge of curl. The code is as follows:
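To see why a raw URL cannot be handed to the relay, the following Python sketch mimics how a relay reading a `url` query parameter would parse the request. The relay address and the `url` parameter name are hypothetical. Percent-encoding the whole URL is shown as an alternative that the article did not test; the author's own fix was to pass the parameters separately.

```python
from urllib.parse import parse_qs, quote, urlsplit


def naive_relay_link(relay: str, target_url: str) -> str:
    # The target's own '&' characters split it into separate relay parameters.
    return f"{relay}?url={target_url}"


def encoded_relay_link(relay: str, target_url: str) -> str:
    # '&', '?', '=' and '/' become %26, %3F, %3D and %2F, so the URL stays one value.
    return f"{relay}?url={quote(target_url, safe='')}"


def url_param_seen_by_relay(link: str) -> str:
    """The value a relay reading the 'url' query parameter would receive."""
    return parse_qs(urlsplit(link).query)["url"][0]
```

With the naive link, everything after the first `&` of the target URL is lost; with the encoded link, the relay receives the full address.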

That solves essentially every problem we ran into. The rest is looping over the pages of video data and assembling the watermark-free download address from each entry. The Easy Language implementation is as follows:
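For readers who want the loop logic in text form, here is a Python sketch, not the author's code. The response format appears only as a screenshot, so the field names `aweme_list`, `has_more`, `max_cursor` and `video.play_addr.url_list` are assumptions; adjust them to the captured JSON. The `fetch_page` callback is a hypothetical hook that returns the parsed page for a given cursor, and it is where the signed URL would be obtained.

```python
from typing import Callable, Iterator, Optional

# Assumed field names: "aweme_list", "has_more", "max_cursor" and
# video.play_addr.url_list are not confirmed by the article.


def iter_play_urls(fetch_page: Callable[[int], Optional[dict]]) -> Iterator[str]:
    """Yield watermark-free play addresses page by page, following max_cursor."""
    cursor = 0  # the first page starts at cursor 0
    while True:
        page = fetch_page(cursor)
        if not page:
            break  # empty data: often a missing mobile UA or a stale _signature
        for item in page.get("aweme_list", []):
            video = item.get("video") or {}
            urls = (video.get("play_addr") or {}).get("url_list", [])
            if urls:
                # Same tip as before: "playwm" -> "play" drops the watermark.
                yield urls[0].replace("playwm", "play", 1)
        if not page.get("has_more"):
            break  # end marker reached
        next_cursor = page.get("max_cursor")
        if next_cursor is None or next_cursor == cursor:
            break  # guard against requesting the same page forever
        cursor = next_cursor
```

Making the fetcher a callback keeps the paging logic testable without network access and leaves room for whatever signature-capturing mechanism is used.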

Well, that is all the core code. I will skip a long summary; it is not really difficult. Just be careful while debugging, try a few more times, and find the key points. This is a freshly hand-written article, so please forgive anything I have not explained perfectly~


Source: blog.csdn.net/oyo775881/article/details/106322072