Downloading DingTalk PC Group Replay Videos with Python and Fiddler


Group administrators can set DingTalk group replay videos so that they cannot be downloaded. Sometimes, though, you need a local copy: it is easier to share, you can adjust the playback speed, and you can watch it without opening the PC client. So how do you download it?
1. Capturing packets with Fiddler
1. First, understand one principle: if you can see or hear data on your computer, that data has already reached your computer, which means it can be accessed and crawled. Crawler developers sum this up as "if you can see it, you can crawl it". Whether you actually manage to get the data depends on your method and technique.

Fiddler is nicknamed "the violin". Quoting Baidu Encyclopedia:
"Fiddler is an HTTP debugging proxy tool. It can record and inspect all HTTP traffic between your computer and the Internet, set breakpoints, and view all data going in and out of Fiddler (cookies, HTML, JS, CSS, and so on). Fiddler is simpler than other network debuggers because it not only exposes HTTP traffic but also presents it in a user-friendly format."

Fiddler is a widely used packet capture tool. You can download it from its official website. For configuration and troubleshooting, refer to this article: "Fiddler in the web crawler grabs PC-side webpage data packets and mobile APP data packets".

2. To start capturing packets, first open the DingTalk PC client, and then open Fiddler. Fiddler should already be configured before this step; otherwise the capture may fail.
Then click the video you want to download and wait for Fiddler to capture the traffic.
Check the capture button in the lower left corner of Fiddler and keep capturing turned on.
If the session list is too cluttered, clear it: right-click in the list, choose the clear-sessions option from the pop-up menu, and then click the video again to capture its packets.
The capture result contains a session whose URL includes a numbered .ts file name; this is the video link we need.
Right-click that session, whose response body is a video file, and copy its URL.
Paste the entire link into a browser to verify it. If it is correct, a ts file will be downloaded. This ts file is one small segment of the video. Most online videos are not sent to your computer as a single file; they are transferred in segments, and several ts files together make up the whole video. That is also why you see "buffering" while watching.
Later we will use a loop to download each ts file. Before that, turn off the capture button mentioned earlier: while Fiddler is capturing, it acts as a proxy for your network traffic, which can cause the crawler's requests to fail.
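If you are unsure whether capturing is really off, a quick check of the system proxy settings can help. The sketch below is only an illustration: the helper name points_to_local_proxy is made up for this example, and it assumes Fiddler is listening on its default port 8888 on the local machine.

```python
import urllib.request
from urllib.parse import urlsplit

# Assumption: Fiddler listens on its default port; change this if you reconfigured it.
FIDDLER_DEFAULT_PORT = 8888


def points_to_local_proxy(proxies, port=FIDDLER_DEFAULT_PORT):
    """Return True if any proxy entry targets localhost on the given port."""
    for address in proxies.values():
        # Entries may or may not carry a scheme, e.g. "127.0.0.1:8888".
        parts = urlsplit(address if "://" in address else "//" + address)
        if parts.hostname in ("127.0.0.1", "localhost") and parts.port == port:
            return True
    return False


if __name__ == "__main__":
    # getproxies() reads the proxy settings that Python HTTP clients pick up by default.
    if points_to_local_proxy(urllib.request.getproxies()):
        print("A local proxy on port 8888 is still active; stop capturing first.")
    else:
        print("No local proxy on port 8888 detected.")
```

If the check reports an active local proxy, stop capturing in Fiddler (or close it) before running the crawler.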

2. Crawling the ts video segments
Looking at the link we obtained, we find that it contains 1.ts. Because all segments belong to the same video, the rest of the link stays the same from segment to segment; only the number in front of .ts changes.
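To avoid editing the copied link by hand, the number in front of .ts can be swapped for a placeholder programmatically. This is a minimal sketch, assuming the link has the shape described above (a path ending in a number followed by .ts, optionally with a query string); the function name make_segment_template and the example.com URL are hypothetical.

```python
import re


def make_segment_template(captured_url):
    """Turn a captured segment URL into a template with '{}' in place of the number."""
    # Escape literal braces first so that str.format() only sees our placeholder.
    escaped = captured_url.replace("{", "{{").replace("}", "}}")
    # Replace the segment number that sits directly before ".ts".
    template, count = re.subn(r"/\d+\.ts(?=\?|$)", "/{}.ts", escaped, count=1)
    if count == 0:
        raise ValueError("no '<number>.ts' segment found in the URL")
    return template


if __name__ == "__main__":
    # Hypothetical URL, used only to show the shape of the result.
    template = make_segment_template("https://example.com/video/1.ts?auth_key=abc")
    print(template.format(2))
```

Calling template.format(index) then yields the link for any segment number.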

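import os

import requests

headers = {
    'User-Agent': '',  # fill in your own User-Agent string
}
path = ""  # folder name; fill this in before running
os.makedirs(path, exist_ok=True)  # create the folder automatically
MIN_SIZE = 500 * 1024  # smallest size, in bytes, that counts as a valid segment
index = 1
while True:
    url = ('https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/{}.ts?auth_key='
           'xxxxxxxxx-x-x-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx').format(index)
    res = requests.get(url, headers=headers)
    # Exit the loop when the fetched ts file is invalid. Adjust the threshold to your case:
    # my segments are mostly about 31 seconds long and 500K to 1.5M in size,
    # so I treat anything under 500K as invalid. Note that a shorter final
    # segment below the threshold would be skipped as well.
    if res.status_code != 200 or len(res.content) < MIN_SIZE:
        break

    # Put all segments in one folder so they are easy to splice later.
    # Mind the file names: my segment numbers never exceed three digits, so I
    # zero-pad them to four digits, which keeps the segments in order when splicing.
    with open(os.path.join(path, "%04d.ts" % index), "wb") as f:
        f.write(res.content)
    index += 1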

3. Video splicing
Open the folder that holds the ts files in File Explorer, type cmd in the address bar, and press Enter. This opens a command prompt in that folder.
Enter the following command to splice the segments:
copy /b *.ts new.mp4
copy copies files, /b tells it to treat them as binary, *.ts matches every ts file in this folder, and new.mp4 is the name of the merged file, which you can change as you like. The command format is copy + space + /b + space + *.ts + space + file name (e.g. new.mp4).
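The same byte-for-byte splicing can also be done in Python, which is handy if you are not on Windows or want to keep everything in one script. This is a rough sketch rather than the author's method; the function name concat_segments is made up for illustration, and it relies on the zero-padded file names so that sorting by name matches playback order.

```python
import glob
import os
import shutil


def concat_segments(folder, output_name="new.mp4"):
    """Append every .ts file in `folder`, in sorted name order, into one output file."""
    segments = sorted(glob.glob(os.path.join(folder, "*.ts")))
    if not segments:
        raise FileNotFoundError("no .ts files found in " + folder)
    output_path = os.path.join(folder, output_name)
    with open(output_path, "wb") as merged:
        for segment in segments:
            with open(segment, "rb") as part:
                # Stream the bytes across so large segments are not loaded into memory at once.
                shutil.copyfileobj(part, merged)
    return output_path
```

For example, concat_segments("videos") would write videos/new.mp4, where "videos" stands for whatever folder name you used when downloading.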

That's it. The download may be slow, about the same as an ordinary video download or perhaps slightly faster, since the video links are accessed directly. You can add multi-threading to speed it up.
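One possible way to add multi-threading is a thread pool from the standard library. The sketch below makes two assumptions that are not in the original script: the number of segments is already known (for example from one sequential run), and fetch is a function you write yourself that downloads and saves a single segment given its number. The name download_all is also just illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(indices, fetch, max_workers=4):
    """Call fetch(index) for every index on a thread pool and map each index to its result."""
    indices = list(indices)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input indices.
        return dict(zip(indices, pool.map(fetch, indices)))


if __name__ == "__main__":
    # Stand-in for a real download function; it only echoes the file name it would write.
    def fake_fetch(index):
        return "%04d.ts" % index

    print(download_all(range(1, 4), fake_fetch))
```

Because each segment is saved under its own zero-padded name, the order in which the threads finish does not affect the later splicing step.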
If you found this useful, give it a like!


Origin blog.csdn.net/weixin_43594279/article/details/107444501