Teach you how to capture packets in Python ~ [extremely detailed version]

Hello everyone! I am a red panda ❤

Many friends have asked me in private messages:

How do I find the data source, and how do I capture the packets?

It's actually very simple. Do it a few times and you'll remember it~~

Today we will demonstrate with three cases.

If you have a Python error you can't solve, or need source code and materials, help installing modules, or even tips from the resident "women's clothing" bosses, you can come here: ( https://jq.qq.com/?_wv=1027&k=2Q3YTfym ) or +V: python10010 and ask me.



Capturing packets on a certain "tooth" live-streaming site

First we go to the target page,

find a video, and analyze it with the developer tools.

Press F12, or right-click and select Inspect, to open the developer tools,
then select Network (the network panel) and All.

Then refresh the page to reload the current page content.

It used to be possible to select Media (media files) and see the video directly.

Not anymore. The internet iterates very quickly and websites are updated frequently,

so our techniques need to keep up as well. We can't stop learning; once we stop, we fall behind.

But the refresh loads a lot of requests, so how do we determine which one is our target?

Taking this "tooth" site as an example, the video has been changed to the m3u8 format,

which splits the complete video into many short clips.

These ts files are the video clips that an m3u8 playlist refers to.

If we copy the URL of one of them and open it in a new window,

it just downloads that clip directly.

Our full video is 2 minutes 26 seconds long, but each clip is only a few seconds long.

Do the math: at an average of five seconds per clip, 2 minutes 26 seconds (146 seconds) comes to roughly 29 clips, and you would have to merge them manually yourself, which is very troublesome.
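If you did end up with a folder of downloaded clips, the merge itself does not have to be done by hand. Here is a minimal sketch, assuming the clips are unencrypted ts files already saved locally; the function name and the file names are made up for illustration.

```python
from pathlib import Path


def merge_ts_segments(segment_paths, output_path):
    """Concatenate .ts clips, in the given order, into one output file."""
    output_path = Path(output_path)
    with output_path.open("wb") as merged:
        for segment in segment_paths:
            # Unencrypted MPEG-TS clips can usually be joined byte for byte.
            merged.write(Path(segment).read_bytes())
    return output_path
```

Usage would look something like merge_ts_segments(["clip_000.ts", "clip_001.ts"], "full_video.ts"). The order of the list is the order of the video, so make sure the clips are sorted the way they play.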

But there is a special m3u8 file that lists all of the ts clips.
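To get a feel for what is inside such an m3u8 file, here is a small sketch that pulls the clip addresses out of the playlist text. The sample playlist and the example.com address are made up; a real playlist from the site will have more tags, but the clip lines look the same.

```python
from urllib.parse import urljoin


def list_ts_segments(playlist_text, playlist_url):
    """Return absolute URLs of the clips listed in an m3u8 media playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are clip URIs.
        if line and not line.startswith("#"):
            segments.append(urljoin(playlist_url, line))
    return segments


# Made-up playlist, just to show the shape of the file.
sample_playlist = """#EXTM3U
#EXT-X-TARGETDURATION:5
#EXTINF:5.0,
clip_000.ts
#EXTINF:5.0,
clip_001.ts
#EXT-X-ENDLIST
"""

for url in list_ts_segments(sample_playlist, "https://example.com/live/index.m3u8"):
    print(url)
```

Relative clip names are resolved against the address of the playlist itself, which is why the playlist URL is passed in as well.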

Click the search box in the upper-left corner

and search for m3u8. You will see a request whose name starts with get.

Click it, then click Preview (the preview data).

The title of the video and other information can be seen there.

Find definitions and expand it. The m3u8 videos are inside: original quality, ultra-clear, and smooth.

As you can see, the complete URL is right here and can be used directly.
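Once that response is saved as JSON, the m3u8 addresses can be picked out in code instead of by expanding the tree by hand. This is only a sketch: apart from the definitions key mentioned above, every key and URL in the sample is hypothetical, so the function simply walks the whole structure and keeps any string that contains ".m3u8".

```python
import json


def find_m3u8_urls(node):
    """Walk parsed JSON and collect every string value that mentions '.m3u8'."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_m3u8_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_m3u8_urls(item))
    elif isinstance(node, str) and ".m3u8" in node:
        found.append(node)
    return found


# Hypothetical response body; only the "definitions" key comes from the article.
sample_body = json.loads("""
{"title": "demo video",
 "definitions": [
   {"name": "original", "m3u8": "https://example.com/v/original.m3u8"},
   {"name": "smooth", "m3u8": "https://example.com/v/smooth.m3u8"}
 ]}
""")

print(find_m3u8_urls(sample_body))
```

Searching by value rather than by key name means the sketch keeps working even if the real response nests the addresses differently.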


Let's make a note~~


That is how to find the data for a live-stream video.

Next, let's look at Weibo videos.



Capturing packets on a certain "blog" (Weibo)

The first case was explained in detail, so I won't take screenshots of every step this time.

I'll only show the general process, so if you've forgotten something, go back and look at the first case.

Of course, I will take screenshots where the two websites differ.

Determine the target URL and open a video playback page.

It's a bit too revealing (I don't know if the platform will let me release it, if you don't see it...you know it all...)

For most websites,

the first request is the current web page itself,

apart from a few special sites where it is different.

Today's target, the video, is not necessarily in the source code of that first page. Even if you can copy the video's URL, it may still not be there,

because some data is loaded dynamically or arrives in a different request.

The second method: copy the title of the current target into the search box and press Enter. Of course, it may not find anything.

The third method: click Fetch/XHR directly to filter for the dynamic data requests, which are loaded in real time.
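The title search above can also be repeated offline. A minimal sketch, assuming the capture was exported from the Network panel as a HAR file (a JSON document); the sample entries and URLs below are made up.

```python
def search_har(har, keyword):
    """Return the URLs of captured requests whose response body contains keyword."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        # Bodies stored as base64 (usually binary data) are skipped in this sketch.
        if content.get("encoding") == "base64":
            continue
        if keyword in (content.get("text") or ""):
            matches.append(entry["request"]["url"])
    return matches


# Made-up capture with two requests; only the second one carries the title.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/page"},
                "response": {"content": {"text": "<html>no data here</html>"}},
            },
            {
                "request": {"url": "https://example.com/api/detail"},
                "response": {"content": {"text": '{"title": "demo title"}'}},
            },
        ]
    }
}

print(search_har(sample_har, "demo title"))
```

With a real export you would load the file with json.load first and pass the result in. If the title shows up only in a request other than the page itself, that is a sign the data is loaded dynamically.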

Thanks a lot, the video has been taken down again...

Forget it, I'll find a new one.

We can see that there is a lot of data on the left, so which one do we need?

At this point we need to click through them one by one. It is usually one of these two, but not always, so we have to check each one.

Then click the small triangles on the right to expand the entries one by one, scroll down, and find urls. There you can see the video addresses, one for each resolution.
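If you only want the sharpest version, picking it out of urls can be sketched like this. The shape of the data is an assumption: here urls is taken to be a mapping from a resolution label to an address, and the labels and example.com addresses are made up.

```python
import re


def pick_highest_resolution(urls):
    """Return (label, url) for the label carrying the largest number, e.g. '1080P'."""
    def resolution(label):
        numbers = re.findall(r"\d+", label)
        return max(map(int, numbers)) if numbers else 0

    best_label = max(urls, key=resolution)
    return best_label, urls[best_label]


# Hypothetical contents of the "urls" field.
sample_urls = {
    "HD 720P": "https://example.com/video_720.mp4",
    "HD 1080P": "https://example.com/video_1080.mp4",
    "SD 360P": "https://example.com/video_360.mp4",
}

print(pick_highest_resolution(sample_urls))
```

Labels without any number in them are treated as the lowest quality, so they are only chosen when nothing else is available.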

The video's ID, title, and so on are all here too.

Why click them one by one? You see, these two look the same. I just clicked the first one; now look at the second one.

It holds the cover, title, video ID, etc. for the recommended videos in the right-hand column.



Capturing packets on a certain "hand" short-video site

Next comes a certain "hand" site. This time let's be serious and find a proper video to demonstrate.

We go directly to the home page (as for why I can't post pictures on this platform... those who know, know).

Same operation as before: open the developer tools, click Network, refresh, and select All.

This time, let's directly copy the name of this young lady and search for it.

The search returns two results that look identical, so we need to open them one by one to determine which one we need.

One holds the blogger's ID, profile, etc., and the other holds the video data.

Here I click the first graphql entry and then Preview. There are 21 videos in total. The photoUrl field near the bottom of the preview is the URL of the video, and photoH265Url, as the name suggests, is the URL of the H.265-encoded version of the same video.
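With that graphql response saved as JSON, all of the video addresses can be collected in one pass. This is a rough sketch: the nesting in the sample and the caption key are hypothetical, and the key names are passed as parameters so they can be changed to whatever the preview actually shows.

```python
def collect_videos(node, url_key="photoUrl", title_key="caption"):
    """Walk parsed JSON and return (title, url) pairs for dicts that hold url_key."""
    videos = []
    if isinstance(node, dict):
        if isinstance(node.get(url_key), str):
            videos.append((node.get(title_key), node[url_key]))
        for value in node.values():
            videos.extend(collect_videos(value, url_key, title_key))
    elif isinstance(node, list):
        for item in node:
            videos.extend(collect_videos(item, url_key, title_key))
    return videos


# Hypothetical response shape with two videos.
sample_response = {
    "data": {
        "feeds": [
            {"photo": {"caption": "clip one", "photoUrl": "https://example.com/1.mp4"}},
            {"photo": {"caption": "clip two", "photoUrl": "https://example.com/2.mp4"}},
        ]
    }
}

for title, url in collect_videos(sample_response):
    print(title, url)
```

A missing title simply comes back as None, so a video is never dropped just because its caption key has a different name.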

Well, that's it for today~

I've been feeling very tired lately...

I'm Little Panda, see you in the next article ❤


Origin blog.csdn.net/m0_67575344/article/details/126678449