Python crawler: reverse engineering the JS to grab picture download links from the Ctrip.com comment area

1. Introduction

The content of this article may raise copyright issues, so I do not provide my implementation code here. Instead, I only describe how the process works from the JS reverse-engineering side, in the hope that it helps readers doing similar work. Readers who need the code can send me a private message. Please keep in mind that the code is for learning only and must not be used for commercial purposes.

2. Implementation process

Needing JS reverse engineering means that the data you want does not come from a static page: if you fetch the page with requests, the response does not contain the data. So how do you get it? Find the interface that serves it, which usually involves Ajax. Some of the request parameters on such an interface are not self-explanatory, so you reverse the JS to work out what they mean. (Some of them you may never fully understand, but you can at least work out how the value is composed or where to find it.)
Since the goal is the picture download links in the comment area, the rest of the comment data comes along as well. All of it is returned by this interface:
[Screenshot: the comment data interface]
The request parameters are:
[Screenshot: the request parameters]
You can see that the request parameters have two keys, arg and head. Following the Initiator of this interface in the browser's developer tools leads to the JS code that builds the request. In the dictionary under head, every value is fixed except cid, and cid can also be treated as fixed, because its value comes from a key in the cookie:
[Screenshots: the JS code that builds head, and the cookie entry that supplies cid]
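The head dictionary described above can be sketched roughly as follows. This is a minimal sketch and not the author's code. The cookie key is passed in by the caller because the text does not name the key behind cid, and the cookie string and fixed field in the usage lines are made-up placeholders standing in for values you would copy from the browser.

```python
from http.cookies import SimpleCookie


def cookie_value(cookie_header: str, key: str) -> str:
    """Return the value stored under `key` in a raw Cookie header string."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    if key not in jar:
        raise KeyError(f"cookie key {key!r} not found")
    return jar[key].value


def build_head(fixed_fields: dict, cookie_header: str, cid_cookie_key: str) -> dict:
    """Copy the fixed head fields and fill in cid from the cookie."""
    head = dict(fixed_fields)
    head["cid"] = cookie_value(cookie_header, cid_cookie_key)
    return head


# Usage with placeholder data (hypothetical names, not real Ctrip values).
example_cookie = "session=abc123; visitor_id=0000000000"
example_head = build_head({"some_fixed_field": "fixed"}, example_cookie, "visitor_id")
print(example_head)
```

Keeping the fixed fields in a separate dictionary means only cid has to be refreshed when the cookie changes.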
As for the keys in the arg dictionary: pageIndex is the page number; pageSize is the number of comments per page; sortType is the sort order, of which there are two, by time and smart, with smart being the default; poiId should be the id of the scenic spot (it can be read from a piece of JSON data inside a script tag of the current page). The rest can be treated as basically fixed:
[Screenshot: the arg dictionary]
The poiId comes from the JSON data under the script tag:
[Screenshot: the JSON data that contains poiId]
The commentTagId parameter probably refers to this (not necessarily right!):
[Screenshot: what commentTagId appears to refer to]
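The arg dictionary can be sketched in the same spirit. This is an illustration under assumptions, not the author's code: it assumes poiId appears in the page source as a plain "poiId": number pair that a regular expression can find, and the sort value, page size, and commentTagId used in the usage lines are placeholders, since the text does not give the real values.

```python
import re


def extract_poi_id(page_html: str) -> int:
    """Find the first "poiId": <digits> pair in the page source (assumed layout)."""
    match = re.search(r'"poiId"\s*:\s*"?(\d+)"?', page_html)
    if match is None:
        raise ValueError("poiId not found in page source")
    return int(match.group(1))


def build_arg(fixed_fields: dict, poi_id: int, page_index: int,
              page_size: int, sort_type, comment_tag_id) -> dict:
    """Copy the fixed arg fields and set the ones that vary per request."""
    arg = dict(fixed_fields)
    arg.update({
        "poiId": poi_id,
        "pageIndex": page_index,
        "pageSize": page_size,
        "sortType": sort_type,
        "commentTagId": comment_tag_id,
    })
    return arg


# Usage with a made-up page fragment and placeholder parameter values.
sample_html = '<script>window.__DATA__ = {"poiId": 12345, "name": "demo"}</script>'
poi_id = extract_poi_id(sample_html)
pages = [build_arg({}, poi_id, page, 10, "SORT_PLACEHOLDER", 0) for page in (1, 2)]
print(pages)
```

Only pageIndex changes between the two dictionaries, which matches the description that the remaining values are basically fixed.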
As for the parameter after the question mark in the interface URL, you can understand how it is composed from this piece of JS code:
[Screenshot: the JS code that builds the query parameter]
Comparing it with the data in the screenshots above, readers should be able to see that t in the JS code is 09031020210426062880, which is the value of the guid key in the cookie.
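Putting the pieces together, a request object could be assembled along these lines. This is only a sketch under loud assumptions: the URL and the query parameter name are placeholders, the query value is assumed to be nothing more than the guid (the real composition is whatever the JS code shows), and sending arg and head as a JSON body in a POST request is also an assumption. Nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_comment_request(base_url, query_param, guid, arg, head):
    """Build (but do not send) a request carrying arg and head as a JSON body."""
    # Assumption: the query string holds a single parameter whose value is the guid.
    url = f"{base_url}?{urlencode({query_param: guid})}"
    body = json.dumps({"arg": arg, "head": head}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Usage with placeholder values only (hypothetical URL and parameter name).
req = build_comment_request(
    "https://example.invalid/comment-interface",
    "placeholder_param",
    "00000000000000000000",
    {"pageIndex": 1},
    {"cid": "00000000000000000000"},
)
print(req.full_url)
print(req.get_method())
```

Building the request separately from sending it makes it easy to compare the URL and body with what the browser shows before any traffic goes out.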

3. Running results

Data on page 1
[Screenshot: output for page 1]
Data on page 2
[Screenshot: output for page 2]
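Since the layout of the interface's response is not shown in the text, one hedged way to pull picture links out of it is to walk the decoded JSON and keep every string that looks like an image URL. The sketch below does that; the sample response and the list of file extensions are made up for illustration and are not taken from Ctrip.

```python
def collect_image_urls(node, extensions=(".jpg", ".jpeg", ".png", ".gif", ".webp")):
    """Recursively gather string values that look like image URLs."""
    urls = []
    if isinstance(node, dict):
        for value in node.values():
            urls.extend(collect_image_urls(value, extensions))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_image_urls(item, extensions))
    elif isinstance(node, str):
        # Ignore any query string when checking the file extension.
        path = node.split("?", 1)[0].lower()
        if node.startswith(("http://", "https://")) and path.endswith(extensions):
            urls.append(node)
    return urls


# Usage with a made-up response structure.
sample_response = {
    "items": [
        {"text": "nice", "images": [{"url": "https://example.invalid/a.jpg"}]},
        {"text": "ok", "images": []},
        {"images": [{"url": "https://example.invalid/b.PNG?size=large"}]},
    ]
}
image_urls = collect_image_urls(sample_response)
print(image_urls)
```

Because the walker does not depend on key names, it keeps working if the response nests the links differently, at the cost of possibly picking up image URLs that are not comment pictures.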

I am not sure whether this post can be published successfully, so some of the JS reverse-engineering steps above are not explained in detail. I hope readers will understand.

Origin blog.csdn.net/qq_45404396/article/details/131679816