A reverse-engineering study of a used-car website turned out like this...

1. Reverse-engineering goal

Use packet capture to find the encrypted parameters in the request headers (including cookies) and any encrypted or encoded data in the response. Locate the specific interface through the XHR/fetch requests, then use global search or breakpoints to debug the JS and work out its logic. If that does not work, fall back on hooks, JS rewriting, automated browser simulation, and so on. The ultimate goal is to capture the data we need stably, quickly, and completely.

2. Website analysis

Analyzing the web pages of a used-car site, we found many data fields we want to capture. Scraping them from page elements would be troublesome to maintain, because any change to the elements or their positions breaks the scraper. Instead, we locate the data interface through the XHR requests.

3. Encryption parameter analysis

The interface's request headers suggest that a few parameters may matter, such as client-time, verify-token, and szlm-id. Let's search globally for verify-token, set a breakpoint, and then refresh the page to hit it.

The value of verify-token is 44646d6aa72bc4733ceabefe0e952271. Refreshing again shows that verify-token has changed. Do we need to dig out the JS logic that builds it? No. When we don't know whether a parameter matters, we can use a debugging tool such as Postman to test the interface's request-header parameters. Verification shows that although verify-token changes dynamically, it does not affect the returned data and access is not blocked, so this parameter can be dropped. The other request-header parameters go through the same debugging and verification process...
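The drop-and-compare check described above can also be scripted. The sketch below is a minimal illustration, not the author's code. It assumes a hypothetical `send` callable that issues the request with a given header dict and returns the response body. It removes one header at a time and reports the headers whose removal leaves the response identical to a baseline. The fake server and the header values other than the verify-token quoted above are made up for the demo.

```python
from typing import Callable, Dict, List

# Hypothetical: sends the request with the given headers and returns the body.
SendFn = Callable[[Dict[str, str]], str]


def find_removable_headers(send: SendFn, headers: Dict[str, str]) -> List[str]:
    """Return header names whose removal leaves the response unchanged."""
    baseline = send(dict(headers))  # response with the full header set
    removable: List[str] = []
    for name in headers:
        # Copy the headers without the one being tested.
        trimmed = {k: v for k, v in headers.items() if k != name}
        if send(trimmed) == baseline:
            removable.append(name)
    return removable


def fake_send(headers: Dict[str, str]) -> str:
    # Pretend the server only rejects requests that lack "user-agent".
    if "user-agent" not in headers:
        return '{"code": 403}'
    return '{"code": 0, "data": [1, 2, 3]}'


hdrs = {
    "user-agent": "Mozilla/5.0",
    "verify-token": "44646d6aa72bc4733ceabefe0e952271",
    "client-time": "0",  # made-up value
}
print(find_removable_headers(fake_send, hdrs))  # ['verify-token', 'client-time']
```

Comparing raw bodies is strict. If the real response embeds timestamps or other per-request values, compare the parsed data fields instead. Dropping headers one at a time also cannot detect headers that are only required in combination.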

4. Encrypted data analysis

Formatting the response data makes it easy to see that the numbers have been encoded. To decode them, we just need to compare them with the numbers displayed on the page.

The digit 5 corresponds to
The digit 4 corresponds to
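Once a few encoded characters have been matched with the digits on the page, the lookup can be applied to the whole response. A minimal sketch, assuming the response uses a simple one-to-one character substitution for digits. The private-use characters `\ue004` and `\ue005` below are hypothetical stand-ins, since the real encoded characters did not survive in this article. If the site's mapping changes between requests (for example, through a per-request custom font), a fixed table like this would need to be rebuilt for each response.

```python
from typing import Dict


def build_digit_map(encoded: str, displayed: str) -> Dict[str, str]:
    """Pair each encoded character with the character shown at the same position."""
    if len(encoded) != len(displayed):
        raise ValueError("encoded and displayed text must have the same length")
    mapping: Dict[str, str] = {}
    for enc_ch, shown_ch in zip(encoded, displayed):
        if enc_ch == shown_ch:
            continue  # unchanged characters such as "." need no mapping
        if mapping.get(enc_ch, shown_ch) != shown_ch:
            raise ValueError(f"conflicting mapping for {enc_ch!r}")
        mapping[enc_ch] = shown_ch
    return mapping


def decode(text: str, mapping: Dict[str, str]) -> str:
    """Replace each encoded character with its plain digit; keep others as-is."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Hypothetical sample: the response has "\ue005.\ue004" where the page shows "5.4".
digit_map = build_digit_map("\ue005.\ue004", "5.4")
print(decode("\ue004\ue005.\ue005", digit_map))  # 45.5
```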

5. Summary of ideas

This hands-on case shows that many request-header parameters, even dynamically changing encrypted ones, are actually unnecessary. A debugging tool lets us judge whether each parameter matters and bypass the ones that don't, which helps us fetch data quickly and makes the crawler more efficient.

6. Complete project download

Click me to download the complete project

The project includes a mind map of the reverse-engineering process, written in XMind, and an XMind software package, so you can open the mind map with xmind.exe to review the reverse-engineering steps.


A test crawl of the first 9 pages worked, and the scraped data can be viewed in Excel.
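The 9-page test run can be sketched as a simple pagination loop. This is an illustration under assumptions, not the project's code: `fetch_page` is a hypothetical callable that requests one page of the interface and returns its already-decoded rows. The output is written as CSV with Python's standard `csv` module rather than as a native .xlsx file. Excel opens the CSV directly, and the `utf-8-sig` encoding adds a BOM so Excel detects UTF-8 text correctly.

```python
import csv
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical: fetch_page(page_number) returns the decoded rows for one page.
FetchFn = Callable[[int], List[Row]]


def crawl_pages(fetch_page: FetchFn, first: int = 1, last: int = 9) -> List[Row]:
    """Fetch pages first..last (inclusive) and concatenate their rows."""
    rows: List[Row] = []
    for page in range(first, last + 1):
        page_rows = fetch_page(page)
        if not page_rows:  # stop early once a page comes back empty
            break
        rows.extend(page_rows)
    return rows


def save_csv(rows: List[Row], path: str) -> None:
    """Write rows to a UTF-8 (with BOM) CSV file that Excel can open."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def fake_fetch(page: int) -> List[Row]:
    # Stand-in for a real request; returns one made-up row per page.
    return [{"page": str(page), "title": f"car-{page}"}]


rows = crawl_pages(fake_fetch)
print(len(rows))  # 9
```

Calling `save_csv(rows, "cars.csv")` afterwards writes the collected rows to a file; for a true .xlsx file, a library such as openpyxl or pandas would be needed instead.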


7. Author Info

Author: Xiaohong's fishing routine. Goal: make programming more interesting!

Focuses on algorithms, web crawlers, game development, data analysis, natural language processing, AI, and more. Looking forward to your attention; let's grow and code together!

Copyright notice: plagiarism and reprinting of this article are prohibited, and infringement will be pursued!


Source: blog.csdn.net/qq_44000141/article/details/130903928