1. Reverse target
Use packet-capture tools to find the encrypted parameters in the request headers (including cookies) and any encrypted or encoded data in the response. Locate the specific interface through its XHR/Fetch requests, then trace the relevant logic in the JS with a global search or breakpoint debugging. If that fails, fall back to hooking, JS rewriting, or browser automation. The ultimate goal is to capture the data we need stably, quickly, and completely.
2. Website analysis
Looking at the page data of a second-hand-car website, we find many fields we want to capture. If we scraped the page elements directly, any change to the site's layout or element positions would make the crawler troublesome to maintain, so instead we locate the underlying interface through its XHR requests:
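Once the interface is found, parsing its JSON response is far more robust than scraping page elements. A minimal sketch of that idea, where the field names (`data`, `postList`, `title`, `price`) are hypothetical stand-ins for whatever the real XHR payload contains:

```python
import json

# Hypothetical payload shaped like a listing-API response; the real field
# names must be read off the actual XHR response in DevTools.
sample = '{"data": {"postList": [{"title": "2018 Civic", "price": "8.6万"}]}}'

payload = json.loads(sample)
for item in payload["data"]["postList"]:
    # Each listing is a plain dict; no HTML parsing or CSS selectors needed.
    print(item["title"], item["price"])
```

Because the interface returns structured data, the crawler keeps working even when the page markup changes.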
3. Analysis of Encryption Parameters
From the interface's request-header parameters we can see that some look important, such as client-time, verify-token, and szlm-id. Let's search for verify-token globally and set a breakpoint to debug it.
Refresh the page: the value of verify-token is 44646d6aa72bc4733ceabefe0e952271. Refresh again and the value changes. Does that mean we need to work out its JS construction logic? Not necessarily. When we don't know whether a parameter matters, we can use a debugging tool such as Postman to experiment with the interface's request-header parameters. Verification shows that although verify-token changes dynamically, it does not affect the returned data and access is not blocked, so this parameter can be discarded. The other request-header parameters go through the same debug-and-verify process.
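This drop-one-header-at-a-time check can also be automated. A sketch, where `fake_send` is a stand-in for a real request function that would call the site and report whether valid data came back:

```python
def find_required_headers(send, headers):
    """Remove each header in turn; keep only those whose removal breaks the response."""
    required = {}
    for name in headers:
        trial = {k: v for k, v in headers.items() if k != name}
        if not send(trial):  # response broke without this header -> it is required
            required[name] = headers[name]
    return required

def fake_send(hdrs):
    # Stand-in for a real request: pretend the server only checks User-Agent.
    return "User-Agent" in hdrs

headers = {
    "User-Agent": "Mozilla/5.0",
    "verify-token": "44646d6aa72bc4733ceabefe0e952271",  # changes every refresh
    "client-time": "1650000000",
}
print(find_required_headers(fake_send, headers))  # only User-Agent survives
```

In practice `send` would issue the request with a tool like Postman or the `requests` library and check the response body; the ablation logic stays the same.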
4. Encrypted data analysis
Check the response data and format it:
You can easily see that the digits have been encoded; we just need to compare them with the digits shown on the page:
The digit 5 is:
The digit 4 is:
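Once the glyph-to-digit correspondence has been read off the page like this, decoding the response is a simple table lookup. A sketch, where the codepoints below are made up for illustration and the real table must be built by comparing the encoded response text with the rendered page:

```python
# Hypothetical mapping from the site's encoded codepoints to real digits.
DIGIT_MAP = {
    "\ue1a2": "4",
    "\ue3f0": "5",
    "\ue5c7": "8",
}

def decode_digits(text):
    """Replace every encoded glyph with its plain digit, leaving other chars as-is."""
    return "".join(DIGIT_MAP.get(ch, ch) for ch in text)

print(decode_digits("\ue3f0.\ue5c7万"))  # "5.8万"
```

If the site rotates the mapping per request, the table has to be rebuilt each time from the accompanying font or mapping data rather than hard-coded.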
5. Summary of ideas
This walkthrough shows that many request-header parameters are actually useless, even dynamically changing encrypted ones. Using a debugging tool to judge which parameters matter lets us bypass unnecessary reverse-engineering, capture the data faster, and improve crawler efficiency.
6. Complete project download
Click me to download the complete project
The project includes the following content:
Part of the xmind mind map produced during the reverse-engineering research is shown below:
Project structure:
The package also includes xmind; use xmind.exe to open the mind map and view the reverse-engineering process.
Test: crawling the first 9 pages works:
View the Excel data:
7. Author Info
Author: Xiaohong's fishing routine. Goal: make programming more interesting!
Focused on algorithms, crawlers, game development, data analysis, natural language processing, AI, and more. Looking forward to your follow; let's grow and code together!
Copyright note: plagiarism and reprinting of this article are prohibited; infringement will be pursued!