Several complex anti-crawler strategies and countermeasures

As the Internet has continued to develop, the war between crawlers and anti-crawler defenses has never stopped. Today, Tianqi IP will share a few of the more complex anti-crawler strategies with you. Let's take a look.

(1) Data camouflage

On a web page, a crawler can monitor the traffic and then imitate a user's normal requests. To counter this, some websites disguise the data itself to raise the cost of scraping. For example, the price shown on a page might be 299 yuan, but the value is obfuscated with CSS in the DOM tree, and the correct figure can only be recovered by applying the CSS rules and doing some calculation. You have to be very careful when writing a crawler for such a site, because once the target website is redesigned the rules change and the captured data becomes invalid.
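As a minimal sketch of this kind of CSS disguise, consider a hypothetical page that mixes decoy digits hidden by a `display: none` class into the price. The markup and the class name `px-hide` are illustrative only; real sites vary their tricks, but the idea of having to read the CSS rules before trusting the text is the same.

```python
# Hypothetical example: digits hidden via CSS are decoys; only visible spans
# form the real price. Naive text extraction reads the wrong value.
from bs4 import BeautifulSoup

html = """
<style>.px-hide { display: none; }</style>
<span class="price">
  <span>2</span><span class="px-hide">6</span><span>9</span><span>9</span>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the digits whose spans are NOT hidden by the .px-hide rule.
digits = [
    s.get_text()
    for s in soup.select("span.price > span")
    if "px-hide" not in (s.get("class") or [])
]

print("naive extraction:", soup.select_one("span.price").get_text(strip=True))  # 2699
print("actual price:", "".join(digits))                                         # 299
```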

(2) Parameter signature

The APP computes a signature over the request parameters with an encryption (signing) algorithm. The signature is usually tied to a timestamp, and the timestamp is appended to the request, so even when the parameters are fixed the request is only valid for a short window. After the request reaches the server, the server validates the parameters and the timestamp and checks whether the signatures match; if they do not, the request is judged to be illegitimate. The signing algorithm on the APP side is generally hard to obtain and usually has to be recovered by decompiling the APP.
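Below is a minimal sketch of how such a scheme might work, assuming an HMAC-SHA256 signature over the sorted parameters plus a timestamp. The secret key, parameter names, and the 300-second validity window are all hypothetical, not taken from any particular APP.

```python
# Sketch of a parameter-signature scheme: HMAC-SHA256 over sorted params + timestamp.
import hashlib
import hmac
import time

SECRET_KEY = b"app-side-secret"   # hypothetical key baked into the APP
MAX_AGE_SECONDS = 300             # how long a signed request stays valid


def sign(params: dict, timestamp: int) -> str:
    # Canonicalize: sort keys, join as k=v, then append the timestamp.
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    message = f"{canonical}&ts={timestamp}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(params: dict, timestamp: int, signature: str) -> bool:
    # Reject stale requests first, then recompute and compare signatures.
    if abs(time.time() - timestamp) > MAX_AGE_SECONDS:
        return False
    return hmac.compare_digest(sign(params, timestamp), signature)


# Client (APP) side: build and sign the request.
params = {"item_id": "1001", "page": "1"}
ts = int(time.time())
sig = sign(params, ts)

# Server side: a tampered parameter or an old timestamp fails the check.
print(verify(params, ts, sig))                   # True
print(verify({**params, "page": "2"}, ts, sig))  # False
```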

(3) Hidden verification

One of the more complex anti-crawler methods is hidden verification. For example, to protect a website, JavaScript on the page requests some special URLs to obtain specific tokens, so a different token is generated for every request. Some websites even attach special request parameters to invisible images to identify whether the visitor is a real browser user. In such cases, calling the API directly is usually infeasible or very difficult, and the practical approach is to simulate real user behavior with a tool such as headless Chrome.
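The sketch below shows what that can look like with Selenium driving headless Chrome, assuming Selenium 4+ and a locally installed Chrome/chromedriver; the URL is a placeholder. Loading the page lets its JavaScript run exactly as it would for a real user, including the token-fetching requests and the invisible tracking images.

```python
# Sketch: let a real browser engine execute the page's JS so the hidden
# verification (token requests, tracking pixels) happens as for a normal user.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")   # run Chrome without a visible window
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/protected-page")   # placeholder URL

    # After the scripts have run, the rendered DOM and the cookies set by the
    # token exchange are available just as in a normal browsing session.
    html = driver.page_source
    cookies = driver.get_cookies()
    print(len(html), [c["name"] for c in cookies])
finally:
    driver.quit()
```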

(4) Anti-debugging

There is one special kind of anti-crawler strategy: as soon as the browser's developer console is opened, the browser's debugger statement is triggered endlessly. The website adds the debugger keyword to every constructor in a file named leonid-tq-jq-v3-min.js, so the debugger fires whenever any object is created. The purpose is to keep outside scripts or programs from tracing and debugging the code, thereby protecting it. In this case you can prepare a modified copy of the JS file with the debugger keywords removed, use mitmproxy to proxy the traffic and intercept leonid-tq-jq-v3-min.js, and return the modified file to the browser to bypass this restriction.
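A minimal sketch of that interception as a mitmproxy addon script might look like the following; rather than serving a pre-built file, it simply strips the debugger statements from the response on the fly, which achieves the same effect. The filename match comes from the article; everything else is an assumption.

```python
# mitmproxy addon sketch: strip "debugger" statements from the anti-debugging
# script before it reaches the browser. Run with:  mitmproxy -s this_script.py
import re

from mitmproxy import http


def response(flow: http.HTTPFlow) -> None:
    # Only touch the specific file that carries the anti-debugging code.
    if flow.request.pretty_url.endswith("leonid-tq-jq-v3-min.js"):
        # Remove "debugger" statements; everything else is returned unchanged.
        flow.response.text = re.sub(r"\bdebugger\s*;?", "", flow.response.text)
```

For this to work, the browser has to be configured to use mitmproxy as its proxy and to trust the mitmproxy CA certificate, since the script is usually served over HTTPS.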

Source: blog.csdn.net/tianqiIP/article/details/111981660