Typical anti-crawling mechanisms

① CAPTCHA handling: Many websites now require a login, and the login process often involves a CAPTCHA. Simple numeric CAPTCHAs can be handled directly: apply grayscale conversion, binarization, and noise reduction to the image, then recognize it with a Python module such as tesserocr. More complex CAPTCHAs, such as Geetest or the one used by 12306, are harder to handle and need machine learning to recognize; they are not described in detail here.
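
The preprocessing steps in ① can be sketched in plain Python as follows. This is only a minimal illustration, assuming the image is already available as rows of (R, G, B) tuples; the function names, the threshold of 128, and the neighbour rule are assumptions made for this example, and in practice an imaging library would do this work before the cleaned image is handed to an OCR module such as tesserocr.

```python
def to_gray(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 gray values."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def binarize(gray_rows, threshold=128):
    """Mark pixels darker than the threshold as ink (1), the rest as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def denoise(bits, min_neighbors=1):
    """Clear ink pixels with fewer than min_neighbors ink pixels among their 8 neighbours."""
    height, width = len(bits), len(bits[0])
    cleaned = [row[:] for row in bits]
    for y in range(height):
        for x in range(width):
            if not bits[y][x]:
                continue
            # Sum the 3x3 window (clipped at the borders), then subtract the pixel itself.
            neighbors = sum(
                bits[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


def preprocess(rgb_rows, threshold=128):
    """Run grayscale conversion, binarization, and noise reduction in order."""
    return denoise(binarize(to_gray(rgb_rows), threshold))
```

Isolated dark specks are removed while connected strokes survive, which is usually what an OCR engine needs from a simple numeric CAPTCHA.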

② Dynamic loading: To improve the user experience, some websites load the whole page dynamically. In this case, selenium can be used to drive a simulated browser, wait until the required content has finished loading, and then read the page source to collect the data;
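
The approach in ② might look roughly like the sketch below. It assumes selenium and a local Chrome installation are available; the function name `fetch_rendered_html`, the `ready_css` selector, and the simple polling loop are choices made for this example (selenium also ships its own wait helpers).

```python
import time


def fetch_rendered_html(url, ready_css, driver=None, timeout=10.0, poll=0.5):
    """Open url in a browser, wait until ready_css matches, return the rendered source."""
    owns_driver = driver is None
    if owns_driver:
        # Third-party dependency; imported lazily so the helper can also be
        # used with a driver object supplied by the caller.
        from selenium import webdriver
        driver = webdriver.Chrome()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        # Poll until the dynamically loaded element appears in the DOM.
        while not driver.find_elements("css selector", ready_css):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{ready_css!r} did not appear within {timeout}s")
            time.sleep(poll)
        return driver.page_source
    finally:
        if owns_driver:
            driver.quit()
```

A call such as `fetch_rendered_html("https://example.com/list", "#data-table")` (both values are placeholders) would return the HTML after the page's scripts have filled in the content, which can then be parsed like any static page.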

③ JS encryption: Some sites encrypt all of their data before displaying it in the front end, so to get the data we have to decrypt it. Developers usually implement this in JS, so what we need to do is find the JS encryption and decryption code, and then either work out the encryption scheme or pass the encrypted data directly through that code to decrypt it;
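
Once the site's JS decryption routine has been located, one option described in ③ is to reproduce it on the crawler side. The sketch below assumes a purely hypothetical scheme (Base64 text that decodes to JSON XOR-ed with a repeating key); a real site will use its own algorithm and key, which have to be read from its JS code.

```python
import base64
import json


def decrypt_payload(cipher_b64, key):
    """Reverse a hypothetical scheme: Base64-decode, XOR with a repeating key, parse JSON."""
    raw = base64.b64decode(cipher_b64)
    key_bytes = key.encode("utf-8")
    # XOR each byte with the key byte at the same position, cycling through the key.
    plain = bytes(b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(raw))
    return json.loads(plain.decode("utf-8"))
```

When the algorithm is too involved to port by hand, the alternative is to run the site's own JS function from Python through a JavaScript runtime and pass the encrypted data to it.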

④ Font file mapping: If you look closely at the page source, you may find that on some sites the data in the source differs from the data the browser renders, and some of the source content even appears garbled. The figure below shows the anti-crawling measure used on our company's website:

In this case, the data the crawler collects is wrong or simply garbled. What the developer has actually done is modify the mapping in the font file. So besides getting the page source, you also need to fetch the font file the site returns, analyze the correspondence it defines, and convert the characters accordingly to get the desired result.
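
The conversion step for ④ can be sketched as follows. This assumes the font file has already been downloaded and that the third-party fontTools package is used to read its character map; the `glyph_to_char` table is hypothetical and normally has to be built by inspecting the glyphs by hand or with a font viewer.

```python
def load_cmap(font_path):
    """Read the code point -> glyph name table from a downloaded font file."""
    # Third-party dependency, imported lazily so decode_text works without it.
    from fontTools.ttLib import TTFont
    return TTFont(font_path).getBestCmap()


def decode_text(text, cmap, glyph_to_char):
    """Replace each obfuscated character with the real one it is rendered as."""
    decoded = []
    for ch in text:
        glyph = cmap.get(ord(ch))  # glyph name the font assigns to this code point
        decoded.append(glyph_to_char.get(glyph, ch))  # unmapped characters pass through
    return "".join(decoded)
```

For example, if the font maps code points U+E001 and U+E002 to glyphs that are drawn as "7" and "3", a string that looks garbled in the page source decodes to the number the browser actually shows. Sites that use this technique often rotate the font file, so the mapping may need to be rebuilt for each response.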

 


Origin www.cnblogs.com/wangtaobiu/p/12652628.html