Common Anti-Crawler Techniques

1. Cookies generated by JS

  When we write a crawler to scrape data from a page, the simplest case is this: open the page, view the source code, and if the HTML already contains the data we want, we just send a request to the URL, get the page source back, and parse out what we need.

  Sometimes, however, what the request returns is just a piece of JS, completely different from the source you see when you open the page in a browser. When this happens, the browser typically runs that JS to generate one or more cookies, then makes a second request carrying those cookies.

  You can watch this happen in the browser: first delete the site's cookies, then refresh the page. In the network history you can see that the first request returns a 521 whose body is a piece of JS code, and only the second visit gets the real page. Comparing the two requests, the second one carries an extra cookie, one that the server never sent in the first response; it was generated by running that JS.

    Solution:

      Study that JS, work out the cookie-generation algorithm, then reimplement it in Python.
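As a minimal sketch of the last step: suppose the 521 response's JS has been deobfuscated down to a plain `document.cookie` assignment (the cookie name `__jsl_clearance` and the snippet below are hypothetical examples; real pages use obfuscated JS whose algorithm must be reverse-engineered first). The cookie can then be extracted and attached to the second request:

```python
import re

def cookie_from_js(js_snippet):
    """Extract the name=value pair from a deobfuscated
    document.cookie assignment (a simplified, hypothetical case)."""
    m = re.search(r"document\.cookie\s*=\s*'([^=]+)=([^;']+)", js_snippet)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical body of the 521 response, after deobfuscating the JS:
snippet = "document.cookie='__jsl_clearance=1569659979.864|0|abc%3D;path=/'"
name, value = cookie_from_js(snippet)
print(name, value)  # the cookie to attach to the second request,
                    # e.g. requests.get(url, cookies={name: value})
```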

 

2. AJAX requests with JS-encrypted parameters

  Sometimes, when we want to crawl a page, we find the page source contains none of the real data we want. In that case the data is usually fetched by an AJAX request. We can inspect the XHR responses in the developer tools; the data we want is generally in one of them.

  The request URL often includes many parameters, and one of them may look like a meaningless string. That string may be produced by a JS encryption algorithm; the server verifies it with the same algorithm, and only then believes the request was sent from a browser. To test whether a parameter is encrypted this way, copy the URL into the address bar, change the parameter arbitrarily, and check whether the visit still returns the correct result.

    Solution:

      For such encrypted parameters, debug the JS to locate the corresponding encryption algorithm. The key step is to set XHR/fetch breakpoints in the browser's developer tools.
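Once the signing routine is located, it can be reimplemented in Python. The sketch below assumes a common but entirely hypothetical scheme (MD5 over the sorted query string plus a fixed salt found in the JS source); a real site's algorithm must be read out of its JS:

```python
import hashlib

def sign(params, salt="9f2b"):
    """Mimic a JS-side signing routine: MD5 over the sorted query
    string plus a fixed salt lifted from the JS source. The scheme
    and the salt value here are hypothetical."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hashlib.md5((query + salt).encode("utf-8")).hexdigest()

params = {"keyword": "python", "page": "2"}
params["sign"] = sign(params)
print(params["sign"])  # 32-hex-char signature the server would verify
```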

 

3. JS anti-debugging (anti-debug)

  We habitually press F12 in Chrome while a page loads to inspect it. More and more sites counter this with an anti-debugging strategy: as soon as F12 is open, execution pauses on a line of code containing 'debugger', and no matter how many times we click continue, it stops there again, adding another VMxx tab each time. Looking at the Call Stack shows it is a function calling itself recursively; the 'debugger' statement makes it impossible for us to debug the JS. Close the F12 window, though, and the page loads normally.

  The countermeasure to this JS anti-debugging is what we might call 'anti-anti-debugging': use the Call Stack to find the function that traps us in the infinite loop, then redefine it.

  Such a function usually does nothing else; it is simply a trap set for us. We can redefine it in the console, for example replacing it with an empty function, so that running it does nothing and no longer drags us into infinite recursion. Set a breakpoint at the place where this function is called. Since we are already caught in the trap, refresh the page; the JS should now stop at the breakpoint we set, before the function has run. Redefine the function in the console, then continue, and execution skips right past the trap.

 

4. Mouse click events sent by JS

  On some sites, a page opens normally in a browser, yet the same request made with the requests library is asked for a verification code or redirected to another page.

  The page's JS responds to clicks on links. When a link is clicked and the server receives the request, it first checks whether certain information has already been sent to it through the fetch of a particular file; if it has, the server treats this as legitimate browser access and returns the normal page content.

  A plain request involves no mouse event, so that file is never fetched; when the link is accessed directly, the server refuses to serve it.

  Once this process is understood, we can bypass the anti-crawl policy almost without studying the JS at all (although the JS may also modify the link): before visiting the link, simply fetch that file first. The key is to get the parameters appended to that file's URL right; with those parameters in place, everything works.
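The server-side logic described above can be made concrete with a toy simulation (everything here is a hypothetical illustration: the tracking path `/track.gif`, the `FakeServer` class, and the responses are invented for the sketch, not any real site's behavior):

```python
class FakeServer:
    """Toy simulation of a server that only serves content to
    sessions that have already fetched the click-tracking file."""

    def __init__(self):
        self.sessions_that_clicked = set()

    def get(self, session_id, path):
        if path == "/track.gif":            # the file fetched by the click JS
            self.sessions_that_clicked.add(session_id)
            return 200, ""
        if path == "/article":
            if session_id in self.sessions_that_clicked:
                return 200, "real content"
            return 302, "redirect to verification code"
        return 404, ""

server = FakeServer()
print(server.get("bot", "/article"))       # refused: no prior fetch
server.get("crawler", "/track.gif")        # mimic the click beacon first
print(server.get("crawler", "/article"))   # now served normally
```

The crawler's bypass is exactly the second session's behavior: request the tracking file (with the right parameters) before requesting the page itself.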

 

 

 

To sum up:

  Crawlers and websites hold each other in check: once a crawler figures out a site's anti-crawl policy, it can build an anti-anti-crawl policy; once the site learns of that, it can build an anti-anti-anti-crawl policy. As the saying goes, as virtue rises one foot, vice rises ten: the struggle between the two never ends.

 


Origin: www.cnblogs.com/tulintao/p/11616640.html