Anti-crawling for beginners

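One, anti-crawling strategies
1, The site checks the User-Agent header to decide whether the client is a crawler.
Solution: disguise the User-Agent so the request identifies itself as an ordinary browser.
2, The site judges by the frequency of visits.
Solution: limit the request frequency, for example by sleeping a random number of seconds between requests.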

time.sleep(random.randint(0,5))
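The one-liner above can be wrapped so that every request after the first is preceded by a random pause. This is only a minimal sketch: the helper names `polite_delay` and `crawl` are hypothetical, the 1 to 5 second range is an assumption (the original line allows a pause of 0), and `fetch` stands for whatever function actually downloads a page.

```python
import random
import time


def polite_delay(min_seconds=1.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random interval and return the number of seconds waited."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def crawl(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing randomly between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No need to wait before the very first request.
            polite_delay(sleep=sleep)
        results.append(fetch(url))
    return results
```

Passing `sleep` in as a parameter keeps the pacing logic separate from the HTTP library and makes it easy to try out without actually waiting.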

3, The site bans the ip.
Solution: send requests through a proxy ip.

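import requests

# url, headers and params are defined elsewhere; replace ip:port with a real proxy address
proxies = {
    'http': 'http://ip:port',
}

requests.get(
    url,
    headers=headers,
    params=params,
    proxies=proxies,  # proxy dictionary
)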
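The User-Agent disguise from strategy 1 and the proxy from strategy 3 can be assembled in one place. A minimal sketch, assuming a hypothetical helper `build_request_kwargs`; the User-Agent string and the proxy address `127.0.0.1:8888` are placeholders for illustration, not working values.

```python
def build_request_kwargs(user_agent, proxy=None, timeout=10):
    """Build keyword arguments for requests.get(): headers, proxies, timeout."""
    kwargs = {
        "headers": {"User-Agent": user_agent},
        "timeout": timeout,
    }
    if proxy:
        # requests picks the proxy by URL scheme, so register both schemes.
        kwargs["proxies"] = {
            "http": "http://" + proxy,
            "https": "http://" + proxy,
        }
    return kwargs


# Placeholder values for illustration only.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/78.0.3904.108 Safari/537.36"
)
kwargs = build_request_kwargs(BROWSER_UA, proxy="127.0.0.1:8888")

# Usage, assuming the requests package is installed:
#   import requests
#   response = requests.get(url, params=params, **kwargs)
```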

4, The page content is not rendered directly in the HTML; it is loaded dynamically by js.
Solution: use selenium + phantomjs.
Two, html page technologies
1, js:
HTML is the skeleton of the page, css is the decoration, and js is the behavior of the page.
js matters a great deal for crawling, because much of a page's content can be produced by it.
2, jquery: a js library whose role is to simplify js programming.
3, ajax: asynchronous web request technology.
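Asynchronous request: the page sends the request in the background and keeps responding while it waits; when the data arrives, js updates part of the page without a reload.
synchronous request: the browser waits for the response before doing anything else, typically reloading the whole page.
4, DHTML: dynamic HTML, the combined use of HTML, css and js to change page content after it has loaded.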
Three, selenium and PhantomJS
1, What is selenium?
selenium is a web automated testing tool, but it has no browser functionality of its own. It works as a driver for an external application: through it you can control an external program, such as a browser, to complete tasks.
2, selenium installation:
pip install selenium==2.48.0
3, What is phantomjs?
phantomjs is a headless browser: a browser engine without a graphical interface. It can load a page like a web browser and run the page's js code.
4, Why can the combination of selenium and phantomjs get the page data of almost any site?
selenium acts as the python-side controller and phantomjs acts as the browser. Combined, they let python drive a browser that loads and renders the page, so as long as a page can be loaded in a browser, this combination can get its data.
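The division of labour described above can be sketched as follows. This is a minimal sketch, assuming selenium 2.48.0 (where `webdriver.PhantomJS` is available; later selenium releases removed it) and a phantomjs executable on PATH. The helper names `fetch_rendered_html` and `make_phantomjs_driver` are hypothetical.

```python
def fetch_rendered_html(url, make_driver):
    """Load url in a browser driver and return the HTML after js has run."""
    driver = make_driver()
    try:
        driver.get(url)            # the browser loads the page and runs its js
        return driver.page_source  # HTML as the browser currently sees it
    finally:
        driver.quit()              # always shut the browser process down


def make_phantomjs_driver():
    """Create a PhantomJS driver (assumes selenium 2.48.0 is installed)."""
    from selenium import webdriver  # imported lazily, only when really needed
    return webdriver.PhantomJS()


# Usage, with a placeholder URL:
#   html = fetch_rendered_html("http://example.com", make_phantomjs_driver)
```

Taking the driver factory as a parameter means the same function works with a visible Chrome driver as well, which is handy while debugging.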
5, phantomjs installation.
Search for a phantomjs mirror; downloading from a mirror is faster.

How to use the phantomjs-2.1.1-windows.zip package: find phantomjs.exe inside it and put this exe file into the Scripts directory of the anaconda installation.
Test whether it is installed: enter phantomjs in a cmd window; if no error is reported, the installation succeeded.

6, Install the driver for the visual browser, chrome.

Download and install chromedriver.exe ----> this is the driver that lets selenium drive Google chrome. ----> when downloading, the driver version must match the version of your own chrome browser.
Download: search for a chromedriver mirror, get chromedriver_win32.zip, find chromedriver.exe inside, and also put it into the Scripts directory of the anaconda installation.
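Whether both executables ended up somewhere on PATH can also be checked from python. A minimal sketch using the standard library's `shutil.which`; the helper name `find_drivers` is hypothetical.

```python
import shutil


def find_drivers(names=("phantomjs", "chromedriver")):
    """Map each executable name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(name, "->", path or "not found on PATH")
```

On Windows `shutil.which` also tries the usual executable extensions, so the names can be given without `.exe`.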

7, selenium use

Documentation: selenium Common Methods .note
link: http://note.youdao.com/noteshare?id=0142a95cf23fadbaea95809ccb5674b2&sub=02896A50836E4995997A821419D9A063
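As a small taste of typical usage, the sketch below opens a page and collects the text of matching elements. It is not taken from the linked note; it assumes selenium 2.48.0 method names (`find_elements_by_xpath` was removed in selenium 4 in favour of `find_elements(By.XPATH, ...)`), and the helper name `collect_texts`, the URL and the XPath are placeholders.

```python
def collect_texts(driver, url, xpath):
    """Open url and return the visible text of every element matching xpath."""
    driver.get(url)
    # selenium 2.x/3.x spelling of the XPath lookup.
    elements = driver.find_elements_by_xpath(xpath)
    return [element.text for element in elements]


def main():
    from selenium import webdriver  # imported here so collect_texts has no hard dependency

    driver = webdriver.Chrome()  # needs chromedriver.exe on PATH
    try:
        # Placeholder URL and XPath for illustration only.
        for text in collect_texts(driver, "http://example.com", "//h1"):
            print(text)
        driver.save_screenshot("page.png")  # save what the browser rendered
    finally:
        driver.quit()
```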

Origin www.cnblogs.com/bug-king/p/11980194.html