[Python web crawler] Notes on the "150 Lectures to Easily Master Python Web Crawling" paid course, part 3: network proxies

The ProxyHandler processor (proxy settings): solving the problem of IP blocking

Many websites track how often a given IP visits within a certain period (through traffic statistics, system logs, etc.). If the number of visits is abnormally high, the site will ban that IP.

So at that point we often need to switch to a different identity in order to keep fetching the data we need, and that different identity is the proxy.

How a proxy works:

Instead of requesting the target website directly, we first send the request to a proxy server and let the proxy server request the target website on our behalf. Once the proxy server receives the response from the target site, it forwards the data back to our code.

http://httpbin.org/ — this site echoes back the parameters of an HTTP request (the caller's IP, headers, query arguments, and so on), which makes it handy for testing.
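As a quick check (not part of the course code), requesting httpbin's /ip and /get endpoints shows exactly what the server received from us:

from urllib import request

# /ip returns the IP the server saw; /get echoes query parameters and headers.
resp = request.urlopen('http://httpbin.org/ip')
print(resp.read().decode('utf-8'))    # e.g. {"origin": "x.x.x.x"}

resp = request.urlopen('http://httpbin.org/get?name=spider&page=1')
print(resp.read().decode('utf-8'))    # the query arguments appear under "args"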

Commonly used proxy providers:

Xici free proxy IP: https://mtop.chinaz.com/site_www.xici.net.co.html

Kuaidaili (fast proxy): https://www.kuaidaili.com/

Proxy Cloud (dailiyun): http://www.dailiyun.com/

 

Taking Proxy Cloud as an example, here is how to use a proxy:

Pick a proxy IP from Proxy Cloud, then plug it into ProxyHandler:

from urllib import request

# Without a proxy
url = 'http://httpbin.org/ip'
resp = request.urlopen(url)
print(resp.read())

# With a proxy
url = 'http://httpbin.org/ip'
# 1. Use ProxyHandler to create a proxy handler
handler = request.ProxyHandler({'http': '140.143.6.16:1080'})
# 2. Build an opener from the handler
opener = request.build_opener(handler)
# 3. Send the request through the opener
resp = opener.open(url)
print(resp.read())

The first request prints our real IP (no proxy); the second prints the proxy's IP, confirming that the request went through the proxy server.
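Free proxy IPs die quickly, so in practice it helps to verify a proxy before crawling with it. The sketch below is my own addition, not from the course: check_proxy is a hypothetical helper name, the timeout value is arbitrary, and the address used is the same example IP as above, which may already be offline.

from urllib import request, error

def check_proxy(proxy_ip_port, test_url='http://httpbin.org/ip', timeout=5):
    # Build an opener that routes both http and https traffic through the proxy.
    handler = request.ProxyHandler({'http': proxy_ip_port,
                                    'https': proxy_ip_port})
    opener = request.build_opener(handler)
    try:
        resp = opener.open(test_url, timeout=timeout)
        print(resp.read().decode('utf-8'))   # should show the proxy's IP, not ours
        return True
    except error.URLError as e:
        print('proxy failed:', e)
        return False

# Example address from the notes above; it may no longer be alive.
check_proxy('140.143.6.16:1080')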

 


Source: https://blog.csdn.net/weixin_44566432/article/details/108552602