1. GET parameter passing
(1) Fixing the error raised when a URL contains Chinese characters: percent-encode the URL first (note that safe=string.printable is the module attribute, not a string literal, and requires import string):

urllib.parse.quote("url containing Chinese characters", safe=string.printable)
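A minimal sketch of the fix, using a hypothetical Baidu search URL; with safe=string.printable, ordinary ASCII characters (/, :, ?, =, …) are left alone and only the non-ASCII bytes are percent-encoded:

```python
import string
import urllib.parse

# hypothetical search URL containing Chinese characters
url = "http://www.baidu.com/s?wd=中文"

# only the Chinese characters get percent-encoded (as UTF-8 bytes)
encoded = urllib.parse.quote(url, safe=string.printable)
print(encoded)  # http://www.baidu.com/s?wd=%E4%B8%AD%E6%96%87
```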
(2) Passing parameters as a dictionary
The final URL is the base URL spliced together with the parameters, but the parameters (params) start out as a dictionary; the dictionary has to be converted into a string before it can be spliced onto the URL, as follows:
import urllib.request
import urllib.parse
import string

def get_params():
    url = "http://www.baidu.com/s?"
    params = {
        "wd": "中文",
        "key": "zhang",
        "value": "san"
    }
    str_params = urllib.parse.urlencode(params)
    print(str_params)  # prints: wd=%E4%B8%AD%E6%96%87&key=zhang&value=san
    final_url = url + str_params
    # quote the URL into a form the computer can understand
    end_url = urllib.parse.quote(final_url, safe=string.printable)
    print(end_url)
    response = urllib.request.urlopen(end_url)
    data = response.read().decode("utf-8")
    print(data)

get_params()
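The difference between the two helpers used above, shown in isolation: quote escapes a single string, while urlencode both escapes each value of a dict and joins the pairs with & (parameter names here are illustrative):

```python
import urllib.parse

params = {"wd": "中文", "key": "zhang"}

# urlencode percent-encodes each value and joins the pairs with &
query = urllib.parse.urlencode(params)
print(query)  # wd=%E4%B8%AD%E6%96%87&key=zhang
```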
2. POST requests
urllib.request.urlopen(url, data=b"data the server should receive")  # data must be a bytes object
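A minimal sketch of building a POST request (the endpoint and form field are hypothetical; the request is shown before sending). Passing a data argument is what switches urllib to POST:

```python
import urllib.parse
import urllib.request

# hypothetical form data, urlencoded and converted to bytes
post_data = urllib.parse.urlencode({"name": "zhangsan"}).encode("utf-8")
request = urllib.request.Request("http://httpbin.org/post", data=post_data)

# a Request that carries data defaults to the POST method
print(request.get_method())  # POST
print(request.data)          # b'name=zhangsan'

# actually sending it would be:
# response = urllib.request.urlopen(request)
```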
3. Request headers:
Looking at the source shows that urllib.request.urlopen() has no headers parameter, so to send custom headers we have to define a Request object ourselves; that object does have a headers attribute:

request = urllib.request.Request(url)
request_headers = request.headers  # get the request-header information
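A small sketch of the Request object's header handling (the User-Agent value is a placeholder): note that Request normalizes stored header names to first-letter-upper-case, rest lower case, which matters when reading them back:

```python
import urllib.request

url = "http://www.baidu.com"
request = urllib.request.Request(url, headers={"User-Agent": "test-agent"})

# Request stores the key normalized as "User-agent"
print(request.headers)                   # {'User-agent': 'test-agent'}
print(request.get_header("User-agent"))  # test-agent
```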
(1) Create a request object: urllib.request.Request (url)
(2) Add a User-Agent: simulates a real browser sending the request (if the same User-Agent sends many requests in a short time, the target server can tell it is a crawler, so you can define a pool of User-Agent strings and pick one at random for each request)
import urllib.request
import random

def load_baidu():
    url = "http://www.baidu.com"
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1",
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
        "Opera/9.80 (Windows NT 6.1; U; zh-cn) Presto/2.9.168 Version/11.50"
    ]
    # pick a different browser identity for each request
    random_user_agent = random.choice(user_agent_list)
    request = urllib.request.Request(url)
    # add the User-Agent to the request headers
    request.add_header("User-Agent", random_user_agent)
    # request the data
    response = urllib.request.urlopen(request)
    # print the request's User-Agent (the stored key is "User-agent")
    print(request.get_header("User-agent"))

load_baidu()
(3) request.add_header(key, value): dynamically add a request header
(4) Response headers: response.headers
Code
import urllib.request

def load_baidu():
    url = "http://www.baidu.com"
    header = {
        # browser version
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
        "Haha": "Hehe"  # invalid: not a real request-header field; since the dict is not passed below, it also does not appear when the request headers are printed
    }
    # create the request object
    request = urllib.request.Request(url)
    # print(request)
    # print(request.headers)
    # dynamically add header information
    request.add_header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")
    # request the network data (the headers are not added here because urlopen takes no headers parameter)
    response = urllib.request.urlopen(request)
    data = response.read().decode("utf-8")
    # get the complete URL
    final_url = request.get_full_url()
    print(final_url)
    # first way to get the request-header information (all headers)
    request_headers = request.headers
    # print(request_headers)
    # second way; note: only the first letter is upper case, all other letters are lower case
    request_headers = request.get_header("User-agent")
    print(request_headers)
    with open("02header.html", "w", encoding="utf-8") as f:
        f.write(data)

load_baidu()
4. IP proxies:
When many requests are sent from the same IP in a short time, the target server can detect the crawler, so we need to send requests from different IPs; this is what IP proxies are for.
(1) Free IPs: poor availability and a high failure rate
Paid IPs: cost money, and can still expire and become unusable
(2) Proxy IP classification:
Transparent: the server knows our real IP
Anonymous: the server does not know our real IP, but does know we are using a proxy
Elite (high anonymity): the server knows neither our real IP nor that we are using a proxy
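The three levels above can be sketched as a simplified heuristic a target server might apply, based on the headers a proxy typically adds. This is illustrative only and not from the original text; real proxies vary in which headers they set:

```python
# Simplified sketch: classify a proxy by the headers the server receives
# (header names X-Forwarded-For / Via are common conventions, not guarantees)
def classify_proxy(headers, real_ip):
    forwarded = headers.get("X-Forwarded-For", "")
    via = headers.get("Via", "")
    if real_ip in forwarded:
        return "transparent"  # server sees our real IP
    if forwarded or via:
        return "anonymous"    # server sees a proxy, but not our real IP
    return "elite"            # server sees neither

print(classify_proxy({"X-Forwarded-For": "1.2.3.4"}, "1.2.3.4"))  # transparent
print(classify_proxy({"Via": "1.1 some-proxy"}, "1.2.3.4"))       # anonymous
print(classify_proxy({}, "1.2.3.4"))                              # elite
```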
5. Handlers
(1) urllib.request.urlopen() has no built-in way to add a proxy, so we have to build that functionality ourselves; this is where handlers come in.
Code:
import urllib.request

def handler_openner():
    # urlopen has no built-in proxy support, so we add that feature ourselves
    # why urlopen can request data at all: internally it uses a handler (processor) and an opener
    url = "https://blog.csdn.net/m0_37499059/article/details/79003731"
    # create our own handler
    handler = urllib.request.HTTPHandler()
    # create our own opener
    opener = urllib.request.build_opener(handler)
    # call the opener's open method to request the data
    response = opener.open(url)
    data = response.read().decode("utf-8")
    with open("02header.html", "w", encoding="utf-8") as f:
        f.write(data)

handler_openner()
(2) Create the corresponding handler:
1. Proxy handler: ProxyHandler
2. Use the ProxyHandler to create an opener
3. Call opener.open(url) to request the data
The full code is as follows:
import urllib.request

def proxy_user():
    proxy_list = [
        {"https": ""},
        # {"https": "106.75.226.36:808"},
        # {"https": "61.135.217.7:80"},
        # {"https": "125.70.13.77:8080"},
        # {"https": "118.190.95.35:9001"}
    ]
    for proxy in proxy_list:
        print(proxy)
        # create a handler from each proxy IP we iterate over
        proxy_handler = urllib.request.ProxyHandler(proxy)
        # create the opener
        opener = urllib.request.build_opener(proxy_handler)
        try:
            data = opener.open("http://www.baidu.com", timeout=1)
            haha = data.read()
            print(haha)
        except Exception as e:
            print(e)

proxy_user()
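A related pattern worth knowing (not in the code above): urllib.request.install_opener() makes a custom opener the global default, so plain urlopen() calls route through the proxy too. The proxy address here is a made-up placeholder:

```python
import urllib.request

# hypothetical proxy address, for illustration only
proxy_handler = urllib.request.ProxyHandler({"https": "1.2.3.4:8080"})
opener = urllib.request.build_opener(proxy_handler)

# after this, every urllib.request.urlopen() call uses the proxy opener
urllib.request.install_opener(opener)

print(proxy_handler.proxies)  # {'https': '1.2.3.4:8080'}
```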