1 Versions
Python 2: urllib and urllib2
Python 3: urllib and urllib2 were merged into the urllib package (e.g. urllib.request)
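A quick way to see the merge in practice: the sketch below imports the three Python 3 submodules and checks that the names used in these notes are present there. The comments give the Python 2 module each name came from.

```python
# Python 3 homes of the Python 2 urllib/urllib2 names
import urllib.error    # was urllib2.HTTPError, urllib2.URLError
import urllib.parse    # was urllib.urlencode, urllib.quote
import urllib.request  # was urllib2.urlopen, urllib2.Request

for name in ("urlopen", "Request"):
    print("urllib.request." + name, hasattr(urllib.request, name))
for name in ("urlencode", "quote"):
    print("urllib.parse." + name, hasattr(urllib.parse, name))
for name in ("HTTPError", "URLError"):
    print("urllib.error." + name, hasattr(urllib.error, name))
```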
2 Common methods
2.1 urllib.request.urlopen("URL"): sends a request to the site and returns a response object
2.1.1 byte stream = response.read()
string = response.read().decode("utf-8")
encode(): string -> bytes
decode(): bytes -> string
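A minimal sketch of 2.1 and 2.1.1, assuming a `data:` URL as a stand-in for a real site so it runs offline (urlopen() handles `data:` URLs in Python 3.4+). Note that read() consumes the body, so call it once and keep the bytes rather than calling it again to decode.

```python
import urllib.request

# A data: URL keeps the sketch offline; swap in a real http(s) URL to query a site.
url = "data:text/plain;charset=utf-8,caf%C3%A9"

with urllib.request.urlopen(url) as response:
    raw = response.read()        # byte stream; read() once and keep the result
text = raw.decode("utf-8")       # decode(): bytes -> str

print(raw)    # b'caf\xc3\xa9'
print(text)   # café

# encode(): str -> bytes, the reverse direction
print(text.encode("utf-8") == raw)  # True
# Decoding with the wrong codec does not fail here, it silently garbles the text
print(raw.decode("latin-1"))        # cafÃ©
```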
2.2 Setting a custom User-Agent
2.2.1 Does not support a custom User-Agent: urlopen() called with a plain URL string
2.2.2 Supports a custom User-Agent: urllib.request.Request("URL", headers={dictionary})
The User-Agent is the first battleground between crawlers and anti-crawler measures, so every request should carry a User-Agent.
2.2.2.1 Usage steps
2.2.2.1.1 Construct the request object with Request()
2.2.2.1.2 Get the response object with urlopen()
2.2.2.1.3 Get the content with response.read().decode("utf-8")
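The three steps above can be sketched against a throwaway local server, so the example is self-contained. The `EchoUserAgent` handler is hypothetical: it simply replies with whatever User-Agent it received, which makes the difference between urllib's default header and a custom one visible. The browser-like User-Agent string is only an example value.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical local test server: replies with the User-Agent it received."""
    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet

server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# Plain urlopen(url): urllib sends its own default User-Agent
with urllib.request.urlopen(url) as response:
    default_ua = response.read().decode("utf-8")

# Step 1: build a Request carrying a custom User-Agent
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
req = urllib.request.Request(url, headers=headers)
# Step 2: send it with urlopen() to get the response object
with urllib.request.urlopen(req) as response:
    # Step 3: read the bytes and decode them to a string
    custom_ua = response.read().decode("utf-8")

server.shutdown()
server.server_close()

print(default_ua)  # e.g. Python-urllib/3.x
print(custom_ua)   # Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```

The default value starts with `Python-urllib/`, which is exactly what a site filtering on User-Agent would see without step 1.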
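2.2.2.2 Methods of the response object
2.2.2.2.1 read(): reads the content of the server's response
2.2.2.2.2 getcode()
Purpose: returns the HTTP status code of the response, e.g. print(response.getcode())
200: success
4xx: client error (e.g. 404, page not found)
5xx: server error
(urlopen() raises urllib.error.HTTPError for 4xx/5xx codes instead of returning a response)
2.2.2.2.3 geturl()
Purpose: returns the URL the data actually came from (useful for detecting redirects)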
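A sketch of the three response methods, again using a hypothetical local server so it runs anywhere: `/old` redirects to `/new`, `/new` returns a small page, and every other path returns 404. The paths and the `DemoHandler` name are made up for the example. Recent Python docs also expose the same values as the `status` and `url` attributes of the response.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old -> 302 to /new, /new -> page, else 404."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            body = b"<html><body>new page</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the console quiet

server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

with urllib.request.urlopen(base + "/old") as response:
    content = response.read().decode("utf-8")  # read(): response body
    status = response.getcode()                # getcode(): HTTP status code
    final_url = response.geturl()              # geturl(): URL after following redirects

# A 4xx response raises HTTPError; its status code is in err.code
try:
    urllib.request.urlopen(base + "/missing")
except urllib.error.HTTPError as err:
    missing_status = err.code

server.shutdown()
server.server_close()

print(status)          # 200
print(final_url)       # http://127.0.0.1:<port>/new
print(missing_status)  # 404
```

Requesting `/old` silently lands on `/new`; only geturl() reveals that a redirect happened.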
3 urllib.parse module
3.1 urlencode(dictionary): converts a dict into a percent-encoded query string
urlencode({"wd": "美女"})  ->  wd=%E7%BE%8E%E5%A5%B3
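A short sketch of urlencode() building a query string. The second parameter `pn` is a hypothetical extra field, included only to show how several pairs are joined.

```python
from urllib.parse import urlencode

# One parameter: the non-ASCII value is UTF-8 encoded, then percent-escaped
query = urlencode({"wd": "美女"})
print(query)  # wd=%E7%BE%8E%E5%A5%B3

# Several parameters are joined with "&" ("pn" is a hypothetical extra field)
query2 = urlencode({"wd": "python", "pn": 10})
print(query2)  # wd=python&pn=10

# The encoded string is appended after "?" to form the full URL
url = "http://www.baidu.com/s?" + query
print(url)  # http://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3
```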
3.2 quote(string)
import urllib.parse

baseurl = "http://www.baidu.com/s?wd="
key = input("Please enter the content to be searched: ")
# encode the keyword with quote()
key = urllib.parse.quote(key)
url = baseurl + key
print(url)
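The quote() example can also be written without input(), which makes it easy to test. `build_search_url` is a hypothetical helper name. The last two lines show one practical difference from urlencode(): quote() escapes a space as `%20`, while urlencode() (which uses quote_plus() by default) writes `+`.

```python
from urllib.parse import quote, urlencode

def build_search_url(key):
    """Hypothetical helper: the quote() example without interactive input."""
    baseurl = "http://www.baidu.com/s?wd="
    return baseurl + quote(key)  # quote() encodes only the value, not "wd="

print(build_search_url("美女"))           # http://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3

# Spaces: quote() gives %20, urlencode() gives +
print(quote("hello world"))               # hello%20world
print(urlencode({"wd": "hello world"}))   # wd=hello+world
```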
Workflow: request -> response (HTML source) -> parse
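A minimal end-to-end sketch of that workflow, assuming a `data:` URL holding a tiny HTML page in place of a real site, and using the standard-library html.parser for the parsing step. `TitleParser` is a hypothetical class that only extracts the page title; real crawlers often use a dedicated parsing library instead.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Hypothetical parser: collects the text inside <title>...</title>."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Request: a data: URL stands in for a real page so the sketch runs offline
page = "<html><head><title>Demo page</title></head><body>hello</body></html>"
url = "data:text/html;charset=utf-8," + urllib.parse.quote(page)

# Response: the HTML source as a string
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8")

# Parse: pull the title out of the HTML
parser = TitleParser()
parser.feed(html)
parser.close()
print(parser.title)  # Demo page
```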