urllib basic usage (overview)

I. urllib.request

1. urlopen

from urllib import request

r = request.urlopen('http://www.baidu.com/')
# Get the status code
print(r.status)
# Get the response headers
print(r.getheaders())
print('=' * 30)
# Get the page source
print(r.read().decode('utf-8'))

Note: when urlopen() is given data (which must be of type bytes), the request is sent as a POST. The timeout parameter sets the timeout in seconds.
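A minimal sketch of the note above, showing data and timeout passed to urlopen(). The helper names encode_form and post_form and the httpbin.org URL are assumptions made for illustration; the network call is left commented out so the snippet runs offline.

```python
from urllib import parse, request


def encode_form(fields):
    """Serialize a dict into URL-encoded bytes (dict -> str -> bytes)."""
    return parse.urlencode(fields).encode('utf-8')


def post_form(url, fields, timeout=5):
    """Send fields as a POST body; timeout is in seconds."""
    # Passing data= makes urlopen() issue a POST instead of a GET.
    with request.urlopen(url, data=encode_form(fields), timeout=timeout) as r:
        return r.status, r.read().decode('utf-8')


# Example call (requires network access; the URL is an assumed test endpoint):
# status, body = post_form('https://httpbin.org/post', {'name': 'tom', 'age': 24})
```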

2. Request

from urllib import request

# Create a Request object
req = request.Request('https://www.cnblogs.com/')
# Open the page
r = request.urlopen(req)
print(r.read().decode('utf-8'))

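Note: Request also accepts data (bytes; a dict must be converted dict -> str -> bytes), headers={} (a dict of request headers), and method= (the HTTP method, such as 'GET' or 'POST').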
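The three parameters can be sketched together as follows. The form fields, the User-Agent value, and the httpbin.org URL are assumptions chosen for illustration; the request is built but not sent, so no network access is needed.

```python
from urllib import parse, request

# dict -> str -> bytes
form = {'name': 'tom', 'age': 24}
data = parse.urlencode(form).encode('utf-8')

headers = {'User-Agent': 'Mozilla/5.0'}

req = request.Request(
    'https://httpbin.org/post',  # assumed test endpoint
    data=data,
    headers=headers,
    method='POST',
)

print(req.get_method())
print(req.data)
# Request stores header names with only the first letter capitalized.
print(req.get_header('User-agent'))

# Sending it requires network access:
# with request.urlopen(req, timeout=5) as r:
#     print(r.status)
```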

Use Handlers to implement authentication, cookies, and proxies.
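One way to combine such handlers is build_opener(), sketched below. The proxy address, the credentials, and the example.com URL are placeholders assumed for illustration; the actual open() call is commented out because it needs network access.

```python
from http.cookiejar import CookieJar
from urllib import request

# Cookies: the jar collects cookies set by responses and sends them back.
cookie_jar = CookieJar()
cookie_handler = request.HTTPCookieProcessor(cookie_jar)

# Proxy: route http requests through a (placeholder) local proxy.
proxy_handler = request.ProxyHandler({'http': 'http://127.0.0.1:8080'})

# Authentication: HTTP basic auth with placeholder credentials.
password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/', 'user', 'secret')
auth_handler = request.HTTPBasicAuthHandler(password_mgr)

# An opener chains the handlers; use opener.open() instead of urlopen().
opener = request.build_opener(cookie_handler, proxy_handler, auth_handler)

# response = opener.open('http://example.com/', timeout=5)  # needs network
```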

II. urllib.error

Handling Exceptions

from urllib.error import URLError, HTTPError

Use try ... except to handle them.

Note: HTTPError is a subclass of URLError, so catch HTTPError first and URLError second.
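A minimal sketch of that pattern, with HTTPError caught before its parent class URLError. The function name fetch is an assumption for illustration. The sample call uses a made-up URL scheme, which urllib rejects with a URLError without touching the network.

```python
from urllib import request
from urllib.error import HTTPError, URLError


def fetch(url, timeout=5):
    """Return the page text, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as r:
            return r.read().decode('utf-8')
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        print('HTTP error:', e.code, e.reason)
    except URLError as e:
        # The request failed before a valid HTTP response (bad host, bad scheme, ...).
        print('URL error:', e.reason)
    return None


# 'foo' is not a scheme urllib knows, so this takes the URLError branch.
result = fetch('foo://bar')
```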

III. urllib.parse

Parsing

urlparse()
urlunparse()
urlsplit()
urlunsplit()
urljoin()
urlencode()  # serialize
parse_qsl()  # deserialize; the result, e.g. [('name', 'tom'), ('age', '24')], can be passed to dict() to get a dictionary
quote()
unquote()
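The functions listed above can be tried on a sample URL as follows. The example.com URLs and the query values are assumptions picked for illustration.

```python
from urllib import parse

url = 'https://www.example.com/path/index.html?name=tom&age=24#top'

# urlparse() splits a URL into 6 parts; urlunparse() puts them back together.
parts = parse.urlparse(url)
rebuilt = parse.urlunparse(parts)

# urlsplit()/urlunsplit() do the same with 5 parts (no separate params field).
split = parse.urlsplit(url)
rebuilt_split = parse.urlunsplit(split)

# urljoin() resolves a relative link against a base URL.
joined = parse.urljoin('https://www.example.com/a/b.html', 'c.html')

# urlencode() serializes a dict; parse_qsl() deserializes (values come back as str).
query = parse.urlencode({'name': 'tom', 'age': 24})
pairs = parse.parse_qsl(query)
as_dict = dict(pairs)

# quote()/unquote() percent-encode and decode non-ASCII text.
quoted = parse.quote('你好')
unquoted = parse.unquote(quoted)

print(parts.netloc, parts.path, parts.query, parts.fragment)
print(joined)
print(query, pairs, as_dict)
print(quoted, unquoted)
```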

IV. Robots protocol

urllib.robotparser parses robots.txt files.
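A minimal sketch using RobotFileParser. The rules and the example.com URLs are made up for illustration and are fed in with parse() so the snippet runs offline; against a real site you would call set_url() and read() instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, given as a list of lines.
rules = [
    'User-agent: *',
    'Disallow: /private/',
]

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(useragent, url) reports whether the rules allow fetching the URL.
blocked = rp.can_fetch('*', 'https://www.example.com/private/page.html')
allowed = rp.can_fetch('*', 'https://www.example.com/index.html')
print(blocked, allowed)

# For a live site (requires network access):
# rp.set_url('https://www.example.com/robots.txt')
# rp.read()
```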

 

 


Origin www.cnblogs.com/wt7018/p/11902020.html