One, urllib.request
1. urlopen
from urllib import request

r = request.urlopen('http://www.baidu.com/')
# Get the status code
print(r.status)
# Get the response headers
print(r.getheaders())
print('=' * 30)
# Get the page source
print(r.read().decode('utf-8'))
Note: when urlopen() is given data (of type bytes), the request is sent as a POST; the timeout parameter sets the timeout in seconds.
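The note above can be sketched roughly as follows. The form fields, the example.com URL and the post() helper are made up for illustration and are not part of the original notes. The last two lines only inspect the request method, so they run without network access.

```python
from urllib import parse, request

# Hypothetical form fields and endpoint, used only for illustration.
fields = {'name': 'tom', 'age': 24}
url = 'http://example.com/login'

# dict -> str -> bytes, the form urlopen() expects for data
data = parse.urlencode(fields).encode('utf-8')


def post(url, data, timeout=5):
    """Send data as a POST body; give up after `timeout` seconds."""
    with request.urlopen(url, data=data, timeout=timeout) as r:
        return r.status, r.read().decode('utf-8')


# urlopen() wraps a plain URL string in a Request object, so the effect
# of data on the HTTP method can be checked without sending anything.
print(request.Request(url).get_method())             # GET
print(request.Request(url, data=data).get_method())  # POST
```

Calling post(url, data, timeout=3) would send the request for real; if the server does not answer within 3 seconds, urlopen() raises an exception.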
2. Request
from urllib import request

# Create a Request object
req = request.Request('https://www.cnblogs.com/')
# Open the web page
r = request.urlopen(req)
print(r.read().decode('utf-8'))
Note: Request also accepts data (bytes; convert a dict to str with urlencode(), then encode the str to bytes), headers={} and method=.
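A minimal sketch of these three parameters together. The URL, form fields and User-Agent value are assumptions chosen for illustration; nothing is sent until the Request is passed to urlopen().

```python
from urllib import parse, request

# Hypothetical values, for illustration only.
form = {'name': 'tom', 'age': 24}
headers = {'User-Agent': 'Mozilla/5.0'}

# dict -> str -> bytes
data = parse.urlencode(form).encode('utf-8')

req = request.Request(
    'http://example.com/post',
    data=data,
    headers=headers,
    method='POST',
)

print(req.get_method())  # POST
print(req.data)          # b'name=tom&age=24'
# Request stores header names capitalized, e.g. 'User-agent'.
print(req.get_header('User-agent'))  # Mozilla/5.0
```

Passing req to request.urlopen(req) would then send the request with these headers and this body.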
Use Handler classes to implement authentication, cookies, and proxies.
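One way the three handlers might be combined is with build_opener(). The proxy address, the credentials and the example.com URL below are placeholders, not values from the original notes.

```python
from http import cookiejar
from urllib import request

# Cookies: responses opened through the opener store cookies in this jar.
cookies = cookiejar.CookieJar()
cookie_handler = request.HTTPCookieProcessor(cookies)

# Proxy: placeholder address, replace with a real proxy.
proxy_handler = request.ProxyHandler({'http': 'http://127.0.0.1:8080'})

# Authentication: placeholder user name and password for HTTP Basic auth.
password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/', 'user', 'secret')
auth_handler = request.HTTPBasicAuthHandler(password_mgr)

# Combine the handlers into one opener.
opener = request.build_opener(cookie_handler, proxy_handler, auth_handler)

# Nothing has been fetched yet, so the cookie jar is still empty.
print(len(cookies))  # 0
```

opener.open('http://example.com/', timeout=5) would then be used in place of request.urlopen().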
Two, urllib.error
Handling Exceptions
from urllib.error import URLError, HTTPError
Use try ... except to handle them.
Note: HTTPError is a subclass of URLError, so an except clause for HTTPError must come before the one for URLError.
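A minimal sketch of this pattern. The fetch() helper is a name made up for this example; the last call uses an unsupported URL scheme so that a URLError is raised without any network access.

```python
from urllib import request
from urllib.error import HTTPError, URLError


def fetch(url, timeout=5):
    """Return the page text, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as r:
            return r.read().decode('utf-8')
    except HTTPError as e:
        # The server answered with an error status (404, 500, ...).
        print('HTTP error:', e.code, e.reason)
    except URLError as e:
        # No usable response: bad scheme, DNS failure, refused connection, ...
        print('URL error:', e.reason)
    return None


print(issubclass(HTTPError, URLError))  # True
# 'foo' is not a scheme urllib knows, so this raises URLError internally.
print(fetch('foo://bar'))
```

If the two except clauses were swapped, the URLError clause would also catch every HTTPError and the status code branch would never run.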
Three, urllib.parse
Parsing
urlparse()
urlunparse()
urlsplit()
urlunsplit()
urljoin()
urlencode()  # serialize a dict into a query string
parse_qsl()  # deserialize; the result, e.g. [('name', 'tom'), ('age', '24')], can be passed to dict() to get a dictionary
quote()
unquote()
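The functions listed above can be tried out as in the sketch below. The URL and query values are made up for illustration; everything runs offline.

```python
from urllib import parse

# Hypothetical URL, for illustration only.
url = 'https://www.cnblogs.com/path/page.html;params?name=tom&age=24#top'

# urlparse() splits into 6 parts; urlunparse() puts them back together.
parts = parse.urlparse(url)
print(parts.scheme, parts.netloc, parts.path, parts.params, parts.query, parts.fragment)
print(parse.urlunparse(parts) == url)  # True

# urlsplit() gives 5 parts: params stay inside the path.
split = parse.urlsplit(url)
print(split.path)  # /path/page.html;params
print(parse.urlunsplit(split) == url)  # True

# urljoin() resolves a relative link against a base URL.
print(parse.urljoin('https://www.cnblogs.com/a/b.html', 'c.html'))

# urlencode() serializes; parse_qsl() deserializes (all values become str).
query = parse.urlencode({'name': 'tom', 'age': 24})
pairs = parse.parse_qsl(query)
print(query)        # name=tom&age=24
print(dict(pairs))  # {'name': 'tom', 'age': '24'}

# quote() percent-encodes non-ASCII text; unquote() reverses it.
encoded = parse.quote('中文')
print(encoded)                 # %E4%B8%AD%E6%96%87
print(parse.unquote(encoded))  # 中文
```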
Four, Robots protocol
urllib.robotparser parses robots.txt files.
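A minimal sketch using RobotFileParser. The rules and the example.com URLs are made up for illustration and are fed in with parse(), so no network access is needed; for a live site, set_url() followed by read() would fetch the real robots.txt instead.

```python
from urllib import robotparser

# Hypothetical robots.txt content, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) reports whether the rules allow the fetch.
print(rp.can_fetch('*', 'http://example.com/index.html'))      # True
print(rp.can_fetch('*', 'http://example.com/private/a.html'))  # False
```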