1. Import the request submodule of the urllib module
from urllib import request
2. Related functions:
(1) urlopen function: fetch a web page
- webpage = request.urlopen(url, timeout=1) [Fetch the page; the timeout parameter makes the call give up after 1 second, so an unresponsive page can be skipped instead of hanging]
- data = webpage.read() [Read the page content]
[The content returned by webpage.read() is a bytes object; printed, it looks like b'...']
- data = data.decode('utf-8') [Decode]
[Since the content is a bytes object, convert it to a string with decode(); with no argument the encoding defaults to UTF-8, or an encoding can be given explicitly, e.g. decode("gb2312")]
- pat = '<div class="name">(.*?)</div>'
res = re.compile(pat).findall(str(data)) [remember str(data)]
[re.search() and the other re functions cannot be applied to the bytes object directly; convert it to a string first. res holds the extracted matches]
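The urlopen/read/decode/findall steps above can be sketched end to end. This is a minimal illustration, not production crawling code; the div pattern is the one from the notes and any URL passed in is a placeholder:

```python
import re
from urllib import request

DIV_PATTERN = r'<div class="name">(.*?)</div>'  # pattern from the notes

def fetch_divs(url, pattern=DIV_PATTERN):
    """Fetch a page, decode it, and extract all matches of a regex pattern."""
    # timeout=1: give up after 1 second so a dead page does not hang the crawler
    webpage = request.urlopen(url, timeout=1)
    data = webpage.read()        # bytes object; printing it shows b'...'
    text = data.decode('utf-8')  # convert bytes to str before applying re
    return re.compile(pattern).findall(text)
```

Called as fetch_divs('http://example.com/page.html') (hypothetical URL), it returns the list of strings captured by the group in the pattern.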
(2) urlretrieve function: fetch a web page and save it locally as a file
- urllib.request.urlretrieve(url, filename="<local path>/1.html")
(3) urlcleanup() function: urlretrieve leaves temporary cache files behind; this function clears them.
- urllib.request.urlcleanup()
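urlretrieve and urlcleanup are typically used together. A minimal sketch (note that urlretrieve is documented as a legacy interface that might become deprecated):

```python
from urllib import request

def save_page(url, filename):
    """Download url to a local file, then clean up urlretrieve's cache."""
    # urlretrieve returns a (local_filename, headers) tuple
    local_path, headers = request.urlretrieve(url, filename=filename)
    request.urlcleanup()  # remove temporary cache files left by urlretrieve
    return local_path
```

The returned path equals the filename argument, so the saved copy can be opened as an ordinary local web page.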
(4) info() function: returns the response headers (metadata) of the web page.
(5) getcode(): returns the HTTP status code; 200 means the fetch succeeded.
(6) geturl(): returns the URL of the page actually fetched (useful after redirects).
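The three inspection methods in (4)-(6) all live on the response object returned by urlopen. A minimal sketch:

```python
from urllib import request

def inspect_response(url):
    """Return (headers, status code, final URL) for a fetched page."""
    resp = request.urlopen(url, timeout=1)
    headers = resp.info()      # response headers as an email.message.Message
    status = resp.getcode()    # 200 means the HTTP fetch succeeded
    final_url = resp.geturl()  # URL actually fetched, after any redirects
    return headers, status, final_url
```

For HTTP URLs, status will be the integer code from the server; for non-HTTP schemes (e.g. file://) getcode() may return None.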
(7) POST and GET requests (see the urllib.request.Request class)
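To make the distinction concrete: urlopen sends a GET by default, and supplying a data argument (directly or via urllib.request.Request) turns the request into a POST with that body. A minimal sketch; any URL passed in is a placeholder:

```python
from urllib import parse, request

def get_with_query(url, fields):
    """GET request: append the url-encoded fields as a query string."""
    full_url = url + "?" + parse.urlencode(fields)
    with request.urlopen(full_url, timeout=5) as resp:
        return resp.read().decode("utf-8")

def post_form(url, fields):
    """POST request: url-encode the fields and send them as the body."""
    data = parse.urlencode(fields).encode("utf-8")  # body must be bytes
    req = request.Request(url, data=data)           # data present -> POST
    with request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Request also accepts a headers dict and a method keyword, which is the usual way to set a User-Agent or force a specific HTTP method.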