Simple use of urllib

First, import the package:
    import urllib.request

A simple example: send a request to a site and check the type of the response.
    response = urllib.request.urlopen("http://www.baidu.com/")
    print(type(response))
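
The same request can be wrapped in a small helper. This is only a sketch: the name `fetch` and the 10-second timeout are my own choices, not part of the original notes. It uses a `with` block so the connection is closed even if reading fails.

```python
import urllib.request


def fetch(url, timeout=10):
    """Open url and return (type name of the response object, raw body bytes)."""
    # The with-block closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return type(response).__name__, response.read()


# Example call (needs network access):
# kind, body = fetch("http://www.baidu.com/")
# print(kind, len(body))
```

The body comes back as bytes, so it still has to be decoded before it can be treated as text.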

What the response provides
1. getcode() returns the HTTP status code
2. geturl() returns the URL that was actually retrieved
3. getheaders() returns the response headers
4. read() reads the full body (it is returned as bytes, so it needs to be decoded before use as text)
5. encode: text ---> bytes
6. decode: bytes ---> text
7. Common encodings are gbk, utf-8 and gb2312 (look at the charset in the page's meta tag to find out which one to decode with)
8. Writing the page to a file:
    with open("Baidu.com", "w", encoding="utf-8") as f:
        f.write(response.read().decode("utf-8"))
9. urlretrieve() saves the content straight to a local file (it can fetch web pages, images and audio); for now you have to find the resource's URL in the page yourself
10. rsplit()[-10:-5] (split from the right, then slice the result)
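
The points above can be put together roughly as follows. This is a minimal sketch: the helper names `save_page`, `download` and `filename_from_url` are hypothetical, and utf-8 is assumed as the page encoding (check the page's meta tag for the real one). Splitting once on "/" from the right is just one way to use `rsplit()` to get a file name from a URL.

```python
import urllib.request


def save_page(url, path, encoding="utf-8"):
    """Fetch url, decode the body, and write it to path as text."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()          # bytes
    text = raw.decode(encoding)        # bytes -> str
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return len(text)


def download(url, path):
    """Store the resource at url in path without decoding it (images, audio...)."""
    saved_path, _headers = urllib.request.urlretrieve(url, path)
    return saved_path


def filename_from_url(url):
    """Take the part after the last '/' by splitting once from the right."""
    return url.rsplit("/", 1)[-1]


# encode turns text into bytes, decode turns bytes back into text
data = "百度".encode("utf-8")
print(data)
print(data.decode("utf-8"))

# Example calls (need network access):
# save_page("http://www.baidu.com/", "baidu.html")
# download("http://www.baidu.com/", "baidu_raw.html")
```

`save_page` suits text pages, while `download` leaves the bytes untouched, which is what binary files such as images need.
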
Constructing a request
1. A direct request exposes the default User-Agent (Python-urllib/x.y), which shows the server that the visitor is a script
2. To customize the User-Agent, write a dictionary first (take the User-Agent value from a browser that can open the page)
3. req = urllib.request.Request(url=<variable name>, headers=<variable name>)
4. response = urllib.request.urlopen(req) (returns a response)
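
A sketch of those steps, for illustration only. The helper name `build_request` and the User-Agent string are placeholders; in practice you would copy the string from your own browser.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request object that carries a custom User-Agent header."""
    headers = {"User-Agent": user_agent}
    return urllib.request.Request(url=url, headers=headers)


# The User-Agent below is a made-up placeholder, not a real browser string.
req = build_request("http://www.baidu.com/", "Mozilla/5.0 (X11; Linux x86_64)")

# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# Sending it needs network access:
# response = urllib.request.urlopen(req)
```

Passing the `Request` object to `urlopen()` instead of a plain URL string is what makes the custom headers go out with the request.
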
Browsers and URL encoding
1. The browser encodes and decodes URLs automatically (which is why a browser can open URLs that contain Chinese)
2. If you need to pass parameters yourself, you have to encode the characters yourself (tool.chinaz.com/tools/urlencode.aspx can do this online)
3. When decoding, keep in mind that one Chinese character is three bytes in UTF-8
4. The encoded form is percent-encoding: each byte is written as %XX
5. urllib.parse.urlencode() (put the parameters you want to encode, as a dictionary, in the brackets)
6. First take the original page URL up to the "?", then encode the parameters, and join the two parts together
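
For example, encoding one Chinese search term and joining it to a base URL might look like this. The base URL and the parameter name `wd` are assumptions chosen for the illustration.

```python
import urllib.parse

# Assumed base URL, ending at the "?"
base_url = "http://www.baidu.com/s?"

# Parameters go in a dictionary; urlencode percent-encodes them as UTF-8.
params = {"wd": "中文"}
query = urllib.parse.urlencode(params)

full_url = base_url + query
print(full_url)

# Each Chinese character becomes three %XX groups, one per UTF-8 byte.
print(len("中".encode("utf-8")))

# unquote reverses the encoding
print(urllib.parse.unquote(query))
```
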
Crawling multiple pages
1. To crawl several pages, first work out the pattern in how the page URLs change
2. For example:
    for page in range(1, 1 + pag):
        pn = (page - 1) * 50
        full_url = url3 + "&pn=%s" % pn
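
The loop can be turned into a small function that returns every page URL. This is a sketch under assumptions: the name `page_urls`, the example.com address and the `kw` parameter are made up, while the step of 50 per page and the `pn` parameter follow the loop above.

```python
def page_urls(base_url, pages, page_size=50):
    """Build one URL per page; pn is the offset of the first item on that page."""
    urls = []
    for page in range(1, 1 + pages):
        pn = (page - 1) * page_size      # page 1 -> 0, page 2 -> 50, ...
        urls.append(base_url + "&pn=%s" % pn)
    return urls


# Hypothetical base URL that already carries one query parameter.
for url in page_urls("http://example.com/f?kw=python", 3):
    print(url)
```

Each of these URLs can then be requested in turn with `urllib.request.urlopen()`.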

 

 


Origin www.cnblogs.com/liuxiaomo/p/11967018.html