Advanced Usage of urllib

Introduction to Handlers

We can think of Handlers as various kinds of processors: some handle login authentication, some handle cookies, and some handle proxy settings. With them, we can do almost everything involved in an HTTP request.

First, let's look at the BaseHandler class in the urllib.request module. It is the parent class of all other Handlers and defines the conventions for the most basic methods, such as default_open() and protocol_request() (where protocol stands for the actual protocol name, for example http_request()).
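
As a minimal sketch of the protocol_request() convention, the hypothetical Handler below defines http_request() to add a header before a request is sent. The class name UserAgentHandler and the string 'my-crawler/0.1' are illustrative assumptions, not part of urllib. The method is called directly here, so nothing goes over the network.

```python
from urllib.request import BaseHandler, Request


class UserAgentHandler(BaseHandler):
    """Hypothetical Handler that sets a User-Agent header on HTTP(S) requests."""

    def __init__(self, user_agent):
        self.user_agent = user_agent

    def http_request(self, request):
        # "protocol_request" with protocol = http: pre-process the request
        request.add_header('User-Agent', self.user_agent)
        return request

    # Reuse the same pre-processing for HTTPS requests
    https_request = http_request


handler = UserAgentHandler('my-crawler/0.1')
request = handler.http_request(Request('http://example.com/'))
# Request stores header names capitalized, hence 'User-agent'
print(request.get_header('User-agent'))
```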

Next, various Handler subclasses inherit from BaseHandler, for example:

HTTPDefaultErrorHandler: handles HTTP error responses by raising an HTTPError exception.

HTTPRedirectHandler: handles redirects.

HTTPCookieProcessor: handles cookies.

ProxyHandler: sets proxies; if none are given, the proxy settings are read from the environment.

HTTPPasswordMgr: manages passwords; it maintains a mapping of user names and passwords.

HTTPBasicAuthHandler: manages authentication; if a link requires authentication when opened, this Handler can be used to handle it.

 

In addition, there are other Handler classes, which are not listed one by one here. For details, refer to the official documentation: https://docs.python.org/3/library/urllib.request.html#urllib.request.BaseHandler.

 

Don't worry about how to use them for now; the examples below will demonstrate.
Another important class is OpenerDirector, which we can call an Opener. We have already used the urlopen() method; it is in fact an Opener that urllib provides for us.
So why introduce the Opener? Because we need to implement more advanced features. The Request and urlopen() used earlier are commonly used request methods that the library has wrapped for you, and they are enough to complete basic requests. Now that we need more advanced features, we have to go one layer deeper in the configuration and use lower-level instances to do the work, which is where the Opener comes in.
An Opener has an open() method, and the type it returns is the same as that of urlopen(). So how are Opener and Handler related? In short, Handlers are used to build an Opener.
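
The relationship can be sketched as follows: build_opener() takes Handler instances (or classes) and returns an OpenerDirector. This is only an illustration and sends no request. The handlers attribute inspected here is a CPython implementation detail rather than documented API, and is used only to show what the Opener contains.

```python
from urllib.request import (HTTPCookieProcessor, OpenerDirector,
                            build_opener)

# Pass the Handlers we want; build_opener() also adds the default ones
opener = build_opener(HTTPCookieProcessor())

print(isinstance(opener, OpenerDirector))
# List the Handlers chained inside the Opener (implementation detail)
for handler in opener.handlers:
    print(type(handler).__name__)
```

Calling opener.open(url) then sends the request through these Handlers. If urlopen() should use the same Opener, urllib.request.install_opener(opener) installs it as the global default.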

Here we take a look at a few examples of their usage.

 

 

Authentication
Suppose a page requires a user name and password before it can be viewed. How do we request such a page? It can be done with HTTPBasicAuthHandler. The relevant code is as follows:

 

from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener
from urllib.error import URLError

username = 'username'
password = 'password'
url = 'http://localhost:5000/'

# Store the credentials; realm None means they apply to any realm at this URL
p = HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, url, username, password)
auth_handler = HTTPBasicAuthHandler(p)
opener = build_opener(auth_handler)

try:
    result = opener.open(url)
    html = result.read().decode('utf-8')
    print(html)
except URLError as e:
    print(e.reason)

  

Here we first instantiate an HTTPBasicAuthHandler object, whose argument is an HTTPPasswordMgrWithDefaultRealm object. The user name and password are added to that object with add_password(), which gives us a Handler that handles authentication.

Next, we use this Handler with the build_opener() method to build an Opener. When this Opener sends a request, it is as if authentication had already succeeded.
Finally, we open the link with the Opener's open() method, which completes the authentication. The result obtained here is the source of the page after authentication.
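
To see what the password manager contributes, here is a small sketch that queries it directly without sending any request. The URL http://localhost:5000/, the credentials, the realm string and the second URL are all placeholder assumptions.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

url = 'http://localhost:5000/'
p = HTTPPasswordMgrWithDefaultRealm()
# realm=None means "use these credentials for any realm under this URL"
p.add_password(None, url, 'username', 'password')

# A URL below the registered one matches, whatever realm the server names
found = p.find_user_password('Some Realm', url + 'page')
# A different host has no stored credentials
missing = p.find_user_password('Some Realm', 'http://example.com/')
print(found)
print(missing)
```

HTTPBasicAuthHandler performs this kind of lookup when the server answers with an authentication challenge, then retries the request with the credentials it found.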

Proxies
When writing crawlers, you will inevitably need to use a proxy. To add a proxy, you can do this:
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

# Map each protocol to the proxy that should carry it
proxy_handler = ProxyHandler({
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
})
opener = build_opener(proxy_handler)
try:
    response = opener.open('https://www.baidu.com')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)

 

 

Here a proxy has been set up locally, running on port 9743.
ProxyHandler is used here. Its argument is a dictionary whose keys are protocol types (such as http or https) and whose values are the proxy URLs; multiple proxies can be added.
Then, we use this Handler with the build_opener() method to build an Opener, and send the request with it.
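
As a sketch of how this might be wrapped for reuse, the hypothetical helper make_opener() below builds an Opener that either routes both protocols through one proxy or uses no proxy at all. The helper name and the decision to use the same proxy for http and https are assumptions made for illustration. Passing an empty dictionary to ProxyHandler disables proxies, including auto-detected ones.

```python
from urllib.request import OpenerDirector, ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Hypothetical helper: build an Opener that uses one proxy, or none."""
    if proxy_url is None:
        # An empty dictionary disables proxies, including auto-detected ones
        proxies = {}
    else:
        # Route both protocols through the same proxy
        proxies = {'http': proxy_url, 'https': proxy_url}
    return build_opener(ProxyHandler(proxies))


proxied = make_opener('http://127.0.0.1:9743')
direct = make_opener()
```

Requests sent with proxied.open(url) would go through the local proxy, while direct.open(url) would connect without one.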

 


Cookies
Handling cookies requires the relevant Handler.
Let's first use an example to see how to get the cookies of a website. The relevant code is as follows:

 

import http.cookiejar, urllib.request

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
# The CookieJar is filled in from the response headers
for item in cookie:
    print(item.name + '=' + item.value)

 

 

First, we must declare a CookieJar object. Next, we use HTTPCookieProcessor to build a Handler, and finally build an Opener with the build_opener() method and call its open() function.
The output is as follows:

BAIDUID=2E65A683F8A8BA3DF521469DF8EFF1E1:FG=1
BIDUPSID=2E65A683F8A8BA3DF521469DF8EFF1E1
H_PS_PSSID=20987_1421_18282_17949_21122_17001_21227_21189_21161_20927
PSTM=1474900615
BDSVRTM=0
BD_HOME=0

 

 

As you can see, the name and value of each cookie are printed.
Since they can be printed, can they also be written to a file? After all, cookies are actually stored as text.
The answer is of course yes. See the following example:

import http.cookiejar, urllib.request

filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
# Also keep session cookies and expired cookies when saving
cookie.save(ignore_discard=True, ignore_expires=True)

Here CookieJar needs to be replaced with MozillaCookieJar, which is used when generating the file. It is a subclass of CookieJar that can handle cookies and file-related operations, such as reading and saving cookies, and it saves cookies in the cookies format of Mozilla-type browsers.
After running, you will find that a cookies.txt file has been generated, containing the saved cookies.

In addition, LWPCookieJar can also read and save cookies, but its save format differs from that of MozillaCookieJar: it saves cookies in the libwww-perl (LWP) cookies file format.
To save cookies in LWP format, change the declaration to:
cookie = http.cookiejar.LWPCookieJar(filename)
The file generated this time is in LWP format.

As can be seen, the two generated formats differ considerably.
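
To compare the two formats without contacting any website, the sketch below saves one made-up cookie with both classes and prints the resulting files. The cookie name, value and domain are invented purely for illustration, and the files are written to a temporary directory.

```python
import http.cookiejar
import os
import tempfile

# A made-up cookie, used only to illustrate the two file formats
cookie = http.cookiejar.Cookie(
    version=0, name='demo', value='123',
    port=None, port_specified=False,
    domain='.example.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

tmpdir = tempfile.mkdtemp()
contents = {}
for jar_class in (http.cookiejar.MozillaCookieJar, http.cookiejar.LWPCookieJar):
    filename = os.path.join(tmpdir, jar_class.__name__ + '.txt')
    jar = jar_class(filename)
    jar.set_cookie(cookie)
    # Session cookies are skipped unless ignore_discard=True
    jar.save(ignore_discard=True, ignore_expires=True)
    with open(filename) as f:
        contents[jar_class.__name__] = f.read()
    print(jar_class.__name__)
    print(contents[jar_class.__name__])
```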
Then, after the cookies file has been generated, how do we read it from the file and use it?
Let's look at an example using the LWPCookieJar format:
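
A minimal sketch of reading the saved file, assuming cookies.txt was written earlier with LWPCookieJar. The helper name opener_from_cookie_file is hypothetical and only wraps the steps so that they can be reused.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(filename):
    """Load LWP-format cookies from filename and build an Opener with them."""
    cookie = http.cookiejar.LWPCookieJar()
    # Read the cookies saved earlier, including session and expired ones
    cookie.load(filename, ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    return cookie, urllib.request.build_opener(handler)


# Usage (requires network access and an existing cookies.txt):
# cookie, opener = opener_from_cookie_file('cookies.txt')
# response = opener.open('http://www.baidu.com')
# print(response.read().decode('utf-8'))
```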

As you can see, the load() method is called here to read the local cookies file and obtain the cookies. The prerequisite is that we first generated cookies in LWPCookieJar format and saved them to a file; after reading the cookies, we build the Handler and Opener in the same way to complete the operation.
If it runs normally, the source code of the Baidu page is printed.
With the methods above, we can configure the vast majority of request features.
This is the basic usage of the request module of the urllib library. If you want to implement more features, refer to the official documentation:
https://docs.python.org/3/library/urllib.request.html#basehandler-objects.

 


Origin www.cnblogs.com/baijinshuo/p/10955460.html