Python Crawler Series 3 - POST, parse, and the Request class

First, two methods of accessing the network

1. GET: passes information to the server through URL parameters; the parameters are a dict, which is then encoded with the parse module.

2. POST: generally used to transmit parameters to the server; the information is carried in the request body rather than in the URL; if you want to send information with POST, you need the data parameter.

3. Content-Type: application/x-www-form-urlencoded

4. Content-Length: the length of the data being sent

5. In short, whenever you change the request method, remember to adapt the other request header fields accordingly.

6. urllib.parse.urlencode can automatically convert a dict into a URL-encoded string of the form above, as shown in the sketch below.
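A minimal sketch of that conversion (the parameter names and the search URL here are illustrative, not taken from the original post):

from urllib import parse

params = {"wd": "girl", "ie": "utf-8"}  # illustrative query parameters
query = parse.urlencode(params)         # -> 'wd=girl&ie=utf-8'

# GET: the encoded string is appended to the URL
get_url = "https://www.baidu.com/s?" + query

# POST: the encoded string is converted to bytes and passed as the data argument
post_data = query.encode("utf-8")

print(get_url)
print(post_data)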

Case study: use the parse and request modules to simulate a POST request to Baidu Translate. Analysis steps:

(1) Open the Chrome browser and press F12 to open the developer tools.

(2) Try typing the word girl; notice that a request is fired after each letter is typed.

(3) The request address is: http://fanyi.baidu.com/sug

(4) Open Network -> XHR -> sug to inspect that request.

 

from urllib import request, parse
# module that handles the json format
import json

"""
General flow:
(1) build the content with data, then open the URL with urlopen
(2) a result in json format is returned
(3) the result should be the definitions of girl
"""

baseurl = "https://fanyi.baidu.com/sug"

# holds the data used to simulate the form; it must be a dict
data = {
    # girl is the English content to translate; it should come from user input, but is hard-coded here
    "kw": "girl"
}

# the data needs to be encoded with the parse module
data = parse.urlencode(data).encode("utf-8")

# we need to construct a request header; it should contain at least the length of the data being sent
# Request requires the headers passed in to be a dict
headers = {
    # because POST is used, at least the Content-Length field should be included
    "Content-Length": len(data)
}

# with headers, data and url we can try to send a request
rsp = request.urlopen(baseurl, data=data)  # urlopen itself does not accept headers=headers

json_data = rsp.read().decode("utf-8")
print(json_data)

# convert the json string into a dictionary
json_data = json.loads(json_data)
print(json_data)

for item in json_data["data"]:
    print(item["k"], "--", item["v"])
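The calls above assume the request succeeds; a small refinement, not part of the original post, is to catch urllib.error exceptions and set a timeout:

from urllib import request, parse, error
import json

baseurl = "https://fanyi.baidu.com/sug"
data = parse.urlencode({"kw": "girl"}).encode("utf-8")

try:
    # timeout is an added safeguard, not used in the original code
    rsp = request.urlopen(baseurl, data=data, timeout=10)
    json_data = json.loads(rsp.read().decode("utf-8"))
    for item in json_data.get("data", []):
        print(item["k"], "--", item["v"])
except error.URLError as e:
    print("request failed:", e)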

Second, to set more information on the request, the plain urlopen function is no longer convenient; you need to use the request.Request class.

Only part of the code is modified here; the rest stays the same as above, and you still get the same result.

 

# construct a Request instance; with this class, headers and other information can be wrapped up and passed in
req = request.Request(url=baseurl, data=data, headers=headers)

# with headers, data and url we can try to send the request
rsp = request.urlopen(req)
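Putting the fragments together, a complete Request-based version might look like the sketch below; the User-Agent header is an illustrative addition that the original code does not set, but one that many sites expect:

from urllib import request, parse
import json

baseurl = "https://fanyi.baidu.com/sug"
data = parse.urlencode({"kw": "girl"}).encode("utf-8")

headers = {
    "Content-Length": len(data),
    # illustrative extra header, not part of the original post
    "User-Agent": "Mozilla/5.0",
}

req = request.Request(url=baseurl, data=data, headers=headers)
rsp = request.urlopen(req)

json_data = json.loads(rsp.read().decode("utf-8"))
for item in json_data["data"]:
    print(item["k"], "--", item["v"])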

 

 

Third, source code

1. GitHub: Reptile3_PostAnlysis.py

https://github.com/ruigege66/PythonReptile/blob/master/Reptile3_PostAnlysis.py

2. CSDN: https://blog.csdn.net/weixin_44630050

3. cnblogs: https://www.cnblogs.com/ruigege0000/

4. Welcome to follow my WeChat official account: Fourier Transform. It is a personal account, for learning and exchange only; reply "gifts" in the background to get big data learning materials.

 
