Day 2: Web crawlers

Using the requests module

Import the requests module

import requests

Write the request URL

base_url = "http://www.baidu.com/"

Write the request headers

headers = {
    # request header fields go here, e.g. 'User-Agent'
}

 

Write the request parameters

params = {
    # request parameters go here
}

Send the request and get the response

# GET request:
# response = requests.get(base_url, headers=headers, params=params)
# POST request:
response = requests.post(base_url, headers=headers, params=params)
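As a rough illustration of how the headers and params dictionaries end up in the request, the sketch below prepares a GET request without sending it, so the final URL can be inspected offline. The `/s` path, the `wd` parameter, and the User-Agent value are assumptions chosen for the example; they are not taken from the notes above.

```python
import requests

# Assumed example values: the "/s" path, the "wd" parameter and the
# User-Agent string are placeholders for illustration only.
base_url = "http://www.baidu.com/s"
headers = {"User-Agent": "Mozilla/5.0"}
params = {"wd": "python"}

# Build the request without sending it, so the final URL can be inspected.
prepared = requests.Request("GET", base_url, headers=headers, params=params).prepare()

print(prepared.url)                    # params are appended as a query string
print(prepared.headers["User-Agent"])  # headers are carried on the request

# To actually send it:
# response = requests.get(base_url, headers=headers, params=params, timeout=5)
```

Preparing the request first is only a convenience for looking at what would be sent; in normal use the commented-out `requests.get` call is enough.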

Attributes of the response object

A server response consists of a status line (protocol version and status code), the response headers, a blank line, and the response body.

(1) Response body:

As a string: response.text

As bytes: response.content

(2) Status code: response.status_code

 

(3) Response headers: response.headers (a case-insensitive dictionary)

response.headers['Set-Cookie'] (present only when the server sets cookies)

 

(4) Encoding used to decode the response text: response.encoding

response.text returns the response body as a string. In effect, it is obtained as follows:

response.text = response.content.decode(response.encoding)

 

(5) Fixing garbled text:

Cause: the bytes were decoded with a different encoding from the one used to encode them.

str.encode('encoding') --- converts a string to bytes using the given encoding

bytes.decode('encoding') --- converts bytes to a string using the given encoding

a, Decode the bytes yourself: response.content.decode('correct encoding of the page')

The page's encoding is usually declared in a meta tag:

<meta http-equiv="content-type" content="text/html;charset=utf-8">

b, Find the correct encoding and assign it to response.encoding:

response.encoding = 'correct encoding'

response.text ---> correctly decoded page content.
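The cause and the fix can be reproduced without any network access. The sketch below is a minimal illustration using only the standard library: the HTML snippet is made up, and the regular expression is a simplified assumption about how the charset is declared, not a full HTML parser.

```python
import re

# Made-up page content, encoded as UTF-8 the way a server might send it.
html_bytes = (
    '<meta http-equiv="content-type" content="text/html;charset=utf-8">'
    "<p>爬虫</p>"
).encode("utf-8")

# Decoding with the wrong codec does not fail, it just yields garbled text.
garbled = html_bytes.decode("iso-8859-1")

# Look for the charset declared in the page itself (simplified pattern).
match = re.search(rb"charset=[\"']?([\w-]+)", html_bytes)
declared = match.group(1).decode("ascii") if match else "utf-8"

# Decoding with the declared encoding recovers the original text.
fixed = html_bytes.decode(declared)

print(declared)
print("爬虫" in garbled, "爬虫" in fixed)
```

With a real response object, `html_bytes` corresponds to `response.content`, and assigning `declared` to `response.encoding` has the same effect on `response.text`.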

 

Summary of GET requests:

a, If the request has no parameters, only the URL and the headers dictionary are needed.

b, If the GET request has parameters:

In Chrome's developer tools, find Query String Parameters for the request and put those parameters into the params dictionary.

c, For paging, compare the request parameters of successive pages to find which field changes and how; once the pattern is known, a for loop can fetch every page.
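The paging idea in (c) might look like the sketch below. The parameter names `wd` and `pn`, the page size, and the User-Agent value are assumptions for illustration; the real field names have to be read from the site's own requests. The `fetch` argument is a hypothetical hook so the loop can be tried without touching the network.

```python
import requests


def crawl_pages(base_url, keyword, pages, page_size=10, fetch=requests.get):
    """Fetch `pages` result pages and return their bodies as strings."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed header value
    bodies = []
    for page in range(pages):
        # Assumed paging scheme: "pn" is the offset of the first result.
        params = {"wd": keyword, "pn": page * page_size}
        response = fetch(base_url, headers=headers, params=params, timeout=5)
        bodies.append(response.text)
    return bodies


# Usage (performs real network requests):
# bodies = crawl_pages("http://www.baidu.com/s", "python", pages=3)
```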

 

POST requests:

requests.post(
    url=request URL,
    headers=request headers dictionary,
    data=request data dictionary,
    timeout=timeout in seconds
) ---> response object
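To see where the data dictionary goes in a POST request, the sketch below prepares one without sending it. The URL and the form fields are placeholders invented for the example.

```python
import requests

# Assumed example values: the URL and form fields are placeholders.
url = "http://www.example.com/api/search"
headers = {"User-Agent": "Mozilla/5.0"}
data = {"kw": "python", "page": "1"}

# Build the request without sending it, so the body can be inspected.
prepared = requests.Request("POST", url, headers=headers, data=data).prepare()

print(prepared.url)                      # the URL carries no query string
print(prepared.body)                     # the form data travels in the body
print(prepared.headers["Content-Type"])  # set automatically for dict data

# To actually send it:
# response = requests.post(url, headers=headers, data=data, timeout=5)
```

Unlike params in a GET request, the data dictionary is form-encoded into the request body rather than appended to the URL.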

The data returned by a POST request is usually JSON.

# Ways to parse JSON data:

(1) response.json() ---> converts the JSON string to the corresponding Python dict or list

(2) Use the json module:

json.loads(json_str) ---> json_data (Python dict or list)

json.dumps(json_data) ---> json_str
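A short round trip with the json module, using a made-up JSON string of the kind a POST response body might contain:

```python
import json

# Made-up JSON string standing in for a response body.
json_str = '{"errno": 0, "data": [{"k": "python", "v": "蟒蛇"}]}'

json_data = json.loads(json_str)         # JSON string -> Python dict
first_value = json_data["data"][0]["v"]  # navigate it like any dict or list

# ensure_ascii=False keeps non-ASCII characters readable in the output.
round_trip = json.dumps(json_data, ensure_ascii=False)

print(first_value)
print(round_trip)
```

With a real response, `response.json()` gives the same result as `json.loads(response.text)`.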

 

The key to a successful POST request is getting the request parameters right.

How do you find which request parameters affect the data returned? ---> Compare several requests and see which parameters change.

Once you know which parameters change, work out how their values are generated; this is how data loaded through ajax requests is obtained.

  The values usually come from one of these places:

      (1) Hard-coded in the page.

      (2) Generated in the JavaScript.

      (3) Returned in advance by an earlier ajax request.
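The comparison step can be done by eye in the browser, but a small helper makes it mechanical. The function name `changed_params` and the captured form data below are hypothetical, written only to illustrate the idea.

```python
def changed_params(first, second):
    """Return {name: (old, new)} for parameters that differ between two requests."""
    names = set(first) | set(second)
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(names)
        if first.get(name) != second.get(name)
    }


# Made-up form data captured from two consecutive page requests.
request_1 = {"kw": "python", "page": "1", "token": "abc"}
request_2 = {"kw": "python", "page": "2", "token": "abc"}

print(changed_params(request_1, request_2))
```

Parameters that never change can be copied as-is; only the ones reported here need their generation rule worked out.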

 

Using proxies.

(1) How a proxy works:

    Put simply, a proxy is a relay point for network traffic: it sits between your machine and the server and forwards requests and responses between them.

 

  (2) What proxies are used for:

    a, Getting around access restrictions on your own IP, so you can reach sites that are normally inaccessible.

    b, Accessing resources that are internal to an organization or group.

    c, Improving access speed. A proxy server relays traffic, so it typically keeps a buffer that caches data.

    d, Hiding your real IP.

  (3) Types of proxy:

    1, By protocol:

      FTP proxy server --- ports 21, 2121

      HTTP proxy server --- ports 80, 8080

      SSL/TLS proxy: mainly used to access encrypted websites. Port: 443

      Telnet proxy: mainly used for telnet remote control. Port: usually 23

    2, By degree of anonymity:

      Highly anonymous proxy: forwards packets unchanged, so to the server it looks like an ordinary user is visiting; the real IP is completely hidden.

      Ordinary anonymous proxy: makes some changes to the packets; the server may still be able to discover the real IP.

      Transparent proxy: not only changes the packets, but also tells the server who is visiting.

      Spy proxy: a proxy set up by an organization or individual to record the data users transmit, for research or monitoring purposes.

  (4) How do you set a proxy in the requests module?

    proxies = {
        'proxy type': 'proxy address'
    }

    response = requests.get(base_url, proxies=proxies)

    Proxy types (dictionary keys): http, https, ftp

    Proxy address (dictionary value): http://ip:port
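Put together, a proxied request might be written as in the sketch below. The helper names `build_proxies` and `get_via_proxy` are hypothetical, and 127.0.0.1:8080 is a placeholder address; substitute a proxy you are allowed to use.

```python
import requests


def build_proxies(ip, port):
    """Build a proxies dict that routes http and https URLs via one HTTP proxy."""
    address = f"http://{ip}:{port}"
    return {"http": address, "https": address}


def get_via_proxy(url, ip, port, timeout=5):
    """Send a GET request through the given proxy and return the response."""
    return requests.get(url, proxies=build_proxies(ip, port), timeout=timeout)


# 127.0.0.1:8080 is a placeholder, not a working proxy.
print(build_proxies("127.0.0.1", 8080))

# Usage (performs a real network request through the proxy):
# response = get_via_proxy("http://www.baidu.com/", "127.0.0.1", 8080)
```

The dictionary keys match the scheme of the URL being requested, so a request to an https URL uses the entry under 'https'.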

 


Origin www.cnblogs.com/wjlsyc/p/12063912.html