The Python requests library

requests is a library commonly used for web crawlers: it lets us send a request to a website we want to crawl and returns the site's content.

0x01: Request

GET and POST are the most common request methods; requests also supports others such as DELETE, HEAD, and OPTIONS.

Request parameters

params / data: both pass request parameters to the server; params is used for GET, data for POST

headers: the request headers to send

proxies: the proxies to use

timeout: the timeout setting
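To see how params and data are encoded differently, we can prepare one request of each kind without actually sending it; the URLs and values below are placeholders:

```python
import requests

# Prepare a GET with params and a POST with data without sending them,
# just to inspect how requests encodes each. example.com is a placeholder.
get_req = requests.Request(
    'GET', 'http://example.com/search', params={'q': 'python'}
).prepare()
post_req = requests.Request(
    'POST', 'http://example.com/login', data={'user': 'alice'}
).prepare()

print(get_req.url)    # params end up in the query string
print(post_req.body)  # data ends up form-encoded in the request body
```

This prints `http://example.com/search?q=python` for the GET request, while the POST request keeps its URL clean and carries `user=alice` in the body.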

Code demonstration:

 1 import requests
 2 
 3 url = 'http://www.tianya.cn/'
 4 params = {'username': 'zhangan', 'password': 123456}
 5 header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0' }
 6 proxies = {
 7   "http": "http://10.10.1.10:3128",
 8   "https": "http://10.10.1.10:1080",
 9 }
10 
11 # html = requests.get(url=url, params=params, headers=header, proxies=proxies, timeout=1)
12 html = requests.get(url=url, params=params, headers=header)
13 print(html.url)
14 print(html.request.headers)

Output:
http://www.tianya.cn/?username=zhangan&password=123456
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Line 13 prints the request URL; you can see that with a GET request the params are appended directly to the URL as a query string. We also used the headers parameter to set the request headers, disguising the request as a browser rather than the default python-requests client. Line 14 prints the request headers that were actually sent.
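As a quick check of what the headers parameter changes, we can compare the default User-Agent that requests would send against a custom one, again without making any network call (the URL is a placeholder):

```python
import requests

# Prepare requests through a Session (which applies default headers)
# to compare User-Agent values; nothing is actually sent.
s = requests.Session()

default = s.prepare_request(requests.Request('GET', 'http://example.com/'))
print(default.headers['User-Agent'])   # python-requests/<version>

disguised = s.prepare_request(requests.Request(
    'GET', 'http://example.com/',
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
))
print(disguised.headers['User-Agent'])  # the browser string we supplied
```

Many sites block the `python-requests/...` default outright, which is why crawlers almost always override it.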

If you need to set a proxy and a timeout, just pass the proxies and timeout parameters as on line 11. The proxy above is not a working one, so that line is commented out. If you need free proxies, there are sites that list them, for example: https://www.xicidaili.com/
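Since a dead proxy or a slow server surfaces as an exception, it helps to know that requests groups both the connect and read timeouts under requests.exceptions.Timeout. A minimal sketch of defensive fetching (the timeout values are illustrative, not recommendations):

```python
import requests

# Timeout may be one number for both phases, or a (connect, read) tuple.
TIMEOUT = (3.05, 27)   # illustrative values

# Both timeout flavors are subclasses of requests.exceptions.Timeout,
# so a single except clause catches either one.
assert issubclass(requests.exceptions.ConnectTimeout, requests.exceptions.Timeout)
assert issubclass(requests.exceptions.ReadTimeout, requests.exceptions.Timeout)

def fetch(url, proxies=None):
    """Return the response, or None if the request times out."""
    try:
        return requests.get(url, proxies=proxies, timeout=TIMEOUT)
    except requests.exceptions.Timeout:
        return None
```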

0x02: The response

Besides the request URL and headers, we can also read other content from the response.

Page source: .text / .content — the former returns the body as text (str), the latter as binary (bytes).

Status code: .status_code

Response headers: .headers

Cookies: .cookies
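The difference between .text and .content can be shown without a live request by building a Response object by hand (setting the private _content attribute purely for illustration):

```python
import requests

# Construct a Response manually (no network) to contrast .content and .text.
resp = requests.models.Response()
resp.status_code = 200
resp._content = '天涯'.encode('utf-8')   # the raw bytes "received"
resp.encoding = 'utf-8'

print(resp.content)      # raw bytes: b'\xe5\xa4\xa9\xe6\xb6\xaf'
print(resp.text)         # decoded str: 天涯
print(resp.status_code)  # 200
```

.content is what you want for images or other binary downloads; .text decodes the bytes using the response's encoding.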

Modifying the code above slightly:

url = 'http://www.tianya.cn/'
params = {'username': 'zhangan', 'password': 123456}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

# html = requests.get(url=url, params=params, headers=header, proxies=proxies, timeout=1)
html = requests.get(url=url, params=params, headers=header)
print(html.status_code)  # get the status code
print(html.headers)      # get the response headers
print(html.cookies)      # get the cookies

Output:
200
{'Server': 'nginx', 'Date': 'Wed, 24 Jul 2019 10:57:06 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Vary': 'Accept-Encoding', 'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Expires': 'Thu, 01 Nov 2012 10:00:00 GMT', 'ETag': 'W/"6de10a5VRB4"', 'Last-Modified': 'Fri, 19 Jul 2019 09:40:47 GMT', 'Content-Encoding': 'gzip'}
<RequestsCookieJar[]>

The status code, the response headers returned by the server, and the cookies are printed in turn.

In addition, requests can also maintain sessions, handle authentication, and verify SSL certificates. There is an excellent, detailed crawler article you can refer to: https://cuiqingcai.com/5523.html
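Session keeping, for example, comes down to requests.Session, which persists cookies and default headers across calls. A minimal sketch (no request is actually sent; the header and cookie values are made up):

```python
import requests

# A Session carries cookies and default headers across requests.
s = requests.Session()
s.headers.update({'User-Agent': 'my-crawler/1.0'})  # sent with every request
s.cookies.set('token', 'abc123')                    # reused on later requests

# Prepare a request through the session to see the merged state; not sent.
req = s.prepare_request(requests.Request('GET', 'http://example.com/'))
print(req.headers['User-Agent'])   # my-crawler/1.0
```

This is how you stay logged in across pages: cookies set by one response are automatically sent with the next request from the same Session.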

Origin www.cnblogs.com/liangxiyang/p/11066393.html