requests is one of the most commonly used libraries in web crawling: it sends a request to the website we want to crawl and returns the site's content.
0x01: Request
GET and POST are the most common request methods; requests also supports similar ones such as DELETE, HEAD, and OPTIONS.
Request parameters
params / data: the two ways to pass request parameters to the server; params is used for GET and data for POST
headers: the request headers to send
proxies: the proxies to use
timeout: the timeout setting
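The difference between params and data can be seen without sending anything over the network, by preparing (but not sending) a request. This is a minimal sketch; the credentials are the same placeholder values used later in this article:

```python
import requests

# Prepare, but do not send, a GET and a POST request to compare
# how params and data are encoded.
get_req = requests.Request(
    'GET', 'http://www.tianya.cn/',
    params={'username': 'zhangan', 'password': '123456'},
).prepare()

post_req = requests.Request(
    'POST', 'http://www.tianya.cn/',
    data={'username': 'zhangan', 'password': '123456'},
).prepare()

print(get_req.url)    # params end up in the URL's query string
print(post_req.url)   # data does not touch the URL
print(post_req.body)  # data is form-encoded into the request body
```

With GET the parameters become part of the URL, while with POST they travel in the body, which is why login forms use POST.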
Code example:
```python
import requests

url = 'http://www.tianya.cn/'
params = {'username': 'zhangan', 'password': 123456}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

# html = requests.get(url=url, params=params, headers=header, proxies=proxies, timeout=1)
html = requests.get(url=url, params=params, headers=header)
print(html.url)
print(html.request.headers)
```
Output:
http://www.tianya.cn/?username=zhangan&password=123456
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
The print(html.url) line shows the request URL: with a GET request, the params are appended directly to the URL as a query string. The headers argument sets the request headers, so the crawler is disguised as a browser instead of using the default python-requests User-Agent; the last line prints the headers that were actually sent.
If you need a proxy or a timeout, pass the proxies and timeout arguments as in the commented-out line. The proxy addresses here are invalid, so that line is not run. If you need free proxies, you can look for them on sites such as: https://www.xicidaili.com/
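When timeout is set, requests raises requests.exceptions.Timeout if the server does not respond in time, and a bad proxy or dead host raises requests.exceptions.ConnectionError. A minimal sketch of handling both (the unroutable address 10.255.255.1 and the fetch helper are just placeholders for illustration):

```python
import requests

def fetch(url, timeout=1):
    """Fetch a URL, returning None on timeout or connection failure."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        print('request timed out')
        return None
    except requests.exceptions.ConnectionError:
        print('connection failed')
        return None

# An unroutable address, so this fails quickly instead of hanging forever:
result = fetch('http://10.255.255.1/', timeout=1)
print(result)
```

Without a timeout, requests can block indefinitely on a dead host, so setting one is good practice in a crawler.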
0x02: Response
Besides the request URL and headers, we can also obtain other parts of the response.
Page source: .text / .content; the former returns the decoded text, the latter the raw binary content.
Status code: .status_code
Response headers: .headers
Cookies: .cookies
Slightly modify the code above:
```python
import requests

url = 'http://www.tianya.cn/'
params = {'username': 'zhangan', 'password': 123456}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

# html = requests.get(url=url, params=params, headers=header, proxies=proxies, timeout=1)
html = requests.get(url=url, params=params, headers=header)
print(html.status_code)  # status code
print(html.headers)      # response headers
print(html.cookies)      # cookies
```
Output:
200
{'Server': 'nginx', 'Date': 'Wed, 24 Jul 2019 10:57:06 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Vary': 'Accept-Encoding', 'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Expires': 'Thu, 01 Nov 2012 10:00:00 GMT', 'ETag': 'W/"6de10a5VRB4"', 'Last-Modified': 'Fri, 19 Jul 2019 09:40:47 GMT', 'Content-Encoding': 'gzip'}
<RequestsCookieJar[]>
These print, in order, the status code, the response headers returned by the server, and the cookies.
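The relationship between .content and .text can be shown without a network round trip: .content is the raw bytes received, and .text is those bytes decoded with the response's encoding. A sketch using a hand-built Response object (normally requests fills these fields in for you; setting them by hand is only for illustration):

```python
import requests

resp = requests.Response()
resp._content = '天涯'.encode('utf-8')  # raw bytes, as received over the wire
resp.encoding = 'utf-8'                 # normally detected from the headers

print(type(resp.content))  # bytes
print(type(resp.text))     # str
print(resp.text)
```

Use .content for binary data such as images, and .text for HTML; if .text comes out garbled, set resp.encoding to the page's real charset before reading it.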
In addition, requests also supports session keeping, authentication, and SSL certificate verification. Cui Qingcai has written excellent crawler articles that you can refer to: https://cuiqingcai.com/5523.html
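A quick sketch of session keeping: requests.Session() stores cookies across requests, so a cookie set by one response (or by hand) is sent automatically with later requests. The URL and the token cookie here are placeholders:

```python
import requests

session = requests.Session()

# Headers set on the session apply to every request it makes.
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'})

# Cookies can be set manually; any cookie a server sets via
# Set-Cookie is stored in session.cookies the same way.
session.cookies.set('token', 'abc123')

# Every request prepared through this session now carries the cookie:
req = requests.Request('GET', 'http://www.tianya.cn/')
prepared = session.prepare_request(req)
print(prepared.headers.get('Cookie'))  # token=abc123
```

This is why crawlers that need to stay logged in use a Session instead of bare requests.get calls: the login cookie persists without being passed around by hand.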