Get a webpage
import requests
r = requests.get('https://www.baidu.com/')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text)
print(r.cookies)
Various requests
# Send an HTTP POST request:
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
# Send an HTTP DELETE request:
r = requests.delete('http://httpbin.org/delete')
# Send an HTTP HEAD request:
r = requests.head('http://httpbin.org/get')
# Send an HTTP OPTIONS request:
r = requests.options('http://httpbin.org/get')
Construct a GET request that passes parameters
For GET requests, pass the query parameters with the params argument
import requests
data = {
    'key1': 'value1',
    'key2': 'value2'
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.url)
#http://httpbin.org/get?key1=value1&key2=value2
You can also pass in a list as a value
import requests
data = {
    'key1': 'value1',
    'key2': ['value2', 'value3']
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.url)
#http://httpbin.org/get?key1=value1&key2=value2&key2=value3
Note: any key in the dictionary whose value is None will not be added to the URL's query string.
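This can be checked without sending any request, by using requests' own URL-preparation machinery (PreparedRequest is internal and is used here purely for illustration):

```python
from requests.models import PreparedRequest

req = PreparedRequest()
# key2 has the value None, so it is omitted from the query string
req.prepare_url('http://httpbin.org/get', {'key1': 'value1', 'key2': None})
print(req.url)
```

Only key1 appears in the resulting URL.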
In addition, the body of the response is of type str; but when it happens to be in JSON format, you can call the json() method to parse it directly into a dictionary. Examples are as follows:
import requests
r = requests.get('http://httpbin.org/get')
print(r.json())
'''
{'args': {},
'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.23.0', 'X-Amzn-Trace-Id': 'Root=1-5e9b0b15-4d6629f8460bc48037fa4244'}, 'origin': '124.164.123.240', 'url': 'http://httpbin.org/get'}
'''
Crawl the web
Taking the Zhihu Daily page as an example, a request header needs to be constructed; the User-Agent value can be copied from the browser's developer tools.
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0'
}
r = requests.get('https://daily.zhihu.com/', headers=headers)
print(r.text)
Of course, we can add other field information in the headers parameter.
Grab binary data
In the example above, we grabbed a page and what came back was an HTML document. What if we want to capture images, audio, or video?
Such files are essentially composed of binary data; it is only because of their specific storage formats and the corresponding parsers that we see them as multimedia. Therefore, to grab them, we have to fetch their binary content.
import requests
r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
The open() method is used here: its first argument is the file name, and the second ('wb') opens the file for binary writing, so binary data can be written to it. After running this, a file named favicon.ico appears in the working directory.
You could also wrap this in a loop to grab data continuously.
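For large files, reading the whole body through r.content holds everything in memory at once. A sketch of a chunked download loop, assuming requests' stream=True / iter_content API (the commented URL is just an example):

```python
import requests

def download(url, path, chunk_size=8192):
    """Stream a binary file to disk chunk by chunk instead of
    holding the whole body in memory."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)

# download('https://github.com/favicon.ico', 'favicon.ico')  # requires network access
```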
POST request
import requests
data = {'name': 'germey', 'age': '22'}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
# Partial output:
# "form": {
#   "age": "22",
#   "name": "germey"
# }
The form part is the submitted data, which proves that the POST request was successfully sent.
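Besides form data, requests can also send a JSON body via the json parameter. Preparing the request locally (without sending it) shows the headers and body it would produce:

```python
import requests

# json= serializes the dict and sets Content-Type automatically,
# unlike data=, which sends form-encoded key/value pairs
req = requests.Request('POST', 'http://httpbin.org/post',
                       json={'name': 'germey', 'age': '22'}).prepare()
print(req.headers['Content-Type'])
print(req.body)
```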
Response
After sending a request, we naturally get a response. In the examples above, we used text and content to get the response body; there are many other attributes and methods for obtaining information such as the status code, response headers, and cookies.
import requests
r = requests.get('http://www.baidu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)
Here we print out the status_code attribute to get the status code, the headers attribute to get the response header, the cookies attribute to get the cookies, the url attribute to get the URL, and the history attribute to get the request history.
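A common companion to status_code is raise_for_status(), which raises an HTTPError for 4xx/5xx responses. Here a bare Response object is constructed by hand purely to illustrate; no request is actually sent:

```python
import requests

resp = requests.models.Response()
resp.status_code = 404  # pretend the server returned Not Found

try:
    resp.raise_for_status()  # no-op for 2xx, raises for 4xx/5xx
except requests.exceptions.HTTPError as exc:
    print('bad response:', exc)
```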
File Upload
import requests
files = {'file': open('favicon.ico', 'rb')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)
Note that favicon.ico needs to be in the same directory as the script; to upload a different file, just change the file name in the code.
The website returns a response containing a files field, while the form field is empty, which shows that file uploads are carried in a separate files field.
Cookies
First look at getting cookies
import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
for key, value in r.cookies.items():
    print(key + '=' + value)
#<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
#BDORZ=27315
Here we first read the cookies attribute and successfully obtain the cookies, which are of type RequestsCookieJar. The items() method converts them into a list of name/value tuples, so we can traverse and print each cookie.
We can also directly use cookies to maintain the login status. The following uses Zhihu as an example.
import requests
header = {
'Host': 'www.zhihu.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0',
'Cookie':'_zap=4f14e95a-0cea-4c5e-b2f7-45cfd43f9809; d_c0="AJCZtnMxyBCPTgURbVjB11p6-JAwsTtJB4E=|1581092643"; _xsrf=VaBV0QQwGFjz01Q9n2AmjAilhHrJXypa; tst=h; q_c1=516c72a5ff954c66b6563ff42e63387d|1585814979000|1582104705000; tshl=; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1587179989,1587216876,1587216886,1587267050; capsion_ticket="2|1:0|10:1587267052|14:capsion_ticket|44:YTUwMDY3MGYyNmJlNDU0ZTgxNjlhNjMwNWNkYzAxNmQ=|7b8a5ebd3649fb076617379e734a79cd7ef54d1242ddd1841aba6844c9d14727"; l_cap_id="YjJiNjc1MzY0ZmEzNGNlYjlkYThkODEyYmEzOWRiOTk=|1587222516|5a01a93ea68209c1116647750ed2131efa309a3d"; r_cap_id="N2EwMjY0N2NlNTM1NGZlMjliNGNhMGJmOTkyMDc1OTE=|1587222516|238b677c781f1ef90a7ad343d6cdd3871aff3269"; cap_id="OTVhNjZiMDQ3MDkzNGVjY2I5ZTUyNTlhOTcxNzk3Njg=|1587222516|6dd1ed77526aa949bccd4146ef218d8164804a6e"; KLBRSID=031b5396d5ab406499e2ac6fe1bb1a43|1587267062|1587267049; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1587267062; SESSIONID=wopWDVALc4X3RJObFrIWNChoNDJpogYSdBPicuRm7vV; JOID=WlgXBkLsoG-SjPrGduF5tDN1xettk80YycmkhT2OnDWm0rGBFgxg_8GF8MN9HDmwsdmzwZheWKVLuonghNnDleo=; osd=V1gXB0vhoG-ThffGduBwuTN1xOJgk80ZwMSkhTyHkTWm07iMFgxh9syF8MJ0ETmwsNC-wZhfUahLuojpidnDlOM=; z_c0="2|1:0|10:1587267060|4:z_c0|92:Mi4xT2JORUJnQUFBQUFBa0ptMmN6SElFQ1lBQUFCZ0FsVk45Qk9KWHdEa0NUcXVheUJDdnJtRzRUVEFHNjFqQThvd013|bb30373e1f13c8b751a3ffc09e8ab4c98780350f77989d93b20be7eb3a0b2fad"'
}
r = requests.get('https://www.zhihu.com/hot', headers=header)
print(r.text)
The result includes content that is only visible after login. You can also set cookies through the cookies parameter, but then you need to construct a RequestsCookieJar object and split the cookie string yourself, which is comparatively cumbersome.
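A minimal sketch of that cookies-parameter route, splitting a cookie string into a RequestsCookieJar (the names and values below are shortened, hypothetical examples):

```python
import requests

cookie_str = '_zap=4f14e95a; BDORZ=27315'  # shortened example values
jar = requests.cookies.RequestsCookieJar()
for item in cookie_str.split('; '):
    key, _, value = item.partition('=')
    jar.set(key, value, domain='.zhihu.com')

# the jar can then be passed directly, e.g.:
# r = requests.get('https://www.zhihu.com/hot', cookies=jar, headers=header)
```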
Session maintenance
In requests, if you call methods such as get() or post() directly, each call is effectively a separate session, as if you had opened two different pages in two different browsers.
The way to solve this is to maintain the same session, which is equivalent to opening a new browser tab rather than a new browser. But if you don't want to set cookies by hand every time, there is a better tool: the Session object.
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# {
#   "cookies": {
#     "number": "123456789"
#   }
# }
With Session, we can maintain the same session without worrying about cookies; it is usually used to simulate subsequent operations after a successful login.
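Beyond cookies, a Session can also carry default headers applied to every request it sends. A small sketch (the User-Agent string here is just an example):

```python
import requests

s = requests.Session()
# Headers and cookies set on the session apply to every request it sends
s.headers.update({'User-Agent': 'my-crawler/0.1'})  # example UA string
s.cookies.set('number', '123456789')  # cookies can also be seeded by hand

print(s.headers['User-Agent'])
print(s.cookies.get('number'))
```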
SSL certificate verification
In addition, requests also provides certificate verification. When sending an HTTPS request, it checks the SSL certificate; the verify parameter controls whether this check happens. If verify is not given, it defaults to True and the certificate is verified automatically.
For example, at one time the certificate of the 12306 website was not trusted by the default CA bundle.
import requests
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Proxy settings
Frequent visits to a site may trigger a captcha or a redirect to a login authentication page. To avoid this, we can set up a proxy, which requires the proxies parameter. It can be set like this:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('https://www.taobao.com', proxies=proxies)
# These proxies are placeholders; substitute your own.
requests also supports SOCKS proxy.
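Using a SOCKS proxy requires an extra dependency (pip install "requests[socks]"); the proxies dict then uses socks5:// URLs. The addresses below are placeholders:

```python
proxies = {
    'http': 'socks5://user:password@host:port',
    'https': 'socks5://user:password@host:port',
}
# requests.get('https://www.taobao.com', proxies=proxies)  # needs requests[socks]
```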
Timeout setting
When the local network is poor, or the server is slow to respond or not responding at all, we may wait a long time before a response arrives, or eventually get an error with no response at all. To guard against this, we can set a timeout: if no response arrives within that period, an error is raised. This uses the timeout parameter. Note that the timeout is not a limit on the total download time; an exception is raised if the server has not responded within the given period. Examples are as follows:
import requests
r=requests.get('https://www.taobao.com', timeout=1)
print(r.status_code)
If you want to wait indefinitely, set timeout to None, or simply omit the parameter, since None is the default.
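The timeout can also be a (connect, read) tuple to bound the two phases separately. A sketch wrapping that in error handling (the URL and the numbers are just examples):

```python
import requests

def fetch(url):
    """Return the status code, or None if the request times out."""
    try:
        # 3.05 s to establish the connection, 27 s for the server to respond
        r = requests.get(url, timeout=(3.05, 27))
        return r.status_code
    except requests.exceptions.Timeout:
        return None

# fetch('https://www.taobao.com')  # requires network access
```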
Authentication
requests provides a simple shorthand: pass a tuple of username and password, and it will use the HTTPBasicAuth class for authentication by default.
import requests
r = requests.get('https://localhost:5000',
                 auth=('username', 'password'))
requests also provides other authentication methods, such as OAuth authentication.
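The tuple shorthand is equivalent to building the HTTPBasicAuth object yourself. Preparing a request locally (without sending it) shows the Authorization header it produces:

```python
import requests
from requests.auth import HTTPBasicAuth

req = requests.Request('GET', 'https://localhost:5000',
                       auth=HTTPBasicAuth('username', 'password')).prepare()
# Basic auth is just the base64 encoding of "username:password"
print(req.headers['Authorization'])
```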