Original article: https://www.jianshu.com/p/678489e022c8
Introduction
The previous article covered the use of Python's network request libraries urllib and urllib3. This article turns to another network request library, Requests. Compared with urllib, what are its advantages? In short, just two words: simple and elegant.
The motto of Requests is "HTTP for Humans". It can be said that Requests thoroughly embodies the simple, elegant spirit that Python stands for.
urllib, as part of Python's standard library, is for historical reasons rather cumbersome and complex to use, and its official documentation is sparse, often forcing you to read the source code. In contrast, Requests is simple, intuitive, and user-friendly, freeing programmers from wrestling with the library itself.
Even the official documentation for urllib.request recommends Requests with this sentence:
The Requests package is recommended for a higher-level HTTP client interface.
The official documentation of Requests is also detailed and complete, and there is even a rare official Chinese version: http://cn.python-requests.org/zh_CN/latest/ .
English documentation: http://docs.python-requests.org/en/master/api/
Of course, for accuracy, try to consult the English documentation when possible.
Author
The author of Requests, Kenneth Reitz, is something of a legend.
Kenneth Reitz worked at Heroku, a company known as a pioneer of cloud services, where at the age of 28 he served as chief architect for the Python language. What has he done? To name just a few projects: requests, python-guide, pipenv, legit, autoenv. He has also contributed code to many well-known open-source projects in the Python community, such as Flask.
He is a pivotal figure in the Python world, obsessively pursuing beauty in code.
The legend does not stop there. Here is a photo of him speaking at PyCon back in those years:
A cute chubby guy, quite in line with the public stereotype of a programmer: overweight, plainly dressed, shy.
But a few years later, he looked like this:
Emmmm, handsome guy, what happened to you?
Just kidding. But the change really is dramatic: from a chubby homebody to cool and stylish.
So don't make excuses for a lazy attitude toward life. You can become better!
If we are willing to pursue it, we can become what we want to be.
Features by example
It can be said that the most distinctive feature of Requests is its elegant style. Whether it is making requests, handling responses, or dealing with cookies, URL parameters, and POST data, this style shows through everywhere.
Here is a simple example:
>>> import requests
>>> resp = requests.get('https://api.github.com/events')
>>> resp.status_code
200
>>> resp.headers['content-type']
'application/json; charset=utf-8'
>>> resp.encoding
'utf-8'
>>> resp.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...
As you can see, both making the request and handling the response are intuitive and clear.
Requests currently covers essentially every need of web requests. Its features:
- Keep-Alive & connection pooling
- International domain names and URLs
- Sessions with persistent cookies
- Browser-style SSL verification
- Automatic content decoding
- Basic/Digest authentication
- Elegant key/value cookies
- Automatic decompression
- Unicode response bodies
- HTTP(S) proxy support
- Chunked file uploads
- Streaming downloads
- Connection timeouts
- Chunked requests
- .netrc support
Requests 3.0 has raised funding and is currently under development. It is expected to support async/await for concurrent requests, and may also support HTTP/2.0.
Installation
Installing Requests is very simple; just use pip:
pip install requests
Usage
Unlike urllib, making a request with Requests does not require constructing various Request objects, openers, and handlers. Just call the method for the request you want and pass in the needed parameters.
Making requests
Request methods
Each request method has a corresponding API. For example, a GET request uses the get() method:
>>> import requests
>>> resp = requests.get('https://www.baidu.com')
A POST request uses the post() method, with the data to submit passed via the data parameter:
>>> import requests
>>> resp = requests.post('http://httpbin.org/post', data = {'key':'value'})
The other request types each have a corresponding method:
>>> resp = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> resp = requests.delete('http://httpbin.org/delete')
>>> resp = requests.head('http://httpbin.org/get')
>>> resp = requests.options('http://httpbin.org/get')
Simple, direct, and clear.
Passing URL parameters
There is no need to splice parameters into the URL as with urllib. Simply build a dict and pass it to the params parameter when making the request:
>>> import requests
>>> params = {'key1': 'value1', 'key2': 'value2'}
>>> resp = requests.get("http://httpbin.org/get", params=params)
Inspecting the request's URL shows that it was constructed correctly:
>>> print(resp.url)
http://httpbin.org/get?key2=value2&key1=value1
Sometimes the same parameter name carries several different values, and a Python dict cannot repeat a key. In that case, express the values as a list:
>>> params = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> resp = requests.get('http://httpbin.org/get', params=params)
>>> print(resp.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
Custom headers
To customize the request headers, likewise pass a dict to the headers parameter.
>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> resp = requests.get(url, headers=headers)
Custom cookies
With Requests there is no need to construct a CookieJar object; just pass a dict to the cookies parameter.
>>> url = 'http://httpbin.org/cookies'
>>> cookies = {'cookies_are': 'working'}
>>> resp = requests.get(url, cookies=cookies)
>>> resp.text
'{"cookies": {"cookies_are": "working"}}'
Setting proxies
To use a proxy, likewise build a proxy dict and pass it to the proxies parameter.
import requests
proxies = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
Redirects
Network requests often run into 3xx redirect status codes. Requests allows redirects by default: when it encounters one, it automatically follows it. To disable this behavior, set allow_redirects to False:
>>> resp = requests.get('http://github.com', allow_redirects=False)
>>> resp.status_code
301
Disabling certificate verification
Sometimes, when using a packet-capture tool, the certificate the tool provides is not issued by a trusted certificate authority, so certificate verification fails and we need to turn it off.
Set the verify parameter to False in the request to disable certificate verification.
>>> import requests
>>> resp = requests.get('https://httpbin.org/get', verify=False)
After disabling verification, however, there is a rather annoying warning:
py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
The warning can be silenced as follows:
from requests.packages.urllib3.exceptions import InsecureRequestWarning
# Disable the insecure request warning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
Setting a timeout
To set a timeout on a request, just set the timeout parameter.
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
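In real code you would normally catch the exception rather than let it propagate. A minimal sketch (a 0.001-second timeout will almost certainly be exceeded):

```python
import requests

try:
    resp = requests.get('http://github.com', timeout=0.001)
except requests.exceptions.Timeout:
    # Timeout covers both connect and read timeouts
    print('request timed out')
```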
As you can see, making a basic network request with Requests only takes building a few dicts and passing them into the request method.
Responses
A request made with Requests returns a requests.models.Response object, from which the response content can be retrieved very conveniently.
Response content
With urllib, the response body is read as raw bytes, which we must decode() into a string ourselves.
Requests instead offers the text attribute, which returns the response content as a string.
>>> import requests
>>> resp = requests.get('https://api.github.com/events')
>>> resp.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...
Requests guesses the page's encoding from the response headers and decodes the content accordingly; most pages decode correctly this way. If text turns out to be decoded incorrectly, specify the encoding manually:
>>> import requests
>>> resp = requests.get('https://api.github.com/events')
>>> resp.encoding = 'utf-8'
>>> resp.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...
If you need the raw binary data, use the content attribute instead.
>>> resp.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
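The relation between the two: text is, in effect, content decoded with the detected encoding. A small check, assuming the GitHub API is reachable and declares charset=utf-8 as before:

```python
import requests

resp = requests.get('https://api.github.com/events')
# text should match content decoded with the detected encoding
print(resp.encoding)
print(resp.text == resp.content.decode(resp.encoding))  # True
```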
If the data returned is in JSON format, the json() method parses it directly into Python objects.
>>> import requests
>>> resp = requests.get('https://api.github.com/events')
>>> resp.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
Status code
The status_code attribute gives the response's status code.
>>> resp = requests.get('http://httpbin.org/get')
>>> resp.status_code
200
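Besides checking status_code by hand, the Response object also offers raise_for_status(), which raises a requests.exceptions.HTTPError for 4xx/5xx responses, a convenient way to fail fast. A sketch against httpbin's status endpoint:

```python
import requests

resp = requests.get('http://httpbin.org/status/404')
print(resp.status_code)       # 404
try:
    resp.raise_for_status()   # raises HTTPError because of the 404
except requests.exceptions.HTTPError as err:
    print('request failed:', err)
```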
Response headers
The headers attribute gives the response headers.
>>> resp.headers
{
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json'
}
Cookies returned by the server
The cookies attribute gives the cookies returned by the server.
>>> url = 'http://example.com/some/cookie/setting/url'
>>> resp = requests.get(url)
>>> resp.cookies['example_cookie_name']
'example_cookie_value'
url
The url attribute shows the URL that was actually requested.
>>> import requests
>>> params = {'key1': 'value1', 'key2': 'value2'}
>>> resp = requests.get("http://httpbin.org/get", params=params)
>>> print(resp.url)
http://httpbin.org/get?key2=value2&key1=value1
Session
Requests implements Session (conversation) support. With a Session, the access state is kept across requests, just as a browser keeps it as long as it has not been closed.
This is commonly used to fetch data after logging in, saving us from passing cookies again and again.
import requests
session = requests.Session()
session.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
resp = session.get('http://httpbin.org/cookies')
print(resp.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
First create a Session object, then use it to make requests; the request methods are exactly the same as the normal ones.
Note that headers, cookies, and other data passed into get() are only effective for that single request. To make a header effective for the entire lifetime of the Session, set it as follows:
# Replace the session-level headers entirely
session.headers = {
'user-agent': 'my-app/0.0.1'
}
# Add a single header
session.headers.update({'x-test': 'true'})
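Headers passed to an individual request are merged with the session-level headers for that one call only. A sketch against httpbin's /headers echo endpoint, which reports back the headers it received (title-cased):

```python
import requests

session = requests.Session()
session.headers.update({'x-test': 'true'})

# the per-request header is merged with the session header for this call
resp = session.get('http://httpbin.org/headers',
                   headers={'x-test2': 'true'})
sent = resp.json()['headers']
print(sent.get('X-Test'), sent.get('X-Test2'))  # true true
```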