Using Requests

Article link: https://blog.csdn.net/weixin_43927138/article/details/90713764

Install the Requests module: pip install requests

First, import the Requests module:

import requests

Try fetching a web page:

a = requests.get('https://www.baidu.com')

This gives us a Response object named a, from which we can read everything we need about the response.

Requests' simple API means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request:

r = requests.post('http://httpbin.org/post', data={'key': 'value'})

The other HTTP request types, PUT, DELETE, HEAD and OPTIONS, work the same way:

r = requests.put('http://httpbin.org/put', data={'key': 'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

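Passing URL parameters:
You often want to send data in the URL's query string. One option is to write it directly into the URL after a question mark, for example httpbin.org/get?key=val. Alternatively, Requests lets you supply these arguments as a dictionary of strings through the params keyword argument. To pass key1=value1 and key2=value2 to httpbin.org/get, you can use the following code:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)

Printing the URL shows that it has been encoded correctly:

print(r.url)
Output: http://httpbin.org/get?key1=value1&key2=value2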

You can also pass a list as a value:

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('http://httpbin.org/get', params=payload)
print(r.url)
Output: http://httpbin.org/get?key1=value1&key2=value2&key2=value3
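
The encoding described above can also be checked without sending anything. The following is a minimal sketch that prepares the request locally and inspects the resulting URL; the 'unused' key is a made-up name added only to show that None values are left out of the query string.

```python
import requests

# Prepare the request locally instead of sending it, so the encoded URL
# can be inspected without any network access.
# 'unused' is a hypothetical key used only to show how None is handled.
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'unused': None}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# List values become repeated keys; keys whose value is None are dropped.
print(prepared.url)
# http://httpbin.org/get?key1=value1&key2=value2&key2=value3
```

Preparing a request this way is handy for debugging, because it shows exactly what Requests would put on the wire.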

Response content
The text attribute of the Response object returns the server's response body as text; Requests decodes it automatically:

r = requests.get('https://api.github.com/events')
a = r.text
print(a)
Output: [{"id":"9167113775","type":"PushEvent","actor"…

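When you access Response.text, Requests guesses the encoding of the response from the HTTP headers. You can inspect or change the encoding Requests uses through the Response.encoding attribute:

print(r.encoding)
Output: utf-8
r.encoding = 'ISO-8859-1'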

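Binary response content
The content attribute of the Response object returns the response body as bytes:

r = requests.get('https://api.github.com/events')
a = r.content
print(a)
Output: b'[{"id":"9167113775","type":"PushEvent","actor"…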

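JSON response content
The json() method of the Response object returns the server's response body parsed as JSON:

r = requests.get('https://api.github.com/events')
a = r.json()
print(a)
Output: [{'payload': {'forkee': {'issues_url': …

If JSON decoding fails, r.json() raises an exception.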

Raw response content
In rare cases you may need the raw socket response from the server. Set stream=True in the request and then access the raw attribute of the Response object:

r = requests.get('https://api.github.com/events', stream=True)
a = r.raw
print(a)
Output: <urllib3.response.HTTPResponse object at 0x101194810>
b = r.raw.read(10)
print(b)
Output: b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

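Custom headers
Pass a dict to the headers parameter to add HTTP headers to a request:

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)

Custom headers have lower precedence and can be overridden in certain scenarios.
All header values must be a string, bytestring, or unicode. Passing unicode values is permitted but best avoided.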
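
As a small illustration of the two points above, the sketch below prepares a request locally and looks at its headers. It assumes a reasonably recent Requests release, where a non-string header value is rejected with InvalidHeader; the 'x-retries' header is a made-up name used only for the demonstration.

```python
import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
prepared = requests.Request('GET', url, headers=headers).prepare()

# Header names on a prepared request are looked up case-insensitively.
print(prepared.headers['User-Agent'])
# my-app/0.0.1

# Header values must be str or bytes; an int is rejected
# ('x-retries' is a hypothetical header name).
rejected = False
try:
    requests.Request('GET', url, headers={'x-retries': 3}).prepare()
except requests.exceptions.InvalidHeader:
    rejected = True
print(rejected)
```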
More complex POST requests
Sending form-encoded data
Pass a dict to the data parameter:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('https://httpbin.org/post', data=payload)

If several values map to one key, use either a list of tuples or a dict with lists as values:

payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
payload_dict = {'key1': ['value1', 'value2']}
r2 = requests.post('https://httpbin.org/post', data=payload_dict)

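Sending data that is not form-encoded
If you pass a string instead of a dict, that data is sent as-is. To send JSON, you can instead pass a dict through the json parameter and Requests encodes it for you:

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
r = requests.post(url, json=payload)

Notes: a) The json parameter is ignored if data or files is passed in the same request. b) Using the json parameter sets the Content-Type header to application/json.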
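
The difference between data and json is easiest to see on the prepared request. The following sketch builds each variant locally, without sending it, and prints the body and Content-Type that Requests would use; the variable names are illustrative only.

```python
import json

import requests

url = 'https://httpbin.org/post'

# A dict passed as data is form-encoded.
form = requests.Request('POST', url, data={'key1': 'value1', 'key2': 'value2'}).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # key1=value1&key2=value2

# A list of tuples allows the same key to appear more than once.
multi = requests.Request('POST', url, data=[('key1', 'value1'), ('key1', 'value2')]).prepare()
print(multi.body)                    # key1=value1&key1=value2

# A dict passed as json is serialized and labelled as JSON.
as_json = requests.Request('POST', url, json={'some': 'data'}).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))         # {'some': 'data'}

# When data and json are both given, json is ignored.
both = requests.Request('POST', url, data={'key1': 'value1'}, json={'some': 'data'}).prepare()
print(both.body)                     # key1=value1
```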
POST a Multipart-Encoded file
Upload a file:

url = 'https://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)

You can set the filename, content type and headers explicitly:

url = 'https://httpbin.org/post'
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, files=files)

You can even send a string to be received as a file:

url = 'http://httpbin.org/post'
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
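
To see what a multipart upload looks like before it is sent, the request can be prepared locally. This is a minimal sketch based on the string-as-file example above; the 'text/csv' content type is an assumption added for illustration.

```python
import requests

# The third tuple element ('text/csv') is an assumed content type for the part.
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n', 'text/csv')}
prepared = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

# Requests chooses the multipart boundary itself, so only the media type is stable.
content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                   # multipart/form-data

# The encoded body carries the filename and the per-part content type.
print(b'filename="report.csv"' in prepared.body)    # True
print(b'Content-Type: text/csv' in prepared.body)   # True
```

When uploading a real file, opening it in binary mode inside a with block keeps the file handle from leaking after the request completes.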

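Response status codes
The status_code attribute of the Response object holds the response status:

r = requests.get('https://httpbin.org/get')
a = r.status_code
print(a)
Output: 200

Requests also ships a built-in status code lookup object for easy reference:

r = requests.get('https://httpbin.org/get')
print(r.status_code == requests.codes.ok)
Output: True

If a request fails (a 4XX client error or a 5XX server error), you can call raise_for_status() to raise an exception:

bad_r = requests.get('https://httpbin.org/status/404')
a = bad_r.status_code
print(a)  # 404
# raise_for_status() raises requests.exceptions.HTTPError for a 4XX/5XX response,
# so nothing after this call runs unless the exception is caught
bad_r.raise_for_status()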
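
The usual way to work with raise_for_status() is to wrap it in try/except. The sketch below shows that pattern offline: instead of contacting a server it builds Response objects by hand and sets their status_code, which is an illustration device rather than something normal application code would do.

```python
import requests

# The lookup object maps readable names to numeric codes.
print(requests.codes.ok, requests.codes.not_found)  # 200 404

# Hand-built responses stand in for real server replies (illustration only).
not_found = requests.Response()
not_found.status_code = 404

failed_status = None
try:
    not_found.raise_for_status()
except requests.exceptions.HTTPError as exc:
    # The failing response is attached to the exception.
    failed_status = exc.response.status_code
print(failed_status)  # 404

# For a successful status the call returns None and raises nothing.
ok = requests.Response()
ok.status_code = 200
print(ok.raise_for_status())  # None
```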

Response headers

The headers attribute of the Response object holds the response headers. It is a dictionary whose keys are case-insensitive:

r = requests.get('https://api.github.com/events')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))
Output: {'X-XSS-Protection': '1; mode=block', 'Content-Security-Policy': …
application/json; charset=utf-8
application/json; charset=utf-8
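
The case-insensitive behaviour comes from the CaseInsensitiveDict class in requests.structures, which can be tried on its own. A short sketch, using the header value from the example above and a made-up 'x-missing' key:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json; charset=utf-8'})

# Lookups ignore case...
print(headers['content-type'])      # application/json; charset=utf-8
print(headers.get('CONTENT-TYPE'))  # application/json; charset=utf-8

# ...while iteration keeps the casing the key was stored with.
print(list(headers))                # ['Content-Type']

# get() returns None for a header that is absent ('x-missing' is hypothetical).
print(headers.get('x-missing'))     # None
```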

Cookies
The cookies attribute of the Response object gives access to the cookies in a response:

url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
print(r.cookies['example_cookie_name'])
Output: example_cookie_value

Use the cookies parameter to send cookies:

url = 'https://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)

Response.cookies returns a RequestsCookieJar, which behaves like a dict but offers a richer interface suited to use across multiple domains or paths. A cookie jar can also be passed in to a request:

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.text)
Output: {"cookies": {"tasty_cookie": "yum"}}
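
Why only tasty_cookie reaches the server can be checked locally: the jar matches each cookie's domain and path against the request URL. The following sketch prepares the same request without sending it and reads the Cookie header that would be sent.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# Only cookies whose domain and path match the URL end up in the header.
prepared = requests.Request('GET', 'https://httpbin.org/cookies', cookies=jar).prepare()
cookie_header = prepared.headers.get('Cookie')
print(cookie_header)  # tasty_cookie=yum

# The other cookie is still in the jar and can be read by domain and path.
print(jar.get('gross_cookie', domain='httpbin.org', path='/elsewhere'))  # blech
```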

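Redirection and history
By default Requests follows redirects for every request type except HEAD. The Response.history attribute tracks redirects: it is a list of the Response objects created to complete the request, sorted from oldest to most recent.

Here is an example of HTTP being redirected to HTTPS:

r = requests.get('http://github.com/')
print(r.url)
Output: https://github.com/
print(r.status_code)
Output: 200
print(r.history)
Output: [<Response [301]>]

Use the allow_redirects parameter to disable redirect handling:

r = requests.get('http://github.com/', allow_redirects=False)
print(r.status_code)
Output: 301
print(r.history)
Output: []

For a HEAD request, the same allow_redirects parameter can be used to enable redirects:

r = requests.head('http://github.com/', allow_redirects=True)
print(r.url)
Output: https://github.com/
print(r.history)
Output: [<Response [301]>]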
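
The same behaviour can be reproduced without depending on github.com. The sketch below is an assumption-laden illustration: it starts a throwaway local server from the standard library (RedirectHandler and its /old and /new paths are made up for this example) and uses a Session with trust_env disabled so that proxy settings from the environment do not affect the loopback requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects permanently to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'arrived'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free port.
server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

with requests.Session() as s:
    s.trust_env = False  # ignore proxy settings from the environment
    followed = s.get(base + '/old', timeout=5)
    not_followed = s.get(base + '/old', allow_redirects=False, timeout=5)

server.shutdown()
server.server_close()

# The redirect was followed: final status 200, one 301 in the history.
print(followed.status_code, followed.url.endswith('/new'), followed.history)
# With allow_redirects=False the 301 itself is returned and history is empty.
print(not_followed.status_code, not_followed.history)
```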

Timeouts
Use the timeout parameter to set the maximum time to wait for the server to respond:

requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
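
In practice a timeout is usually caught rather than left to crash the program. The following is a sketch under stated assumptions: to make the timeout reproducible it opens a local listening socket that never answers (the operating system completes the TCP handshake, but no HTTP response is ever sent), and it uses a Session with trust_env disabled so that environment proxy settings do not interfere.

```python
import socket

import requests

# A listening socket that never accepts or replies: the connection is
# established by the OS backlog, but the client waits forever for a response.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))  # port 0 lets the OS pick a free port
listener.listen(1)
url = 'http://127.0.0.1:%d/' % listener.getsockname()[1]

timed_out = False
with requests.Session() as s:
    s.trust_env = False  # ignore proxy settings from the environment
    try:
        s.get(url, timeout=0.5)
    except requests.exceptions.Timeout as exc:
        # Connect and read timeouts are both subclasses of Timeout.
        timed_out = True
        print('timed out:', type(exc).__name__)

listener.close()
```

Without a timeout argument Requests waits indefinitely, so passing one explicitly is a reasonable default for production code.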

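Session objects
A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and uses urllib3's connection pooling. If you make several requests to the same host, the underlying TCP connection is therefore reused, which can bring a significant performance gain (see HTTP persistent connection).
A Session object has all the methods of the main Requests API.
Let's persist some cookies across requests:

s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
Output: {"cookies": {"sessioncookie": "123456789"}}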

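Sessions can also supply default data to the request methods. This is done by setting properties on the Session object:

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

Any dictionary you pass to a request method is merged with the session-level values that are set. Method-level parameters override session parameters.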
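
The merge rule can be observed without sending anything by asking the session to prepare a request. This is a minimal sketch using the same session settings as above; the 'false' override value is made up for the illustration.

```python
import requests

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# Session headers and method-level headers are merged.
req = requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test2': 'true'})
prepped = s.prepare_request(req)
print(prepped.headers['x-test'], prepped.headers['x-test2'])  # true true

# The session's auth tuple becomes a Basic Authorization header.
print(prepped.headers['Authorization'].startswith('Basic '))  # True

# A method-level value wins over the session value for the same key.
override = s.prepare_request(
    requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test': 'false'})
)
print(override.headers['x-test'])  # false

s.close()
```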

Note, however, that method-level parameters are not persisted across requests, even when using a session. The following example sends the cookie with the first request only, not the second:

s = requests.Session()
r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.text)
# {"cookies": {"from-my": "browser"}}
r = s.get('http://httpbin.org/cookies')
print(r.text)
# {"cookies": {}}

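To add cookies to a session manually, use the cookie utility functions to manipulate Session.cookies.
Sessions can also be used as context managers:

with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

This ensures the session is closed as soon as the with block exits, even if an exception occurred.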

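Advanced usage:
Request and Response objects
Whenever you make a call such as requests.get(), you are doing two major things. First, you construct a Request object that is sent to a server to request or query some resource. Second, a Response object is generated once Requests gets a response back from the server. The Response object contains all the information returned by the server and also the Request object you originally created. Here is a simple request that fetches some very important information from Wikipedia's servers:

r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')

To access the headers the server sent back to us:

print(r.headers)
Output: {'content-length': '56170', 'x-content-type-options': 'nosniff', 'x-cache': …

To get the headers we sent to the server, access the request and then the request's headers:

print(r.request.headers)
Output: {'Accept-Encoding': 'identity, deflate, compress, gzip', 'Accept': '*/*', 'User-Agent': 'python-requests/0.13.1'}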

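Prepared requests
Whenever you receive a Response object from an API call or a Session call, its request attribute is actually the PreparedRequest that was used. Sometimes you need to do extra work on the body or headers (or anything else) before sending a request. A simple recipe for this is:

from requests import Request, Session

# url, data, headers, stream, verify, proxies, cert and timeout are
# placeholders that the caller is expected to define
s = Session()
req = Request('GET', url,
    data=data,
    headers=headers
)
prepped = req.prepare()
# do something with prepped.body
# do something with prepped.headers
resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)
print(resp.status_code)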

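Since you are not doing anything special with the Request object, you prepare it immediately and modify the PreparedRequest object. You then send it together with the other parameters you would have passed to requests.* or Session.*.
However, the code above loses some of the advantages of a Requests Session object. In particular, Session-level state such as cookies is not applied to your request. To get a PreparedRequest with that state applied, replace the call to Request.prepare() with a call to Session.prepare_request(), like this:

from requests import Request, Session

# url, data, headers, stream, verify, proxies, cert and timeout are
# placeholders that the caller is expected to define
s = Session()
req = Request('GET', url,
    data=data,
    headers=headers
)
prepped = s.prepare_request(req)
# do something with prepped.body
# do something with prepped.headers

resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)
print(resp.status_code)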
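
The difference between the two preparation paths can be shown with a session cookie. In this sketch the cookie is added to Session.cookies by hand (the name and value are borrowed from the earlier httpbin example), and nothing is sent over the network.

```python
from requests import Request, Session

s = Session()
# Add a cookie to the session manually.
s.cookies.set('sessioncookie', '123456789')

req = Request('GET', 'http://httpbin.org/cookies')

# Request.prepare() knows nothing about the session, so no cookie is attached.
stateless = req.prepare()
print(stateless.headers.get('Cookie'))  # None

# Session.prepare_request() applies session-level state such as cookies.
stateful = s.prepare_request(req)
print(stateful.headers.get('Cookie'))   # sessioncookie=123456789

s.close()
```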

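Errors and exceptions
ConnectionError: a network problem, such as a DNS failure or a refused connection.
HTTPError: raised by Response.raise_for_status() if the request returned a 4XX or 5XX status code.
Timeout: the request timed out.
TooManyRedirects: the request exceeded the configured maximum number of redirections.
RequestException: the base class for these exceptions.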
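
One way to put the exception classes above to work is a small wrapper that turns each failure into a message. The fetch helper below is hypothetical, not part of Requests; it lists Timeout before ConnectionError because a connect timeout belongs to both classes, and it ends with the RequestException base class as a catch-all.

```python
import requests


def fetch(url, timeout=5):
    """Hypothetical helper: return (response, None) or (None, message)."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response, None
    except requests.exceptions.Timeout:
        return None, 'timed out'
    except requests.exceptions.TooManyRedirects:
        return None, 'too many redirects'
    except requests.exceptions.ConnectionError:
        return None, 'connection failed'
    except requests.exceptions.HTTPError as exc:
        return None, 'bad status: %s' % exc.response.status_code
    except requests.exceptions.RequestException as exc:
        # Anything else raised by Requests, e.g. an invalid URL.
        return None, 'request failed: %s' % type(exc).__name__


# A URL without a scheme is rejected before any network traffic happens,
# so this call exercises the catch-all branch offline.
result = fetch('httpbin.org/get')
print(result)  # (None, 'request failed: MissingSchema')
```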

For more advanced usage of Requests, see: http://cn.pythonrequests.org/zh_CN/latest/user/advanced.html#advanced
