Python Made Simple - The Requests HTTP Library

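Requests is an HTTP library written in Python. It is built on urllib3 and released under the Apache 2.0 open-source license. It is more convenient to use than the standard library's urllib, and it is often used for writing web crawlers and for testing the data that servers return.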

Installing Requests

Install with pip

pip install requests

Install from source

git clone https://github.com/psf/requests.git
cd requests
pip install .
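To confirm the installation worked, a quick check is to import the package and print requests.__version__, a version string the package exposes. The exact version printed depends on your environment.

```python
import requests

# If the import succeeds, Requests is installed; __version__ tells you which release.
print(requests.__version__)
```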

Request functions

Function	Purpose
requests.get()	Send a GET request
requests.post()	Send a POST request
requests.put()	Send a PUT request
requests.delete()	Send a DELETE request
requests.head()	Send a HEAD request
requests.options()	Send an OPTIONS request
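Each helper in the table is a thin wrapper around requests.request(method, url, **kwargs). The minimal sketch below builds one request per method with requests.Request(...).prepare() but does not send any of them. http://httpbin.org/anything is only a placeholder URL, and nothing goes over the network.

```python
import requests

methods = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]

# Request(...).prepare() builds a PreparedRequest (method, URL, headers, body)
# without opening a connection.
prepared = [requests.Request(m, "http://httpbin.org/anything").prepare() for m in methods]

for p in prepared:
    print(p.method, p.url)
```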

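The Response object

Attribute or method	Meaning
response.url	The URL of the response (the final URL after any redirects)
response.status_code	The HTTP status code of the response
response.encoding	The encoding used to decode response.text, usually taken from the Content-Type header; may be None
response.text	The response body decoded as text (str)
response.raw	The raw response stream (a urllib3 response object); request it with stream=True and read it with response.raw.read()
response.content	The response body as bytes
response.headers	The response headers in a dict-like object with case-insensitive keys; response.headers.get(key) returns None for a missing key, while response.headers[key] raises KeyError
response.json()	Parses the response body as JSON and returns the corresponding Python object (usually a dict or list)
response.raise_for_status()	Raises HTTPError if the status code is a client error (4xx) or server error (5xx)
response.reason	The text explanation of the status code, e.g. response.reason == "OK" for 200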
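response.headers is a requests.structures.CaseInsensitiveDict. The minimal sketch below uses that class directly with a made-up header set. It shows the lookup rules from the table without making a request.

```python
from requests.structures import CaseInsensitiveDict

# A made-up header set standing in for response.headers.
headers = CaseInsensitiveDict({"Content-Type": "application/json", "Content-Length": "426"})

print(headers["content-type"])   # key case is ignored on lookup
print(headers.get("X-Missing"))  # .get() returns None for a missing header
try:
    headers["X-Missing"]         # plain indexing raises KeyError instead
except KeyError:
    print("KeyError")
```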

GET requests

import requests
import json

params_dict = {'question':'Python Requests'}    # query-string parameters
response = requests.get('http://gank.io/api/data/Android/1/1', params=params_dict)
status_code = response.status_code              # status code
url = response.url                              # final request URL
encoding = response.encoding                    # encoding taken from the response headers
headers_dict = response.headers                 # response headers (dict-like)

print("url = ", url)
print("status_code = ", status_code)
print("encoding = ", encoding)
print("headers:")
for key,value in headers_dict.items():
    print(key," = ",value)

Output:

url =  http://gank.io/api/data/Android/1/1?question=Python+Requests
status_code =  200
encoding =  None
headers:
Server  =  Tengine
Content-Type  =  application/json
Content-Length  =  426
Connection  =  keep-alive
Date  =  Sat, 30 Dec 2017 08:29:16 GMT
Via  =  cache13.l2nu20-2[191,200-0,M], cache37.l2nu20-2[192,0], cache2.cn370[258,200-0,M], cache8.cn370[259,0]
X-Cache  =  MISS TCP_MISS dirn:-2:-2 mlen:-1
X-Swift-SaveTime  =  Sat, 30 Dec 2017 08:29:16 GMT
X-Swift-CacheTime  =  0
Timing-Allow-Origin  =  *
EagleId  =  3b6c8ad015146225559598476e
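The params dict is URL-encoded and appended to the URL as a query string. This is why the space in 'Python Requests' appears as + in the printed URL. The minimal offline sketch below uses requests.Request(...).prepare(), which builds the request without sending it. The list-valued tags parameter is a hypothetical addition that shows how a list becomes repeated keys.

```python
import requests

params_dict = {"question": "Python Requests", "tags": ["a", "b"]}  # "tags" is hypothetical
prepared = requests.Request("GET", "http://gank.io/api/data/Android/1/1", params=params_dict).prepare()

# Spaces become '+', and each list item becomes its own key=value pair.
print(prepared.url)
```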

Text response content

response = requests.get("http://gank.io/api/data/Android/1/1")
text = response.text                            # response body as text
print("text = ",text)

Output:

text =  {
  "error": false, 
  "results": [
    {
      "_id": "5a3a4654421aa90fe72536cc", 
      "createdAt": "2017-12-20T19:15:32.928Z", 
      "desc": "Git \u4f7f\u7528\u4e4b\u91cd\u5199\u5386\u53f2\u8bb0\u5f55", 
      "publishedAt": "2017-12-27T12:13:22.418Z", 
      "source": "web", 
      "type": "Android", 
      "url": "http://www.jianshu.com/p/8f46e13a8ada", 
      "used": true, 
      "who": "ZhangTitanjum"
    }
  ]
}

Binary response content

response = requests.get("http://gank.io/api/data/Android/1/1")
content = response.content                      # response body as bytes
print("content = ",content)

Output:

content =  b'{\n  "error": false, \n  "results": [\n    {\n      "_id": "5a3a4654421aa90fe72536cc", \n      "createdAt": "2017-12-20T19:15:32.928Z", \n      "desc": "Git \\u4f7f\\u7528\\u4e4b\\u91cd\\u5199\\u5386\\u53f2\\u8bb0\\u5f55", \n      "publishedAt": "2017-12-27T12:13:22.418Z", \n      "source": "web", \n      "type": "Android", \n      "url": "http://www.jianshu.com/p/8f46e13a8ada", \n      "used": true, \n      "who": "ZhangTitanjum"\n    }\n  ]\n}\n'

JSON response content

response = requests.get("http://gank.io/api/data/Android/1/1")
json_data = response.json()                     # response body parsed as JSON
print("json = ",json_data)

Output:

json =  {'error': False, 'results': [{'_id': '5a3a4654421aa90fe72536cc', 'createdAt': '2017-12-20T19:15:32.928Z', 'desc': 'Git 使用之重写历史记录', 'publishedAt': '2017-12-27T12:13:22.418Z', 'source': 'web', 'type': 'Android', 'url': 'http://www.jianshu.com/p/8f46e13a8ada', 'used': True, 'who': 'ZhangTitanjum'}]}
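response.text still shows the \uXXXX escapes the server sent, while response.json() returns Python objects with those escapes decoded. The rough, stdlib-only sketch below demonstrates this difference on a shortened copy of the body above. It approximates what Requests does (decode the bytes to text, then parse the JSON) and does not reproduce its exact internal code path.

```python
import json

# Shortened copy of the body above: non-ASCII text arrives as \uXXXX escapes.
body = b'{"desc": "Git \\u4f7f\\u7528\\u4e4b\\u91cd\\u5199\\u5386\\u53f2\\u8bb0\\u5f55"}'

text = body.decode("utf-8")  # roughly response.text: the escapes are still visible
data = json.loads(text)      # roughly response.json(): the escapes become characters

print(text)
print(data["desc"])
```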

Raw response content

response = requests.get("http://gank.io/api/data/Android/1/1",stream=True)
raw = response.raw                              # raw response stream
print("raw = ",raw)
print("raw read byte = ",raw.read(10))

Output:

raw =  <urllib3.response.HTTPResponse object at 0x024EF4B0>
raw read byte =  b'{\n  "error'

To read the server's raw response, set stream=True in the request.
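You can try stream=True and response.raw without depending on gank.io. The minimal sketch below serves a fixed body from a throwaway local server built with http.server. The body, the handler, and the Session with trust_env = False are all assumptions of this example. The trust_env setting prevents proxy environment variables from rerouting the local request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BODY = b'{\n  "error": false\n}\n'  # hypothetical body, shaped like the gank.io reply


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

response = session.get(url, stream=True)  # headers read, body not downloaded yet
first = response.raw.read(10)             # first 10 bytes of the body
rest = response.raw.read()                # whatever is left
response.close()
server.shutdown()
server.server_close()

print("raw read byte = ", first)
```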

Adding HTTP headers

github_url = 'https://developer.github.com/v3/some/endpoint'
headers = {"user-agent":"my-app-v0.1"}
resp = requests.get(github_url,headers=headers)
print(resp.text)
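Requests merges the headers you pass through headers= with the defaults it adds (User-Agent, Accept-Encoding, Accept, Connection). Header names are matched case-insensitively, so user-agent replaces the default User-Agent. The minimal sketch below uses Session.prepare_request() to inspect the merged headers without sending anything. The URL is the same placeholder as above and is never contacted.

```python
import requests

github_url = "https://developer.github.com/v3/some/endpoint"  # placeholder, never contacted
headers = {"user-agent": "my-app-v0.1"}

session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", github_url, headers=headers))

print(prepared.headers["User-Agent"])  # the custom value replaced the default
print(prepared.headers["Accept"])      # a default header that was kept
```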

POST form requests

To send a form in a POST request, pass a dict to the data parameter.

user_url = 'http://httpbin.org/post'
user_info = {'name':'mike','age':21}
resp = requests.post(user_url,data=user_info)
text  = resp.text
print(text)

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "21", 
    "name": "mike"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "16", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "43.248.244.132", 
  "url": "http://httpbin.org/post"
}
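With data=dict, Requests URL-encodes the fields into the body and sets Content-Type to application/x-www-form-urlencoded. This explains the form section and the Content-Length of 16 in the output above. The minimal offline sketch below prepares the same request without sending it.

```python
import requests

user_info = {"name": "mike", "age": 21}
prepared = requests.Request("POST", "http://httpbin.org/post", data=user_info).prepare()

print(prepared.body)                       # name=mike&age=21
print(prepared.headers["Content-Type"])    # application/x-www-form-urlencoded
print(prepared.headers["Content-Length"])  # 16
```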

POSTing a JSON string

print("---post json----")
user_url = 'http://httpbin.org/post'
user_info = {'name':'mike','age':21}
json_str = json.dumps(user_info)
# resp = requests.post(user_url,data=json_str)   # also sends the JSON text, but without a Content-Type: application/json header
resp = requests.post(user_url,json=user_info)   # serializes user_info and sets Content-Type: application/json
text  = resp.text

print(text)

Output:

---post json----
{
  "args": {}, 
  "data": "{\"name\": \"mike\", \"age\": 21}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "27", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": {
    "age": 21, 
    "name": "mike"
  }, 
  "origin": "58.246.141.153", 
  "url": "http://httpbin.org/post"
}
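The two variants in the code above differ mainly in the Content-Type header. The minimal offline sketch below prepares both requests without sending them, using the placeholder httpbin URL.

```python
import json

import requests

user_url = "http://httpbin.org/post"  # placeholder; nothing is sent
user_info = {"name": "mike", "age": 21}

# json=: Requests serializes the dict and sets Content-Type itself.
p_json = requests.Request("POST", user_url, json=user_info).prepare()
# data=<str>: the string is sent as-is, and no Content-Type header is added.
p_data = requests.Request("POST", user_url, data=json.dumps(user_info)).prepare()

print(p_json.headers["Content-Type"])      # application/json
print(p_json.body)                         # b'{"name": "mike", "age": 21}'
print(p_data.headers.get("Content-Type"))  # None
```

To send a pre-serialized string with the right type, also pass headers={"Content-Type": "application/json"}.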

Uploading files with POST

Files are uploaded through the files parameter, most commonly with requests.post. The request takes this form:

url ='http://httpbin.org/post'
data = None
files = {}
resp = requests.post(url,data,files=files)

The files parameter accepts data in several forms. The two most basic are:

a dict (the form the official documentation recommends)
a list of tuples

(1) files as a dict

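print('-----post file upload-------')
url ='http://httpbin.org/post'
data = None
files_dict = {'field':('filename',open('D:/aa.jpg','rb'),'image/jpeg',{'refer':'www.baidu.com'})}
resp = requests.post(url,data,files=files_dict)
print('text = ',resp.text)

Here the key of files_dict is the name of the form field in the POST request (field), and the value describes the file to send.

The file information is a tuple of (filename, file object, Content-Type, extra headers). The last two items are optional, so ('filename', fileobj) and ('filename', fileobj, 'image/jpeg') also work.

Output:

-----post file upload-------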
text =  {
  "args": {}, 
  "data": "", 
  "files": {
    "field": "data:image/jpeg;base64,/9j/4AAQSf8A8kLr/wCN0f8ADQnwx/6Gf/yQuv8A43RRQB4B+1f8QvDHjv8A4Rb/AIRTU/t/2L7V5/8Ao8sWzf5O376rnOxumelFFFAH/9k="
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "163038", 
    "Content-Type": "multipart/form-data; boundary=8758341a8d8844869074054764661556", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "223.167.118.81", 
  "url": "http://httpbin.org/post"
}
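The sketch below shows how the four-item tuple turns into one part of a multipart/form-data body. It prepares the request offline, and io.BytesIO stands in for open('D:/aa.jpg', 'rb'), so no real file is needed.

```python
import io

import requests

# io.BytesIO stands in for open('D:/aa.jpg', 'rb') so the example needs no real file.
fake_image = io.BytesIO(b"not really a jpeg")
files_dict = {"field": ("filename", fake_image, "image/jpeg", {"refer": "www.baidu.com"})}

prepared = requests.Request("POST", "http://httpbin.org/post", files=files_dict).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body.decode("utf-8"))     # one part per field, with its own headers
```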

(2) files as a list of tuples

print('-----post tuple file upload-------')
url ='http://httpbin.org/post'
data = None
files_tuple = [('field',('filename',open('D:/aa.jpg','rb'),'image/jpeg',{'refer':'www.baidu.com'}))]
resp = requests.post(url,data,files=files_tuple)
print('text = ',resp.text)
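A dict can hold only one entry per field name. When several files share one field name, use the list-of-tuples form, which is also the form the Requests documentation uses for uploading multiple files. In the minimal offline sketch below, the field name images and the in-memory file contents are hypothetical.

```python
import io

import requests

# Two in-memory stand-ins for real files, sent under the same (hypothetical) field name.
files_list = [
    ("images", ("a.jpg", io.BytesIO(b"first file"), "image/jpeg")),
    ("images", ("b.jpg", io.BytesIO(b"second file"), "image/jpeg")),
]
prepared = requests.Request("POST", "http://httpbin.org/post", files=files_list).prepare()

print(prepared.body.count(b'name="images"'))  # 2: one part per tuple
```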

Sending data together with files

data = {"k1" : "v1"}
files = {
  "field1" : open("1.png", "rb")
}
r = requests.post("http://httpbin.org/post", data, files=files)
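When data and files are passed together, the form fields are not URL-encoded separately. They become extra parts of the same multipart body. The minimal offline sketch below shows this, with io.BytesIO standing in for open("1.png", "rb").

```python
import io

import requests

data = {"k1": "v1"}
files = {"field1": io.BytesIO(b"png bytes would go here")}  # stand-in for open("1.png", "rb")

prepared = requests.Request("POST", "http://httpbin.org/post", data=data, files=files).prepare()

# Both the form field and the file end up as parts of one multipart body.
print(prepared.headers["Content-Type"].split(";")[0])                   # multipart/form-data
print(b'name="k1"' in prepared.body, b'name="field1"' in prepared.body)  # True True
```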

Cookies

cookies = response.cookies   # cookies set by the response
print(type(cookies))

Output:

<class 'requests.cookies.RequestsCookieJar'>

response.cookies returns a RequestsCookieJar object, which behaves much like a dict.

res = requests.get('http://www.baidu.com')
cookies = res.cookies
print(type(cookies))
print('keys  = ',cookies.keys())
print('values  = ',cookies.values())
print('cookies["BDORZ"] = ',cookies['BDORZ'])

Output:

<class 'requests.cookies.RequestsCookieJar'>
keys  =  ['BDORZ']
values  =  ['27315']
cookies["BDORZ"] =  27315
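The dict-like behaviour can be tried without contacting baidu.com by filling a RequestsCookieJar by hand. In the minimal sketch below, the name and value mirror the output above, and the domain .baidu.com is an assumption.

```python
from requests.cookies import RequestsCookieJar

# Build a jar by hand; the domain is an assumption for illustration.
jar = RequestsCookieJar()
jar.set("BDORZ", "27315", domain=".baidu.com", path="/")

print(jar.keys())                     # ['BDORZ']
print(jar["BDORZ"])                   # 27315
print(jar.get_dict())                 # {'BDORZ': '27315'}
print(jar.get("missing", "default"))  # default (no such cookie)
```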

To send cookies to the server, pass a dict through the cookies parameter:

print('------cookies-----')
cookie_url = 'http://httpbin.org/cookies'
cookies_dict = dict(cookies_key = 'Python')
r = requests.get(cookie_url,cookies = cookies_dict)
text = r.text
print(text)

Output:

------cookies-----
{
  "cookies": {
    "cookies_key": "Python"
  }
}
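The cookies dict is sent as an ordinary Cookie request header. The minimal offline sketch below prepares the request without sending it, so you can see the header that httpbin echoes back above.

```python
import requests

cookie_url = "http://httpbin.org/cookies"  # placeholder; nothing is sent
prepared = requests.Request("GET", cookie_url, cookies={"cookies_key": "Python"}).prepare()

print(prepared.headers["Cookie"])  # cookies_key=Python
```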

Timeouts

The timeout parameter tells Requests to stop waiting for a response after a given number of seconds.

print('------timeout------')
response_git  = requests.get('http://github.com',timeout=0.1)

Output:

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x024E4330>, 'Connection to github.com timed out. (connect timeout=0.1)'))

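Note: timeout is not a time limit on downloading the whole response. A single number applies both to establishing the connection and to each wait for data from the server. An exception is raised if the server does not answer within timeout seconds (more precisely, if no bytes are received on the underlying socket for timeout seconds). To set the two limits separately, pass a (connect, read) tuple, e.g. timeout=(3.05, 27).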
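A throwaway local server shows the difference between a limit on each wait and a limit on the whole download. The sketch below is built on http.server. The paths /late and /drip, the delays, and the Session with trust_env = False are assumptions made for this demonstration, and it takes roughly two seconds to run.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /late waits before replying, /drip sends 5 bytes slowly."""

    def do_GET(self):
        try:
            if self.path == "/late":
                time.sleep(1.0)          # silent for 1 s before any byte
            self.send_response(200)
            self.send_header("Content-Length", "5")
            self.end_headers()
            for _ in range(5):
                if self.path == "/drip":
                    time.sleep(0.2)      # short gap before every byte
                self.wfile.write(b"x")
        except OSError:
            pass                         # the client gave up and closed the socket

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # keep proxy environment variables out of the way

# 1) No byte arrives within 0.5 s, so the read timeout fires.
try:
    session.get(base + "/late", timeout=0.5)
    late_result = "ok"
except requests.exceptions.ReadTimeout:
    late_result = "ReadTimeout"

# 2) Each gap is only 0.2 s, so the 0.5 s read timeout never fires,
#    even though the whole download takes about 1 s.
start = time.monotonic()
drip = session.get(base + "/drip", timeout=(3.05, 0.5))  # (connect, read)
elapsed = time.monotonic() - start

server.shutdown()
server.server_close()
print(late_result, drip.content, round(elapsed, 1))
```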

Requests exceptions

import requests
from requests import exceptions

print('------timeout------')
def timeout_request():
    try:
        response_git = requests.get('http://github.com',timeout=0.1)
        response_git.raise_for_status()
    except exceptions.Timeout as e:
        print('timeout')
    except exceptions.HTTPError as e:
        print('httperror')
    else:
        print("status_code  = ",response_git.status_code)
        print("text = ",response_git.text)


timeout_request()

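Common Requests exceptions:

ConnectionError: a connection could not be established because of a network problem (for example, a DNS failure or a refused connection).
HTTPError: raised by Response.raise_for_status() when the status code is a 4xx or 5xx error.
Timeout: the request timed out. Its subclasses ConnectTimeout and ReadTimeout cover the connect and read phases.
TooManyRedirects: raised when a request exceeds the configured maximum number of redirects.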
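raise_for_status() only reacts to 4xx and 5xx codes, and the timeout classes overlap with ConnectionError. The small offline sketch below checks both points. It builds Response objects by hand purely for illustration (normally Requests creates them), and http://example.com/ is a placeholder URL.

```python
import requests
from requests import exceptions

# All four exceptions above share the base class RequestException.
for exc in (exceptions.ConnectionError, exceptions.HTTPError,
            exceptions.Timeout, exceptions.TooManyRedirects):
    print(exc.__name__, issubclass(exc, exceptions.RequestException))

# ConnectTimeout is both a ConnectionError and a Timeout; ReadTimeout is only a Timeout.
print(issubclass(exceptions.ConnectTimeout, exceptions.ConnectionError))  # True
print(issubclass(exceptions.ReadTimeout, exceptions.ConnectionError))     # False


def raises_for(status_code):
    """Hand-build a Response (illustration only) and report whether raise_for_status() raises."""
    resp = requests.Response()
    resp.status_code = status_code
    resp.reason = "illustrative"
    resp.url = "http://example.com/"
    try:
        resp.raise_for_status()
    except exceptions.HTTPError:
        return True
    return False


print([raises_for(code) for code in (200, 204, 301, 404, 503)])  # [False, False, False, True, True]
```

Because every exception derives from RequestException, an `except exceptions.RequestException` clause placed after the more specific handlers works as a catch-all.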