Python Crawler Basics 03 - The Requests Library

Requests: elegant to the bone

Original article: https://www.jianshu.com/p/678489e022c8

Brief introduction

The previous article covered how to use Python's network request libraries urllib and urllib3. So what advantages does Requests, which is likewise a network request library, have over urllib?

In fact, it comes down to just two words: simple and elegant.

Requests' motto is: HTTP for Humans. It is fair to say that Requests thoroughly embodies the simple, elegant spirit that Python stands for.

urllib, as part of Python's standard library, is for historical reasons cumbersome and complex to use, and its official documentation is sparse, so you often end up reading the source code. In contrast, Requests is simple, intuitive, and user-friendly, freeing programmers to focus on their task rather than on the library itself.
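To get a concrete feel for the difference, here is a rough comparison sketch (my own illustration, not from the original article) of the same GET request with a custom header in both libraries:

from urllib import request

req = request.Request('http://httpbin.org/get',
                      headers={'user-agent': 'my-app/0.0.1'})
with request.urlopen(req) as f:
    body = f.read().decode('utf-8')  # manual decoding required

import requests

resp = requests.get('http://httpbin.org/get',
                    headers={'user-agent': 'my-app/0.0.1'})
body = resp.text  # decoding handled automatically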

Even the official documentation for urllib.request recommends Requests with this sentence:

The Requests package is recommended for a higher-level HTTP client interface.

Requests' own official documentation is also detailed and thorough, and it even has a rare official Chinese translation: http://cn.python-requests.org/zh_CN/latest/

English documentation: http://docs.python-requests.org/en/master/api/

Of course, to be sure of accuracy, try to consult the English documentation where possible.

 

Author

Requests' author, Kenneth Reitz, is himself a legendary figure.

Kenneth Reitz worked at Heroku, the company hailed as the "originator of cloud services", where at the age of 28 he became chief architect for the Python language. What has he built? To name just a few projects: requests, python-guide, pipenv, legit, autoenv. On top of that, he has contributed code to many well-known open source projects in the Python community, such as Flask.

It is fair to say he is a pivotal figure in the Python world, with an obsessive pursuit of beauty in code.

The legend doesn't stop there. Here is a photo of him speaking at PyCon back in the day:

A cute chubby guy, fitting the public stereotype of programmers: overweight, plaid shirt, shy.

But a few years later, he looked like this:

 

 

Emmmm, handsome guy, did you get plastic surgery or what?

Haha, just kidding. But his appearance really did change enormously, from chubby homebody to cool and stylish.

So don't keep finding excuses for a lazy attitude toward life. You can become better too!

If we are willing to pursue it, we can all become what we want to be.

Features by example

It can be said that Requests' single biggest feature is its elegant style. Whether making requests, handling responses, or working with cookies, URL parameters, and POST data, this style shows through everywhere.

Here is a simple example:

>>> import requests
>>> resp = requests.get('https://api.github.com/user', auth=('user', 'pass'))

>>> resp.status_code
200

>>> resp.headers['content-type']
'application/json; charset=utf8'

>>> resp.encoding
'utf-8'

>>> resp.text
u'{"type":"User"...'

As you can see, both making requests and handling responses are intuitive and clear.

 

Requests currently covers essentially every need of web requests. Its features include:

  • Keep-Alive & connection pooling

  • International domains and URLs

  • Sessions with cookie persistence

  • Browser-style SSL verification

  • Automatic content decoding

  • Basic/Digest authentication

  • Elegant key/value cookies

  • Automatic decompression

  • Unicode response bodies

  • HTTP(S) proxy support

  • Multipart file uploads

  • Streaming downloads

  • Connection timeouts

  • Chunked requests

  • .netrc support

Requests 3.0 has raised funding and is under development; it is expected to support async/await for concurrent requests, and may support HTTP/2.
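Two of the features listed above, streaming downloads and multipart file uploads, don't come up again in the walkthrough below, so here is a minimal sketch of both (the httpbin.org endpoints are used purely as placeholders):

import requests

# Streaming download: stream=True defers fetching the body, and
# iter_content() reads it piece by piece instead of all into memory.
resp = requests.get('http://httpbin.org/bytes/102400', stream=True)
with open('download.bin', 'wb') as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)

# Multipart file upload: pass an open file object via the files parameter.
with open('download.bin', 'rb') as f:
    resp = requests.post('http://httpbin.org/post', files={'file': f})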

 

Installation

Installing Requests is very simple; just install it with pip:

pip install requests

 

Usage

Unlike urllib, requests made with Requests don't require constructing assorted Request, opener, and handler objects. Just call the corresponding method and pass in the parameters you need.

 

Making requests

Request methods

Each request method has a corresponding API; for example, a GET request can be made with the get() method:

>>> import requests
>>> resp = requests.get('https://www.baidu.com')

A POST request uses the post() method, passing the data to submit via the data parameter:

>>> import requests
>>> resp = requests.post('http://httpbin.org/post', data = {'key':'value'})

Other request types each have their own corresponding method:

>>> resp = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> resp = requests.delete('http://httpbin.org/delete')
>>> resp = requests.head('http://httpbin.org/get')
>>> resp = requests.options('http://httpbin.org/get')

Simple, direct, and clear.


Passing URL parameters

Passing URL parameters no longer requires manually concatenating the URL as in urllib. Simply build a dict and pass it to the params parameter when making the request:

>>> import requests
>>> params = {'key1': 'value1', 'key2': 'value2'}
>>> resp = requests.get("http://httpbin.org/get", params=params)

Inspecting the request's URL now shows that it was constructed correctly:

>>> print(resp.url)
http://httpbin.org/get?key2=value2&key1=value1

Sometimes the same URL parameter name appears with several different values. Since Python dicts don't support duplicate keys, we can express the values as a list:

>>> params = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> resp = requests.get('http://httpbin.org/get', params=params)
>>> print(resp.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3

Custom headers

To customize the request headers, likewise pass a dict to the headers parameter.

>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> resp = requests.get(url, headers=headers)

Custom cookies

With Requests, custom cookies no longer require building a CookieJar object; just pass a dict to the cookies parameter.

>>> url = 'http://httpbin.org/cookies'
>>> cookies = {'cookies_are': 'working'}

>>> resp = requests.get(url, cookies=cookies)
>>> resp.text
'{"cookies": {"cookies_are": "working"}}'

Setting proxies

When we need to use a proxy, similarly build a proxy dict and pass it to the proxies parameter.

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)
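If the proxy requires HTTP Basic authentication, the credentials can be embedded directly in the proxy URL (the user, password, and address below are placeholders):

proxies = {
  'http': 'http://user:password@10.10.1.10:3128',
}

requests.get('http://example.org', proxies=proxies)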

Redirects

In network requests we often run into redirects (status codes beginning with 3). Requests allows redirects by default, automatically following them when one is encountered. Passing allow_redirects=False disables this behavior:

>>> resp = requests.get('http://github.com', allow_redirects=False)
>>> resp.status_code
301
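Conversely, when redirects stay enabled (the default), the history attribute records the chain of intermediate responses:

>>> resp = requests.get('http://github.com')
>>> resp.url
'https://github.com/'
>>> resp.history
[<Response [301]>]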

Disabling certificate verification

Sometimes we use a packet-capture tool, and since the certificate such a tool presents is not issued by a trusted certificate authority, certificate verification fails; in that case we need to turn verification off.

Setting the verify parameter to False when making the request turns off certificate verification.

>>> import requests
>>> resp = requests.get('https://httpbin.org/get', verify=False)

But with verification turned off, a rather annoying warning appears:

py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)

The warning can be silenced like this:

from requests.packages.urllib3.exceptions import InsecureRequestWarning
# disable the insecure-request warning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
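In recent versions of requests, requests.packages is just an alias for the urllib3 package it depends on, so the same effect can be had by calling urllib3 directly (assuming a standalone urllib3 is importable alongside requests):

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)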

Setting timeouts

To set a timeout on a request, just set the timeout parameter:

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
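In real code it's usually better to catch the timeout than let it crash; the timeout parameter also accepts a (connect, read) tuple to bound the two phases separately:

import requests

try:
    # 3.05 seconds to connect, 27 seconds to read the response
    resp = requests.get('http://github.com', timeout=(3.05, 27))
except requests.exceptions.Timeout:
    print('the request timed out')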

As you can see, to make a request with Requests you only need to build the few dicts you need and pass them into the request method to accomplish basic network requests.

 

Responses

What a request made through Requests returns is a requests.models.Response object, through which we can conveniently access the response content.

Response content

Responses fetched with urllib are read as raw bytes, which we had to decode() ourselves to turn into string data.

With Requests, the text attribute directly gives the response content as a string:

>>> import requests

>>> resp = requests.get('https://api.github.com/events')
>>> resp.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...

Requests automatically guesses the page's encoding from the response headers and decodes the content accordingly, which works for the vast majority of pages. If text turns out to be decoded incorrectly, we have to specify the encoding manually:

>>> import requests

>>> resp = requests.get('https://api.github.com/events')
>>> resp.encoding = 'utf-8'
>>> resp.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...

If you need the raw binary data instead, use the content attribute:

>>> resp.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...

If the data we get back is JSON, the json() method converts it straight into a dict:

>>> import requests

>>> resp = requests.get('https://api.github.com/events')
>>> resp.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
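Note that json() raises a ValueError when the body isn't actually valid JSON, so it's worth guarding if the response type isn't guaranteed:

try:
    data = resp.json()
except ValueError:
    # the body was not JSON, e.g. an HTML error page
    data = None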

Status code

Use the status_code attribute to get the response's status code:

>>> resp = requests.get('http://httpbin.org/get')
>>> resp.status_code
200
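For error responses (4xx/5xx), the raise_for_status() method converts the status code into an exception, saving a manual check:

>>> bad = requests.get('http://httpbin.org/status/404')
>>> bad.raise_for_status()
Traceback (most recent call last):
  ...
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://httpbin.org/status/404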

Response headers

Use the headers attribute to get the response headers:

>>> resp.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json'
}
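The headers attribute behaves as a case-insensitive dict, so the capitalization of header names doesn't matter when reading them:

>>> resp.headers['Content-Type']
'application/json'
>>> resp.headers.get('content-type')
'application/json'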

Cookies returned by the server

Use the cookies attribute to get the cookies the server returned:

>>> url = 'http://example.com/some/cookie/setting/url'
>>> resp = requests.get(url)
>>> resp.cookies['example_cookie_name']
'example_cookie_value'

url

You can also use the url attribute to view the URL that was actually requested.

>>> import requests
>>> params = {'key1': 'value1', 'key2': 'value2'}
>>> resp = requests.get("http://httpbin.org/get", params=params)
>>> print(resp.url)
http://httpbin.org/get?key2=value2&key1=value1

 

Session

Requests implements Session (conversation) support. With a Session we can keep state across requests, the way a browser does for as long as it stays open.

This is often used for fetching data after logging in, so that we don't have to pass cookies along again and again.

import requests

session = requests.Session()

session.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
resp = session.get('http://httpbin.org/cookies')

print(resp.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

First we create a Session object, then use it to make requests; the methods for doing so are exactly the same as for ordinary requests.

Also note that headers, cookies, and other data passed into the get() method are only effective for that single request. If you want a header to remain effective for the Session's entire lifetime, set it as follows:

# set the headers for the whole session
session.headers = {
    'user-agent': 'my-app/0.0.1'
}
# add a single header
session.headers.update({'x-test': 'true'})
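Putting this together, a typical post-login workflow looks roughly like the sketch below (the URL and form fields are hypothetical placeholders for whatever site you are targeting):

import requests

session = requests.Session()
session.headers.update({'user-agent': 'my-app/0.0.1'})

# hypothetical login endpoint and form fields; the session stores
# whatever cookies the server sets during login
session.post('http://example.com/login',
             data={'username': 'user', 'password': 'pass'})

# later requests carry the login cookies without any extra work
resp = session.get('http://example.com/profile')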

 

 

Postscript: perhaps some people don't buy into the aesthetics of code and think ugly code is fine as long as it runs. But I have always believed that everything in this world ought to be beautiful, and that the pursuit of beauty should never stop.
