Requests module: GET and POST request use cases

Requests is a very practical Python HTTP client, often used when writing crawlers and when testing server responses.
The requests module can carry parameters in three ways: data, json, and params. In typical use, data and json carry the body of a POST request, while params carries the query string of a GET request.

Parameter    Request type
data         POST
json         POST
params       GET
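
As a rough offline sketch of where each parameter ends up, the snippet below builds three requests without sending them and prints the resulting URL or body. It relies on requests.Request(...).prepare(), which is not covered in this article, and the httpbin.org URLs are only placeholders.

```python
import requests

# Prepare the requests without sending them, so no network access is needed.
get_req = requests.Request(
    "GET", "http://httpbin.org/get", params={"key1": "value1"}
).prepare()
form_req = requests.Request(
    "POST", "http://httpbin.org/post", data={"key1": "value1"}
).prepare()
json_req = requests.Request(
    "POST", "http://httpbin.org/post", json={"key1": "value1"}
).prepare()

print(get_req.url)    # params are appended to the URL as a query string
print(form_req.body)  # data (a dict) becomes a form-encoded body
print(json_req.body)  # json becomes a JSON-encoded body
```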

The following content is based on the official quickstart: https://requests.readthedocs.io/zh_CN/latest/user/quickstart.html

  • Import
import requests

GET request

  • No parameters
 r = requests.get('https://api.github.com/events')
  • Passing parameters
    You often want to pass some data in the URL's query string. If you construct the URL by hand, the data goes into the URL as key/value pairs after a question mark, for example httpbin.org/get?key=val. Requests lets you provide these arguments as a dictionary of strings through the params keyword argument. For example, to pass key1=value1 and key2=value2 to httpbin.org/get, you can use the following code:
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)

By printing out the URL, you can see that the URL has been correctly encoded:


>>> print(r.url)
http://httpbin.org/get?key2=value2&key1=value1

Note that any dictionary key whose value is None will not be added to the URL's query string.

  • A list can also be passed as a value
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
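
A minimal offline sketch of both rules, assuming requests.Request(...).prepare() is used to build the URL without sending anything. The key3 entry is added here only to illustrate the None case.

```python
from urllib.parse import parse_qs, urlsplit

import requests

payload = {"key1": "value1", "key2": ["value2", "value3"], "key3": None}

# prepare() builds the final URL without sending anything over the network.
prepared = requests.Request(
    "GET", "http://httpbin.org/get", params=payload
).prepare()

query = parse_qs(urlsplit(prepared.url).query)
print(prepared.url)
print(query)  # key3 is absent because its value was None
```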
  • r.text returns the body decoded as text, using the encoding Requests guesses from the HTTP headers; you can change the decoding by setting r.encoding, for example r.encoding = 'gbk'
  • r.content returns the body as bytes
    For example, to create an image from the binary data returned by a request, you can use the following code:
>>> from PIL import Image
>>> from io import BytesIO

>>> i = Image.open(BytesIO(r.content))
  • r.json() returns the decoded JSON body and may raise an exception
# Requests also has a built-in JSON decoder to help you work with JSON data:

>>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

If JSON decoding fails, r.json() raises an exception. For example, if the response is a 401 (Unauthorized) whose body is not valid JSON, calling r.json() raises ValueError: No JSON object could be decoded.

Note that a successful call to r.json() does not mean the response was successful. Some servers include a JSON object in a failed response (such as error details with an HTTP 500). That JSON is decoded and returned like any other. To check whether the request succeeded, use r.raise_for_status() or check that r.status_code is what you expect.
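
The two checks can be combined as in the sketch below. The helper name decode_json is hypothetical, and the responses are built by hand (by setting the private _content attribute) purely so the example runs offline. In real code they would come from requests.get(...).

```python
import requests


def decode_json(response):
    """Return the decoded JSON, or None if the request failed or the body is not JSON."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
        return response.json()       # raises ValueError (or a subclass) on bad JSON
    except (requests.exceptions.HTTPError, ValueError):
        return None


def fake_response(status_code, body):
    """Hand-built Response for offline demonstration only (uses a private attribute)."""
    response = requests.Response()
    response.status_code = status_code
    response._content = body
    return response


ok = fake_response(200, b'{"ok": true}')
failed = fake_response(500, b'{"error": "details"}')  # valid JSON, failed request
not_json = fake_response(200, b"<html>oops</html>")   # successful, but not JSON

print(decode_json(ok), decode_json(failed), decode_json(not_json))
```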

  • r.status_code returns the response status code
>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
# For easy reference, Requests also comes with a built-in status code lookup object:
>>> r.status_code == requests.codes.ok
True
  • r.raw returns the raw socket response; you need to pass stream=True in the request
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

In general, however, you should use a pattern like the following to save the streamed content to a file:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

Response.iter_content handles much of what you would otherwise have to deal with when using Response.raw directly. For streaming downloads, the pattern above is the preferred way to retrieve the content. Note that chunk_size can be adjusted freely to a number that better fits your use case.
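
A self-contained sketch of that pattern follows. The helper name save_stream and the 8192-byte default are assumptions. The response is a hand-built stand-in whose raw attribute is an in-memory file, so the example runs without a network. In real code it would come from requests.get(url, stream=True).

```python
import io
import os
import tempfile

import requests


def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to path and return the number of bytes written."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


# Offline stand-in for requests.get(url, stream=True), for demonstration only.
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b"x" * 20000)

target = os.path.join(tempfile.mkdtemp(), "download.bin")
size = save_stream(fake, target)
print(size, os.path.getsize(target))
```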

  • Custom headers
    If you want to add HTTP headers to the request, simply pass a dict to the headers parameter.
>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}

>>> r = requests.get(url, headers=headers)

Note: All header values must be a string, bytestring, or unicode. Although passing unicode header values is permitted, it is not recommended.
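
To inspect the headers that would be sent without contacting the server, one option is to prepare the request first. This is only a sketch, and the Accept header is an extra added for illustration.

```python
import requests

headers = {"user-agent": "my-app/0.0.1", "Accept": "application/json"}

# prepare() shows the headers that would be sent, without any network traffic.
prepared = requests.Request(
    "GET", "https://api.github.com/some/endpoint", headers=headers
).prepare()

# Header lookup on the prepared request is case-insensitive.
print(prepared.headers["User-Agent"])
print(prepared.headers["accept"])
```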

  • Timeout
    • In case the server does not respond in a timely manner, most requests to external servers should carry a timeout parameter. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or longer.
    • The connect timeout is the number of seconds Requests will wait for your client to establish a connection to the remote machine (corresponding to the connect() call on the socket). A good practice is to set the connect timeout slightly larger than a multiple of 3, because 3 seconds is the default TCP packet retransmission window.
    • Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it is the number of seconds the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)
# If you specify a single value for timeout, like this:
r = requests.get('https://github.com', timeout=5)

# that value is used for both the connect and the read timeout. To set them separately, pass a tuple:
r = requests.get('https://github.com', timeout=(3.05, 27))

# If the remote server is very slow, you can tell Requests to wait forever by passing None as the timeout value, and then go make a cup of coffee.
r = requests.get('https://github.com', timeout=None)
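
The difference between the two timeouts can be observed locally. The sketch below assumes a throwaway server on 127.0.0.1 (the SlowHandler class is hypothetical) that accepts the connection at once but takes a second to answer, so the connect timeout is never hit and the read timeout is.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that answers only after a one-second delay."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

try:
    # Up to 3.05 s to connect, but only 0.2 s to wait for the first bytes.
    session.get(url, timeout=(3.05, 0.2))
    outcome = "completed"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
finally:
    server.shutdown()
    server.server_close()

print(outcome)
```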
  • Cookies
    If a response contains cookies, you can access them quickly:
>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

To send your cookies to the server, you can use the cookies parameter:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

Cookies are returned in a RequestsCookieJar, which behaves like a dictionary but offers a more complete interface, suitable for use across multiple domains or paths. You can also pass a cookie jar to a request:

>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
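
The domain and path filtering can also be checked offline. This sketch prepares the request without sending it and looks at the Cookie header that Requests would attach. Only tasty_cookie matches the /cookies path.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Prepare (but do not send) a request to see which cookies would go with it.
prepared = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies=jar
).prepare()
print(prepared.headers.get("Cookie"))

# Dictionary-style access still works on the jar itself.
print(jar["tasty_cookie"], jar["gross_cookie"])
```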

POST request

  • Passing the form
    Typically, you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data parameter. The dictionary is form-encoded automatically when the request is made.
>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

You can also pass a list of tuples to the data parameter. This is especially useful when the form has multiple elements that use the same key:

>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
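
As an offline sketch, the encoded body can be inspected by preparing the request instead of sending it. A dictionary whose value is a list produces the same repeated key as a list of tuples.

```python
import requests

url = "http://httpbin.org/post"

# A list of tuples and a dict with a list value both repeat the key.
from_tuples = requests.Request(
    "POST", url, data=[("key1", "value1"), ("key1", "value2")]
).prepare()
from_dict = requests.Request(
    "POST", url, data={"key1": ["value1", "value2"]}
).prepare()

print(from_tuples.body)
print(from_dict.body)
print(from_tuples.headers["Content-Type"])
```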
  • Passing JSON

Often the data you want to send is not form-encoded. If you pass a string instead of a dict, that data is posted directly.

For example, the GitHub API v3 accepts JSON-encoded POST/PATCH data:

>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

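# Option 1: encode the dict yourself
>>> r = requests.post(url, data=json.dumps(payload))

# Option 2
## Instead of encoding the dict yourself, you can pass it directly with the json parameter and it will be encoded automatically. This was added in version 2.4.2:
>>> r = requests.post(url, json=payload)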
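
One practical difference between the two options is the Content-Type header. The sketch below prepares both requests offline: a plain string passed as data gets no Content-Type from Requests, while the json parameter sets application/json.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}

# Option 1: encode the dict yourself and pass the string as data.
manual = requests.Request("POST", url, data=json.dumps(payload)).prepare()
# Option 2: let Requests encode it through the json parameter.
automatic = requests.Request("POST", url, json=payload).prepare()

print(manual.headers.get("Content-Type"))     # not set for a plain string body
print(automatic.headers.get("Content-Type"))
print(json.loads(manual.body) == json.loads(automatic.body))
```

With option 1, you may therefore want to pass headers={'Content-Type': 'application/json'} yourself if the server requires it.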
  • Passing a file
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

You can set the filename, content type, and headers explicitly:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

If you want, you can also send a string to be received as a file:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\\nanother,row,to,send\\n"
  },
  ...
}
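
To see what such an upload looks like on the wire without contacting httpbin.org, the request can be prepared offline. This is a sketch, and the text/csv content type is an assumption added for illustration.

```python
import requests

files = {
    # (filename, file content, content type); the content is a plain string here
    "file": ("report.csv", "some,data,to,send\nanother,row,to,send\n", "text/csv"),
}

# Build the multipart body without sending it.
prepared = requests.Request(
    "POST", "http://httpbin.org/post", files=files
).prepare()

print(prepared.headers["Content-Type"].split(";")[0])
print(b'filename="report.csv"' in prepared.body)
```

When uploading a real file, opening it in a with block ensures the file is closed once the request has been made.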

Advanced usage

  • Redirection and request history
    By default, except for HEAD, Requests will automatically handle all redirects.
>>> r = requests.get('http://github.com')

>>> r.url
'https://github.com/'

>>> r.status_code
200

>>> r.history
[<Response [301]>]

If you are using GET, OPTIONS, POST, PUT, PATCH, or DELETE, then you can disable redirection processing with the allow_redirects parameter:

>>> r = requests.get('http://github.com', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you use HEAD, you can also enable redirection:

>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
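
The same behavior can be reproduced without relying on github.com. The sketch below assumes a throwaway local server (the RedirectHandler class and the /old and /new paths are hypothetical) and compares a followed redirect with allow_redirects=False.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: /old redirects permanently to /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

try:
    followed = session.get(base + "/old", timeout=5)
    not_followed = session.get(base + "/old", allow_redirects=False, timeout=5)
finally:
    server.shutdown()
    server.server_close()

print(followed.url, [r.status_code for r in followed.history])
print(not_followed.status_code, not_followed.headers["Location"])
```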
  • session

  • Timeout
    You can tell requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter. Without it, your program may hang indefinitely:


>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note that timeout applies only to the connection process and has nothing to do with downloading the response body. It is not a time limit on the entire response download. Rather, an exception is raised if the server has not issued a response within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.

  • Errors and exceptions
    • When encountering network problems (such as: DNS query failure, connection refused, etc.), Requests will throw a ConnectionError exception.
    • If the HTTP request returns an unsuccessful status code, Response.raise_for_status() will throw an HTTPError exception.
    • If the request times out, a Timeout exception is thrown.
    • If the request exceeds the maximum number of redirects, a TooManyRedirects exception will be thrown.
    • All exceptions explicitly thrown by Requests are inherited from requests.exceptions.RequestException.
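
A minimal sketch of handling these exceptions in one place follows. The helper names classify and fetch are hypothetical. Because ConnectTimeout inherits from both ConnectionError and Timeout, the Timeout check comes first.

```python
import requests
from requests import exceptions as rex


def classify(exc):
    """Map an exception to a coarse category, most specific check first."""
    # ConnectTimeout inherits from both ConnectionError and Timeout,
    # so Timeout is tested before ConnectionError.
    if isinstance(exc, rex.Timeout):
        return "timeout"
    if isinstance(exc, rex.ConnectionError):
        return "connection"
    if isinstance(exc, rex.HTTPError):
        return "http"
    if isinstance(exc, rex.TooManyRedirects):
        return "redirects"
    if isinstance(exc, rex.RequestException):
        return "other"
    return "not-requests"


def fetch(url, timeout=5):
    """Return the response, or None after reporting any Requests error."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response
    except rex.RequestException as exc:
        print("request failed (%s): %s" % (classify(exc), exc))
        return None


print(classify(rex.ConnectTimeout()))
print(classify(rex.HTTPError()))
```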

Origin blog.csdn.net/Sunny_Future/article/details/105498067