Basic usage of requests, the most popular Python HTTP library on the web

Introduction
Requests is an HTTP library written in Python on top of urllib and released under the Apache2 License. It is more convenient than urllib and saves a lot of work.

1. Install
Install quickly with pip: pip install requests

2. Usage
1. A first snippet of code

import requests
 
response  = requests.get("https://www.baidu.com")
print(type(response))
print(response.status_code)
print(type(response.text))
# get response header content
print(response.headers)
print(response.headers['content-type'])
#You can also get the request header content in this way
print(response.request.headers)
 
response.encoding = "utf-8"
print(response.text)
 
print(response.cookies)
 
print(response.content)
print(response.content.decode("utf-8"))

response.text returns the body decoded as text (Unicode); it usually needs to be interpreted as utf-8, otherwise the output is garbled. response.content is the raw binary body, which you can use to download videos and the like; to read it as text you need to decode it, for example as utf-8.

Either response.content.decode("utf-8") or response.encoding = "utf-8" avoids the garbled-character problem.

2. A bunch of request methods

import requests
requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")

Basic GET:

import requests
 
url = 'https://www.baidu.com/'
response = requests.get(url)
print(response.text)

GET request with parameters:
If you want to pass query parameters to the http://httpbin.org/get page, you need to add them to the URL. For example, to check for Host=httpbin.org, the URL takes the form http://httpbin.org/get?Host=httpbin.org

In the example below, the dictionary passed as data is sent to that address as query parameters.

import requests
 
url = 'http://httpbin.org/get'
data = {
    'name': 'zhangsan',
    'age':'25'
}
response = requests.get(url,params=data)
print(response.url)
print(response.text)

JSON data:

From the code below we can see that the response.json() method in requests is equivalent to json.loads(response.text):

import requests
import json
 
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
# The GET response from httpbin has no 'data' key, so print an existing field instead
print(response.json()['url'])

Get binary data

As mentioned above, response.content holds the binary body of the response; the same approach can be used to download pictures and video resources.
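As a rough sketch (the image URL and file name below are placeholders chosen for illustration), downloading a picture with response.content could look like this:

import requests
 
# Hypothetical image URL used only for illustration
url = 'http://httpbin.org/image/png'
response = requests.get(url)
 
# response.content is bytes, so open the file in binary write mode
with open('demo.png', 'wb') as f:
    f.write(response.content)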

add header

First of all, why add headers (request header information)? For example, below we try to access Zhihu's login page (of course, without logging in to Zhihu you won't see the content inside). What goes wrong if we don't add any header information?

import requests
 
url = 'https://www.zhihu.com/'
response = requests.get(url)
response.encoding = "utf-8"
print(response.text)

result:

The server reports an internal error (that is, you can't even download the HTML of the Zhihu login page):

<html><body><h1>500 Server Error</h1>
An internal server error occured.
</body></html>

To access the page, you must add header information.

import requests
 
url = 'https://www.zhihu.com/'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
}
response = requests.get(url,headers=headers)
print(response.text)

Basic POST request:
POST submits data to the URL; passing a dictionary is equivalent to submitting the data as a form.

import requests
 
url = 'http://httpbin.org/post'
data = {
    'name':'jack',
    'age':'23'
    }
response = requests.post(url,data=data)
print(response.text)

result:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "23",
    "name": "jack"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "16",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.13.0"
  },
  "json": null,
  "origin": "118.144.137.95",
  "url": "http://httpbin.org/post"
}

Response properties:

import requests
 
# allow_redirects=False means redirects are not followed; by default they are
response = requests.get("http://www.baidu.com", allow_redirects=False)
# Print the status code of the response
print(type(response.status_code), response.status_code)
# Print the response headers
print(type(response.headers), response.headers)
# Print the cookies returned with the response
print(type(response.cookies), response.cookies)
# Print the final URL of the request
print(type(response.url), response.url)
# Print the request history (as a list)
print(type(response.history), response.history)

Built-in status codes:

100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
      'resume_incomplete', 'resume',), # These 2 to be removed in 3.0

# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication')

import requests
response = requests.get('http://www.jianshu.com/404.html')
# Use the status code names built into requests to check the response
 
# If the returned status code is not the normal 'ok' code, print a 404 error
if response.status_code != requests.codes.ok:
    print('404')
 
# If the page returns status code 200, print 200
response = requests.get('http://www.jianshu.com')
if response.status_code == 200:
    print('200')
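
Beyond comparing against requests.codes, a response can also raise an exception for error codes via raise_for_status(). This is a standard requests method not shown in the article; the snippet below is a small sketch of it:

import requests
from requests.exceptions import HTTPError
 
response = requests.get('http://www.jianshu.com/404.html')
try:
    # Raises HTTPError for 4xx/5xx status codes, does nothing for 2xx
    response.raise_for_status()
except HTTPError as e:
    print(e)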

Advanced operations for requests

File Upload

import requests
url = "http://httpbin.org/post"
files = {"files": open("test.jpg", "rb")}
response = requests.post(url,files=files)
print(response.text)

result: httpbin echoes the uploaded file back in the "files" field of the returned JSON.

get cookies

import requests
response = requests.get('https://www.baidu.com')
print(response.cookies)
for key,value in response.cookies.items():
    print(key,'==',value)

session maintenance

One use of cookies is to simulate logging in and keep a session going across requests.

import requests
session = requests.session()
session.get('http://httpbin.org/cookies/set/number/12456')
response = session.get('http://httpbin.org/cookies')
print(response.text)

Difference between cookie and session

Cookie data is stored in the client's browser, while session data is stored on the server.
Cookies are not very safe: someone can inspect the cookies stored locally and use them to deceive the server.
Sessions are kept on the server for a certain period of time, so as the number of visits grows they eat into server performance.
A single cookie cannot hold more than 4 KB of data, and many browsers limit a site to at most 20 cookies.

Certificate verification

1. Access a site with an untrusted certificate

import requests
response = requests.get('https://www.12306.cn')
# When requesting https, the request will verify the certificate, and if the verification fails, an exception will be thrown
print(response.status_code)

Error: the request raises an SSLError because certificate verification fails.

Turn off certificate verification

import requests
# Turn off validation, but still give a certificate warning
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

Passing verify=False avoids the exception so the page can be retrieved, but a certificate warning is still printed.

Suppress the certificate verification warnings

from requests.packages import urllib3
import requests
 
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

Other operations

#Convert the cookie object to a dictionary
requests.utils.dict_from_cookiejar
#Convert dictionary to cookie object
requests.utils.cookiejar_from_dict
#url decoding
requests.utils.unquote()
#url encoding
requests.utils.quote()
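
A minimal sketch of how these helpers might be used (the URL and the string being encoded are arbitrary examples):

import requests
 
response = requests.get('https://www.baidu.com')
# Convert the CookieJar object into a plain dictionary
cookie_dict = requests.utils.dict_from_cookiejar(response.cookies)
print(cookie_dict)
 
# URL-encode a string and decode it again
encoded = requests.utils.quote('name=张三')
print(encoded)
print(requests.utils.unquote(encoded))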

Manually set up certificates

import requests
 
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)

proxy settings

Fundamentals of Proxy

Forward proxy and reverse proxy

Forward proxy: the client knows exactly which server it wants to reach, but cannot reach it directly, so it asks a proxy to complete the request on its behalf.

Reverse proxy: the client does not know which backend server actually handles the request; it only talks to a front-end server such as Nginx, which forwards the request for it.

1. Set up an ordinary proxy

import requests
 
proxies = {
  "http": "http://127.0.0.1:9743",
  "https": "https://127.0.0.1:9743",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

2. Set a proxy with a username and password
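
The article gives no code for this case; as a sketch, requests accepts credentials embedded in the proxy URL (the address, user name and password below are placeholders):

import requests
 
# user:password@host:port puts the credentials in the proxy URL (all placeholders)
proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
    "https": "http://user:password@127.0.0.1:9743/",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)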

3. Set a SOCKS proxy

Install SOCKS support first: pip3 install 'requests[socks]'

import requests
 
proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

timeout setting

The timeout period can be set by the timeout parameter

import requests
from requests.exceptions import ReadTimeout
 
try:
    # The response must be received within 0.5 s, otherwise a ReadTimeout exception is raised
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
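
timeout can also be given as a (connect, read) tuple to limit connection setup and response reading separately; a minimal sketch with arbitrary values:

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout
 
try:
    # 0.2 s to establish the connection, 1 s to read the response
    response = requests.get("http://httpbin.org/get", timeout=(0.2, 1))
    print(response.status_code)
except (ConnectTimeout, ReadTimeout):
    print('Timeout')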

Authentication settings

If you encounter a website that requires authentication, you can use the requests.auth module to handle it.

import requests
from requests.auth import HTTPBasicAuth
 
# Method 1
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
# Method 2
r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)

exception handling

Exceptions in requests are documented here: http://www.python-requests.org/en/master/api/#exceptions

All exceptions live in requests.exceptions

 

From the source code we can see that RequestException inherits from IOError; HTTPError, ConnectionError and Timeout inherit from RequestException; ProxyError and SSLError inherit from ConnectionError; and ReadTimeout inherits from Timeout.

Here are some commonly used exception inheritance relationships. For details, see: http://cn.python-requests.org/zh_CN/latest/_modules/requests/exceptions.html#RequestException

Simple demonstration with the following example

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout = 0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')

The first exception caught is the timeout. If the network connection is dropped, ConnectionError is caught. Anything not matched by the earlier clauses is still caught by RequestException.

Additional:

Content of the server response

# 1. Text content of the server response
r.text
# The text encoding requests uses; you can change it via the r.encoding property
r.encoding
# 2. Binary response content: the response body as bytes, for non-text content
r.content
# 3. JSON response content: the built-in JSON decoder parses JSON data for you
r.json()
# Get the raw socket response from the server
r.raw
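
Note that to actually read r.raw you generally need to pass stream=True on the request; a minimal sketch:

import requests
 
# stream=True keeps the connection open so the raw stream can be read
r = requests.get('http://httpbin.org/get', stream=True)
print(r.raw)
print(r.raw.read(10))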

 

