Introduction
Requests is a Python HTTP library released under the Apache2 License. It is built on top of urllib3 and is far more convenient than the standard library's urllib, saving us a lot of work.
1. Installation
Install quickly with pip:

pip install requests
2. Usage
1. A first snippet of code
import requests

response = requests.get("https://www.baidu.com")
print(type(response))
print(response.status_code)
print(type(response.text))
# Get the response headers
print(response.headers)
print(response.headers['content-type'])
# You can also get the request headers this way
print(response.request.headers)
response.encoding = "utf-8"
print(response.text)
print(response.cookies)
print(response.content)
print(response.content.decode("utf-8"))
response.text returns a decoded string, which usually needs to be switched to utf-8, otherwise the output is garbled. response.content is the raw bytes, suitable for downloading videos and the like; if you want to read it as text, you need to decode it into utf-8.
Either response.content.decode("utf-8") or setting response.encoding = "utf-8" avoids the garbled-character problem.
2. The various request methods
import requests

requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")
Basic GET:
import requests

url = 'https://www.baidu.com/'
response = requests.get(url)
print(response.text)
GET request with parameters:
To query the http://httpbin.org/get page with specific parameters, you need to add them to the URL. For example, to check whether there is any data with Host=httpbin.org, the URL takes the form http://httpbin.org/get?Host=httpbin.org.
The code below sends the dictionary in data as query parameters to this address.
import requests

url = 'http://httpbin.org/get'
data = {
    'name': 'zhangsan',
    'age': '25'
}
response = requests.get(url, params=data)
print(response.url)
print(response.text)
JSON data:
From the output below we can see that the response.json() method in requests is equivalent to json.loads(response.text).
import requests
import json

response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
# 'url' is one of the keys httpbin returns for a GET request
print(response.json()['url'])
Get binary data
response.content was mentioned above; the data obtained this way is binary, and the same approach can be used to download pictures and video resources.
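As a small sketch of that idea (httpbin's /image/png endpoint is used here as an assumed demo URL), the bytes in response.content can be written straight to a file:

```python
import requests

# Fetch a small PNG and write the raw bytes to disk
response = requests.get("http://httpbin.org/image/png")
with open("demo.png", "wb") as f:
    f.write(response.content)

# PNG files start with the fixed signature \x89PNG
print(response.content[:4])
```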
Add headers
First of all, why add headers (header information)? For example, below we try to access Zhihu's login page (of course, without logging in you won't see the content inside). What goes wrong if we don't add any header information?
import requests

url = 'https://www.zhihu.com/'
response = requests.get(url)
response.encoding = "utf-8"
print(response.text)
result:
It reports an internal server error (that is, you can't even download the HTML of the Zhihu login page):
<html><body><h1>500 Server Error</h1> An internal server error occured. </body></html>
To gain access, you must add header information.
import requests

url = 'https://www.zhihu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.text)
Basic POST request:
A POST submits data to the URL address, equivalent to submitting a form's data in the form of a dictionary.
import requests

url = 'http://httpbin.org/post'
data = {
    'name': 'jack',
    'age': '23'
}
response = requests.post(url, data=data)
print(response.text)
result:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "23",
    "name": "jack"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "16",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.13.0"
  },
  "json": null,
  "origin": "118.144.137.95",
  "url": "http://httpbin.org/post"
}
Response attributes:
import requests

# allow_redirects=False means redirects are not followed; otherwise they are
response = requests.get("http://www.baidu.com", allow_redirects=False)
# Print the status code of the requested page
print(type(response.status_code), response.status_code)
# Print all the response headers
print(type(response.headers), response.headers)
# Print the cookies of the response
print(type(response.cookies), response.cookies)
# Print the URL of the request
print(type(response.url), response.url)
# Print the request history (as a list)
print(type(response.history), response.history)
Built-in status codes:
# Informational.
100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
# Success.
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),
# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect', 'resume_incomplete', 'resume',),  # These 2 to be removed in 3.0
# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),
# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),
import requests

response = requests.get('http://www.jianshu.com/404.html')
# Use the status codes built into requests to check the result.
# If the response status code is not OK, report a 404 error
if response.status_code != requests.codes.ok:
    print('404')

# If the page returns status code 200, print it
response = requests.get('http://www.jianshu.com')
if response.status_code == 200:
    print('200')
Advanced operations for requests
File Upload
import requests

url = "http://httpbin.org/post"
files = {"files": open("test.jpg", "rb")}
response = requests.post(url, files=files)
print(response.text)
Result: httpbin echoes the uploaded file back in the "files" field of its JSON response.
Get cookies
import requests

response = requests.get('https://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key, '==', value)
Session maintenance
One use of cookies is to simulate a login and maintain a session.
import requests

session = requests.Session()
session.get('http://httpbin.org/cookies/set/number/12456')
response = session.get('http://httpbin.org/cookies')
print(response.text)
Differences between cookies and sessions:
Cookie data is stored in the client's browser; session data is stored on the server.
Cookies are not very safe: others can inspect a cookie stored locally and use it to spoof the session.
A session is kept on the server for a certain period of time, so as the number of visits grows, it consumes your server's resources.
A single cookie cannot hold more than 4 KB of data, and many browsers limit a site to at most 20 cookies.
Certificate verification
1. Accessing without a certificate
import requests

# When requesting an https site, requests verifies the SSL certificate;
# if verification fails, an exception is thrown
response = requests.get('https://www.12306.cn')
print(response.status_code)
Error: an SSLError is raised because certificate verification failed.
Turn off certificate verification
import requests

# Turn off verification; a certificate warning is still printed
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

To avoid the exception, you can pass verify=False; this way the page results can still be accessed.
Eliminate the certificate-verification warning
import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Other operations
# Convert a cookiejar object to a dictionary
requests.utils.dict_from_cookiejar
# Convert a dictionary to a cookiejar object
requests.utils.cookiejar_from_dict
# URL decoding
requests.utils.unquote()
# URL encoding
requests.utils.quote()
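A quick illustration of these helpers: round-tripping a dict through a cookiejar, and URL-encoding a string (the values here are arbitrary examples):

```python
import requests

# dict -> cookiejar -> dict round trip
jar = requests.utils.cookiejar_from_dict({"number": "12456"})
print(requests.utils.dict_from_cookiejar(jar))  # {'number': '12456'}

# URL encoding and decoding
encoded = requests.utils.quote("name=张三")
print(encoded)
print(requests.utils.unquote(encoded))  # name=张三
```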
Manually set up certificates
import requests response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key')) print(response.status_code)
Proxy settings
Fundamentals of Proxy
Forward proxy and reverse proxy
Forward proxy: the browser knows exactly which server it wants to reach but cannot reach it directly, so it asks a proxy to complete the request on its behalf.
Reverse proxy: the browser knows nothing about the actual server; it sends the request to an intermediary such as Nginx, which forwards it to the real server.
1. Set up a normal proxy
import requests

proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
2. Set username and password proxy
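The original section omitted the code here. A minimal sketch, assuming a proxy at 127.0.0.1:9743 that accepts basic credentials (user and password below are placeholders), embeds the credentials in the proxy URL:

```python
import requests

user, password = "user", "password"  # hypothetical credentials
proxies = {
    "http": f"http://{user}:{password}@127.0.0.1:9743",
    "https": f"http://{user}:{password}@127.0.0.1:9743",
}
print(proxies["http"])
# With a real proxy running, the request would look like:
# response = requests.get("https://www.taobao.com", proxies=proxies)
```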
3. Set a SOCKS proxy
Install the socks module first:

pip3 install 'requests[socks]'
import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
Timeout setting
The timeout period can be set by the timeout parameter
import requests
from requests.exceptions import ReadTimeout

try:
    # The response must arrive within 0.5 s, otherwise a ReadTimeout is raised
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
Authentication settings
If you encounter a website that requires authentication, you can use the requests.auth module.
import requests
from requests.auth import HTTPBasicAuth

# Method 1
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
# Method 2
r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
Exception handling
The exceptions raised by requests are documented here: http://www.python-requests.org/en/master/api/#exceptions
All exceptions live in requests.exceptions.
From the source code we can see that:
RequestException inherits from IOError;
HTTPError, ConnectionError, and Timeout inherit from RequestException;
ProxyError and SSLError inherit from ConnectionError;
ReadTimeout inherits from Timeout.
Here are some commonly used exception inheritance relationships. For details, see: http://cn.python-requests.org/zh_CN/latest/_modules/requests/exceptions.html#RequestException
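The inheritance relationships described above can be checked directly with issubclass:

```python
import requests.exceptions as ex

# RequestException is the root; it inherits from IOError
assert issubclass(ex.RequestException, IOError)
# HTTPError, ConnectionError and Timeout all derive from RequestException
assert issubclass(ex.HTTPError, ex.RequestException)
assert issubclass(ex.ConnectionError, ex.RequestException)
assert issubclass(ex.Timeout, ex.RequestException)
# ProxyError and SSLError derive from ConnectionError
assert issubclass(ex.ProxyError, ex.ConnectionError)
assert issubclass(ex.SSLError, ex.ConnectionError)
# ReadTimeout derives from Timeout
assert issubclass(ex.ReadTimeout, ex.Timeout)
print("hierarchy confirmed")
```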
Simple demonstration with the following example
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')
The first exception caught is the timeout. When the network is disconnected, ConnectionError is caught. Anything not caught by the earlier clauses can still be caught by RequestException.
Additional:
Server response content:

# 1. Read the text content of the server's response
r.text
# The text encoding guessed by requests; you can change it via the r.encoding attribute
r.encoding
# 2. Binary response content: access the response body as bytes, for non-text requests
r.content
# 3. JSON response content: the built-in JSON decoder helps you process JSON data
r.json()
# Get the raw socket response from the server
r.raw
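One caveat worth adding: r.raw only exposes the underlying stream if stream=True was passed on the request, and iter_content is the usual way to consume a streamed body. A minimal sketch, using httpbin's /bytes endpoint as an assumed demo URL:

```python
import requests

# stream=True defers downloading the body so it can be read in chunks
response = requests.get("http://httpbin.org/bytes/1024", stream=True)

total = 0
for chunk in response.iter_content(chunk_size=256):
    total += len(chunk)
print(total)  # the endpoint returns 1024 random bytes
```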