Web crawler-----Introduction to the Requests library

1. Using requests.get()
requests.get(url)
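url: the URL of the page to be fetched
requests.get() returns a Response object r. Its main attributes are:
r.status_code: the HTTP status code of the response; 200 means success, while codes such as 404 mean the request failed
r.text: the response body as a string, decoded using r.encoding
r.content: the response body as raw bytes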
r.encoding: the encoding taken from the charset in the HTTP Content-Type header; if a text response's header has no charset, the encoding is assumed to be ISO-8859-1
r.apparent_encoding: the encoding inferred from the response content itself (an alternative when r.encoding is wrong)
The typical workflow is: first check r.status_code. If it is 200, read r.text, r.encoding, r.apparent_encoding and r.content as needed. If it is 404 or another error code, the request failed for some reason; calling r.raise_for_status() turns such a status into a requests.HTTPError exception.
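The check-status-then-read workflow above can be sketched as follows. This is a minimal sketch that assumes a hypothetical local test server (PageHandler, start_server) so it runs without internet access: it serves a UTF-8 page at /ok with a Content-Type header that has no charset, and returns 404 for any other path. fetch_page is likewise a hypothetical helper name.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "你好，世界"  # hypothetical page content, sent as UTF-8 bytes


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /ok returns 200, any other path returns 404."""

    def do_GET(self):
        if self.path == "/ok":
            body = PAGE.encode("utf-8")
            self.send_response(200)
            # No charset parameter, so requests sets r.encoding to ISO-8859-1
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_server():
    """Serve PageHandler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: let the OS choose
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def fetch_page(url):
    """Check status_code first; only on 200 read the encoding and the text."""
    r = requests.get(url, timeout=5)
    if r.status_code != 200:
        return r.status_code, None, None
    header_encoding = r.encoding      # taken from the Content-Type header
    r.encoding = r.apparent_encoding  # switch to the encoding guessed from the content
    return r.status_code, header_encoding, r.text
```

For /ok, header_encoding comes back as ISO-8859-1 because the header has no charset; for /missing the function returns (404, None, None). Setting r.encoding = r.apparent_encoding is a common fix in that situation, though a guess made from a very short body is not guaranteed to be right.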

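2. The Requests library defines several exceptions; six commonly encountered ones are:
requests.ConnectionError: a network connection error, such as a failed DNS lookup or a refused connection
requests.HTTPError: an HTTP error, raised by r.raise_for_status() for 4xx and 5xx status codes
requests.URLRequired: a required URL is missing
requests.TooManyRedirects: the maximum number of redirects was exceeded
requests.ConnectTimeout: the connection to the remote server timed out
requests.Timeout: the request timed out (this covers both connecting and reading)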
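A sketch of catching these exceptions, using a hypothetical helper classify_failure. The except clauses run from most to least specific, because requests.ConnectTimeout is a subclass of both requests.ConnectionError and requests.Timeout, and all of them derive from requests.RequestException. To stay offline, the usage connects to a local port that nothing listens on (unused_local_port is also a hypothetical helper).

```python
import socket

import requests


def unused_local_port():
    """Ask the OS for a free port, then release it so nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def classify_failure(url, timeout=3):
    """Hypothetical helper: return 'OK' or the name of the exception requests raised."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx status codes into requests.HTTPError
        return "OK"
    except requests.ConnectTimeout:  # must come before both of its parent classes
        return "ConnectTimeout"
    except requests.Timeout:
        return "Timeout"
    except requests.ConnectionError:
        return "ConnectionError"
    except requests.TooManyRedirects:
        return "TooManyRedirects"
    except requests.HTTPError:
        return "HTTPError"
    except requests.RequestException as e:  # base class of all the exceptions above
        return type(e).__name__


print(classify_failure(f"http://127.0.0.1:{unused_local_port()}/"))  # → ConnectionError
print(classify_failure("www.example.com"))  # → MissingSchema
```

A URL without a scheme raises MissingSchema, another RequestException subclass, so the final clause catches it.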

3. HTTP protocol and Requests library methods

HTTP protocol
URL format: http://host[:port][path]
host: a valid Internet host domain name or IP address
port: the port number; the default port is 80
path: the path of the requested resource
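The URL format above can be checked with the standard library's urllib.parse.urlsplit. url_parts is a hypothetical helper that fills in the default port 80 when the URL omits one; the URLs are placeholders.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split an http URL into (host, port, path), filling in the default port 80."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else 80  # HTTP default port
    return parts.hostname, port, parts.path


print(url_parts("http://example.com:8080/index.html"))  # ('example.com', 8080, '/index.html')
print(url_parts("http://192.0.2.1/docs"))               # ('192.0.2.1', 80, '/docs')
```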

The seven main methods of the Requests library are request(), get(), head(), post(), put(), patch() and delete():
①requests.request(method,url,**kwargs)
method: the request method, such as the seven below
requests.request('GET',url,**kwargs)
requests.request('HEAD',url,**kwargs)
requests.request('POST',url,**kwargs)
requests.request('PUT',url,**kwargs)
requests.request('PATCH',url,**kwargs)
requests.request('DELETE',url,**kwargs)
requests.request('OPTIONS',url,**kwargs)
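To show that requests.request() simply sends whatever method it is given, the sketch below starts a hypothetical local server (EchoMethodHandler) that reports the method it received in a made-up X-Method response header, then issues all seven methods against it. start_server and methods_seen are hypothetical helper names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

METHODS = ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"]


class EchoMethodHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that reports the method it received."""

    def _echo(self):
        # Read and discard any request body before replying
        length = int(self.headers.get("Content-Length") or 0)
        if length:
            self.rfile.read(length)
        self.send_response(200)
        self.send_header("X-Method", self.command)  # made-up header name
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Route all seven methods to the same handler
    do_GET = do_HEAD = do_POST = do_PUT = do_PATCH = do_DELETE = do_OPTIONS = _echo

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_server(handler):
    """Serve the handler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def methods_seen(url):
    """Send each method via requests.request() and collect what the server saw."""
    return [requests.request(m, url, timeout=5).headers["X-Method"] for m in METHODS]
```

requests.get(url) behaves like requests.request('GET', url), and the other shortcut functions map to their methods in the same way.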
url: the URL of the page to be fetched
**kwargs: optional parameters that control access; the common ones are:
params: dictionary or byte sequence, added to the URL as query parameters
data: dictionary, byte sequence or file object, sent as the body of the request
json: data in JSON format, sent as the body of the request
headers: dictionary of custom HTTP headers
cookies: dictionary or CookieJar, the cookies sent with the request
auth: tuple, enables HTTP authentication
files: dictionary, used to upload files
timeout: the timeout, in seconds
proxies: dictionary, sets the proxy servers to use; login credentials can be included
allow_redirects: True/False, default True; whether to follow redirects
stream: True/False, default False; if False the response content is downloaded immediately, if True it is downloaded only when accessed
verify: True/False, default True; whether to verify the server's SSL certificate
cert: path to a local SSL client certificate
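requests.Request(...).prepare() builds a request without sending it, which makes it easy to see how some of these parameters change the outgoing request. The URL, User-Agent and credentials below are placeholders. timeout, proxies, allow_redirects, stream, verify and cert do not appear here because they affect how a request is sent rather than what it contains; they are passed to requests.get() and the other functions directly.

```python
import requests

# Placeholder values for illustration only
req = requests.Request(
    "GET",
    "http://example.com/s",
    params={"wd": "python", "page": 2},   # appended to the URL as a query string
    headers={"User-Agent": "Mozilla/5.0"},  # custom HTTP header
    auth=("user", "pass"),                 # tuple -> HTTP Basic authentication
)
prepared = req.prepare()  # builds the request locally; nothing is sent

print(prepared.url)                       # http://example.com/s?wd=python&page=2
print(prepared.headers["User-Agent"])     # Mozilla/5.0
print(prepared.headers["Authorization"])  # Basic dXNlcjpwYXNz
```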
②requests.get(url,params=None,**kwargs)
url: the URL of the page to be fetched
params: extra parameters added to the URL, in dictionary or byte stream format; optional
**kwargs: 12 parameters that control access (the same as in ①, minus params)
③requests.head(url,**kwargs)
url: the URL of the page to be fetched
**kwargs: 13 parameters that control access
④requests.post(url,data=None,json=None,**kwargs)
url: the URL of the page to be updated
data: dictionary, byte sequence or file, the content of the request
json: data in JSON format, the content of the request
**kwargs: 11 parameters that control access
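Building a POST with requests.Request(...).prepare(), which does not send anything, shows how post() encodes data compared with json. show_body is a hypothetical helper and the URL is a placeholder.

```python
import requests


def show_body(**kwargs):
    """Prepare (but do not send) a POST and return its body and Content-Type."""
    prepared = requests.Request("POST", "http://example.com/post", **kwargs).prepare()
    return prepared.body, prepared.headers.get("Content-Type")


# A dict passed as data is form-encoded
form_body, form_type = show_body(data={"key1": "value1", "key2": "value2"})
# json is serialized to UTF-8 JSON bytes
json_body, json_type = show_body(json={"key1": "value1"})
# A plain string passed as data is sent as-is, with no Content-Type added
raw_body, raw_type = show_body(data="ABC")

print(form_body, form_type)  # key1=value1&key2=value2 application/x-www-form-urlencoded
print(json_body, json_type)  # b'{"key1": "value1"}' application/json
```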
⑤requests.put(url,data=None,**kwargs)
url: the URL of the page to be updated
data: dictionary, byte sequence or file, the content of the request
**kwargs: 12 parameters that control access
⑥requests.patch(url,data=None,**kwargs)
url: the URL of the page to be updated
data: dictionary, byte sequence or file, the content of the request
**kwargs: 12 parameters that control access
⑦requests.delete(url, **kwargs)
url: the URL of the page to be deleted
**kwargs: 13 parameters that control access

Content reference: https://www.icourse163.org/learn/BIT-1001870001?tid=1461055451#/learn/announce

Origin blog.csdn.net/qq_44921056/article/details/109008003