Infi-chu:
http://www.cnblogs.com/Infi-chu/
1. Basic usage:
1. Install:
pip install requests
2. Example:
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print(type(r))         # <class 'requests.models.Response'>
print(r.status_code)
print(r.text)          # the response body, as a str
print(r.cookies)
[Note] The other request methods are used in the same way:
r = requests.post(url)
r = requests.put(url)
r = requests.delete(url)
r = requests.head(url)
r = requests.options(url)
3. GET request:
- example
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print(r.text)
There are two ways to add parameters to the url:
a. Add directly
r = requests.get(url + '?name=Infi-chu&age=23')
b. Add via params parameter
import requests

data = {'name': 'Infi-chu', 'age': '23'}
r = requests.get(url, params=data)
The response body (r.text) is a str; if it is in JSON format, we can call the json() method directly to parse it into a dict.
If the returned result is not in JSON format, parsing fails and a json.decoder.JSONDecodeError exception is raised.
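A minimal sketch of json() against http://httpbin.org/get, which returns a JSON body:
import requests

r = requests.get('http://httpbin.org/get')
print(type(r.text))    # <class 'str'>
print(r.json())        # parsed into a Python dict
print(type(r.json()))  # <class 'dict'>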
- Crawl the web
Regular expressions and request headers can be used to extract the content you need, as in the sketch below.
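A sketch, where the target page and the regular expression are only illustrative, of fetching a page with a custom header and pulling out its <title>:
import re
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}
r = requests.get('http://www.baidu.com', headers=headers)
# extract the page title with a regular expression
match = re.search(r'<title>(.*?)</title>', r.text, re.S)
if match:
    print(match.group(1))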
- Grab binary data
Pictures, audio, video and other files are essentially binary data.
Grab the GitHub icon:
import requests r = requests.get("http://github.com/favicon.ico") print(r.text) print(r.content) # save Picture with open('favicon.ico','wb') as f: f.write(r.content)
- Add headers
When crawling Zhihu, you must set the User-Agent header; otherwise the request is intercepted and the crawl fails. A sketch is shown below.
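A minimal sketch, where the URL and the User-Agent string are only examples:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)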
4. POST request
import requests

data = {'name': 'Infi-chu', 'age': '23'}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
After a successful request, the submitted data appears in the form part of the response, matching the Form Data you would see when inspecting the request with F12.
5. Response
After sending a request, what we get back is the response. Besides text and content for the body, the Response object carries additional information:
import requests

r = requests.get('http://www.baidu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.history), r.history)
print(type(r.url), r.url)
The headers property returns a CaseInsensitiveDict.
The cookies property returns a RequestsCookieJar.
2. Advanced usage:
1. File upload
import requests

f = {'file': open('favicon.ico', 'rb')}
r = requests.post(url, files=f)
print(r.text)
2. Cookies
import requests

r = requests.get(url)
print(r.cookies)
for k, v in r.cookies.items():
    print(k + '=' + v)
3. Session maintenance
Use a Session object so that cookies are shared across requests:
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
4. SSL certificate verification
Requests provides certificate verification. Use the verify parameter to control whether the certificate is checked; it defaults to True, so verification happens automatically. A sketch of skipping verification is shown below.
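A minimal sketch, assuming a site whose certificate fails verification (the URL is only an example); verify=False skips the check, and urllib3 (a requests dependency) can silence the resulting InsecureRequestWarning:
import urllib3
import requests

# suppress the warning that verify=False would otherwise print
urllib3.disable_warnings()
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)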
5. Proxy settings
For some websites, a few test requests return information normally, but once large-scale crawling starts, the site may show a verification code or block the IP outright, making it inaccessible for a while. Setting a proxy helps avoid this.
Proxy settings:
import requests

proxy = {
    'http': 'http://ip:port',
    'https': 'https://ip:port'
}
requests.get('https://www.taobao.com', proxies=proxy)
6. Timeout setting
import requests

r = requests.get('https://www.taobao.com', timeout=1)
print(r.status_code)
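The timeout can also be given as a (connect, read) tuple, or set to None to wait indefinitely; a brief sketch:
import requests

# separate connect and read timeouts
r = requests.get('https://www.taobao.com', timeout=(5, 30))
print(r.status_code)
# timeout=None (the default) waits forever
r = requests.get('https://www.taobao.com', timeout=None)
print(r.status_code)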
7. Authentication
import requests
from requests.auth import HTTPBasicAuth

r = requests.get(url, auth=HTTPBasicAuth('username', 'password'))
print(r.status_code)

# can be abbreviated as
r = requests.get(url, auth=('username', 'password'))
print(r.status_code)

# OAuth authentication is also supported: pip3 install requests_oauthlib
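A hedged sketch of OAuth1 with requests_oauthlib, where the URL and all four keys are placeholders:
import requests
from requests_oauthlib import OAuth1

url = 'https://api.twitter.com/1.1/account/verify_credentials.json'
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
r = requests.get(url, auth=auth)
print(r.status_code)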
8. Prepared Request
A request can also be represented as a standalone data structure called a Prepared Request: build a Request object, prepare it, and send it through a Session, as in the sketch below.
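A minimal sketch, where the URL, data, and headers are only examples:
from requests import Request, Session

url = 'http://httpbin.org/post'
data = {'name': 'Infi-chu'}
headers = {'User-Agent': 'Mozilla/5.0'}

s = Session()
req = Request('POST', url, data=data, headers=headers)
# turn the Request into a PreparedRequest, then send it through the session
prepped = s.prepare_request(req)
r = s.send(prepped)
print(r.text)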