Python3 crawler (4): Using the requests library

 Infi-chu:

http://www.cnblogs.com/Infi-chu/

1. Basic usage:

1. Install:

pip install requests

2. Example:

import requests
url = 'http://www.baidu.com'
r = requests.get(url)
print(type(r))  # <class 'requests.models.Response'>
print(r.status_code)
print(r.text)
print(r.cookies)

[Note] The other request methods are used in the same way:

r = requests.post(url)
r = requests.put(url)
r = requests.delete(url)
r = requests.head(url)
r = requests.options(url)

3. GET request:

  • Example
import requests
url = 'http://www.baidu.com'
r = requests.get(url)
print(r.text)

 There are two ways to add parameters to the URL:

  a. Append them to the URL directly

r = requests.get(url + '?name=Infi-chu&age=23')

  b. Add via the params parameter

import requests
url = 'http://www.baidu.com'
data = {"name":"Infi-chu","age":"23"}
r = requests.get(url, params=data)

r.text returns a str; if the response body is in JSON format, we can call the json() method directly to parse it into a dict.

If the returned result is not in JSON format, parsing fails and a json.decoder.JSONDecodeError exception is thrown.
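
For example, httpbin.org returns JSON, so json() can be called directly (a minimal sketch; the URL is just a convenient test endpoint):

import requests

r = requests.get('http://httpbin.org/get')
print(type(r.text))    # <class 'str'>
print(r.json())        # parsed into a Python dict
print(type(r.json()))  # <class 'dict'>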

  • Crawl the web

Regular expressions can be used to extract page content, and request headers can be added as needed; see the sketch below.
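
A rough sketch (the User-Agent and the regular expression below are only placeholders to adapt to the target page):

import re
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
r = requests.get('http://www.baidu.com', headers=headers)
# extract the page title with a regular expression
titles = re.findall(r'<title>(.*?)</title>', r.text, re.S)
print(titles)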

  • Grab binary data

Pictures, audio, video and other files are essentially composed of binary data.

Grab the GitHub icon:

import requests
r = requests.get("http://github.com/favicon.ico")
print(r.text)     # garbled, because the icon is binary data decoded as text
print(r.content)  # raw bytes
# save the picture
with open('favicon.ico','wb') as f:
    f.write(r.content)
  • Add headers

When crawling Zhihu, you must set a User-Agent in the request headers; otherwise the request is intercepted and the crawl fails.
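
A minimal sketch of adding a User-Agent (the header value is just an example browser string, and the Zhihu explore page is only used for illustration):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.status_code)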

4. POST request

import requests
data = {'name':'Infi-chu','age':'23'}
r = requests.post('http://www.baidu.com',data=data)

 On success, the submitted data can be seen in the request's form field (press F12 and check the Network panel).
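
To actually see the submitted form echoed back, here is a sketch posting to httpbin.org, whose /post endpoint returns the submitted form in its JSON response:

import requests

data = {'name': 'Infi-chu', 'age': '23'}
r = requests.post('http://httpbin.org/post', data=data)
print(r.json()['form'])  # {'age': '23', 'name': 'Infi-chu'}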

5. Response

After sending a request, what we get back is the response. We use text and content to read the body; the response also carries additional information:

import requests
r = requests.get('http://www.baidu.com')
print(type(r.status_code),r.status_code)
print(type(r.headers),r.headers)
print(type(r.cookies),r.cookies)
print(type(r.history),r.history)
print(type(r.url),r.url)

 The headers property returns a CaseInsensitiveDict

 The cookies property returns a RequestsCookieJar

2. Advanced usage:

1. File upload

import requests
url = 'http://httpbin.org/post'  # any endpoint that accepts file uploads; httpbin echoes them back
f = {'file': open('favicon.ico','rb')}
r = requests.post(url, files=f)
print(r.text)

2. Cookies

import requests
r = requests.get('http://www.baidu.com')
print(r.cookies)
for k, v in r.cookies.items():
    print(k + "=" + v)

3. Session maintenance

Using the Session object

import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)

4. SSL certificate verification

Requests provides certificate verification. The verify parameter controls whether the certificate is checked; it defaults to True, so verification happens automatically.
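
A minimal sketch that skips verification for a site whose certificate cannot be validated and silences the resulting warning (the URL is only a placeholder):

import requests
from requests.packages import urllib3

urllib3.disable_warnings()  # suppress the InsecureRequestWarning
r = requests.get('https://self-signed.example.com/', verify=False)
print(r.status_code)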

5. Proxy settings

For some websites, requests succeed during small-scale testing, but once large-scale crawling starts, a captcha may appear or the IP may be blocked outright, making the site inaccessible for a while.

Proxy settings:

import requests
# replace ip:port with a real proxy address
proxy = {'http':'http://ip:port','https':'https://ip:port'}
requests.get('https://www.taobao.com',proxies=proxy)

6. Timeout setting

import requests
r = requests.get('https://www.taobao.com',timeout=1)
print(r.status_code)
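
The timeout can also be split into connect and read phases by passing a tuple, and the Timeout exception can be caught (a sketch; the 1-second values are arbitrary):

import requests

try:
    # (connect timeout, read timeout) in seconds
    r = requests.get('https://www.taobao.com', timeout=(1, 1))
    print(r.status_code)
except requests.exceptions.Timeout:
    print('request timed out')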


7. Authentication

import requests
from requests.auth import HTTPBasicAuth
# httpbin offers a Basic Auth test endpoint for the given username/password
url = 'http://httpbin.org/basic-auth/username/password'
r = requests.get(url, auth=HTTPBasicAuth('username','password'))
print(r.status_code)

# can be abbreviated as
r = requests.get(url,auth=('username','password'))
print(r.status_code)
# OAuth authentication is also supported; install it with pip3 install requests_oauthlib
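
A rough sketch of OAuth 1 authentication with requests_oauthlib (the keys, tokens, and URL below are placeholders):

import requests
from requests_oauthlib import OAuth1

auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
r = requests.get('https://api.example.com/user', auth=auth)
print(r.status_code)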

8. Prepared Request

requests can also represent a request as a standalone data structure called a Prepared Request: the request is built first, then sent explicitly through a Session.
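
A minimal sketch of building and sending a Prepared Request:

from requests import Request, Session

url = 'http://httpbin.org/post'
data = {'name': 'Infi-chu'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = s.prepare_request(req)  # turn the Request into a PreparedRequest
r = s.send(prepped)               # send it explicitly
print(r.text)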
