【BOOK】The requests library

I. The Requests library

1. GET requests

  Crawling a page (add a User-Agent header so the website does not block the request)

# Crawl a web page: Zhihu
import requests
import re

## Browser identification (User-Agent)
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
pattern = re.compile('explore-feed.*?question_link.*?>(.*?)</a>', re.S)
titles = re.findall(pattern, r.text)
print(titles)  
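Why the header matters: without it, requests announces itself with its own User-Agent, which sites can recognise and block. The sketch below only builds (does not send) two requests through a Session so the header each would carry can be compared; no network access is needed, and the Zhihu URL is reused purely as a label.

```python
import requests

url = 'https://www.zhihu.com/explore'
custom_ua = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')

with requests.Session() as s:
    # No headers argument: the session's default User-Agent is used
    plain = s.prepare_request(requests.Request('GET', url))
    # Headers passed on the request override the session defaults
    browser = s.prepare_request(
        requests.Request('GET', url, headers={'user-agent': custom_ua}))

default_ua = plain.headers['User-Agent']  # header lookup is case-insensitive
sent_ua = browser.headers['User-Agent']
print(default_ua)  # python-requests/<version>
print(sent_ua)     # the Chrome string above
```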

  Fetching binary data (images, audio, video, ...)

## Fetch binary data: save the GitHub icon to the current directory
import requests

r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
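For large files, reading `r.content` holds the whole body in memory at once. Passing `stream=True` and writing the chunks yielded by `iter_content()` avoids that. So that the sketch runs offline, it assumes a throwaway local server (standard-library `http.server`) serving 10 KB of made-up bytes; with a real site you would pass its URL, such as the GitHub favicon above, instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

FAKE_ICON = bytes(range(256)) * 40  # 10 KB of stand-in binary data

class IconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'image/x-icon')
        self.send_header('Content-Length', str(len(FAKE_ICON)))
        self.end_headers()
        self.wfile.write(FAKE_ICON)

    def log_message(self, *args):  # keep the console quiet
        pass

server = HTTPServer(('127.0.0.1', 0), IconHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/favicon.ico'

# stream=True defers downloading the body; iter_content() yields it in chunks
with requests.get(url, stream=True, timeout=10) as r:
    r.raise_for_status()
    with open('favicon_stream.ico', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)

server.shutdown()
server.server_close()

with open('favicon_stream.ico', 'rb') as f:
    saved = f.read()
print(len(saved))  # 10240
```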

 

 

2. POST requests (submitting form data)
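A minimal sketch of a form submission: passing a dict as `data=` makes requests send it as an HTML-form body (`application/x-www-form-urlencoded`). To keep it runnable offline, it assumes a tiny local echo server (standard-library `http.server`, standing in for a service such as httpbin.org/post) that returns the form fields it received; the field names are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qsl

import requests

class EchoFormHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length).decode('utf-8')
        reply = json.dumps({
            'content_type': self.headers.get('Content-Type'),
            'form': dict(parse_qsl(body)),  # decode the url-encoded form
        }).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), EchoFormHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/post'

data = {'name': 'germey', 'age': '22'}  # hypothetical form fields
r = requests.post(url, data=data, timeout=10)
echo = r.json()
print(echo['form'])          # {'name': 'germey', 'age': '22'}
print(echo['content_type'])  # application/x-www-form-urlencoded

server.shutdown()
server.server_close()
```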

 

3. Responses (sending a request returns a Response object)
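A sketch of the attributes a Response commonly exposes. To stay runnable without network access, it assumes a throwaway local server (standard-library `http.server`) that returns a small HTML page and sets one cookie; with a real site only the URL changes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'<html><body>hello</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Set-Cookie', 'number=123456789; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

r = requests.get(f'http://127.0.0.1:{server.server_port}/', timeout=10)
print(type(r))                    # <class 'requests.models.Response'>
print(r.status_code)              # 200
print(r.headers['content-type'])  # text/html; charset=utf-8 (case-insensitive lookup)
print(r.cookies.get('number'))    # 123456789
print(r.url)                      # final URL after any redirects
print(r.history)                  # [] because no redirect happened
print(r.text)                     # body decoded to str
print(r.status_code == requests.codes.ok)  # True

server.shutdown()
server.server_close()
```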

 

4. File upload

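# File upload
import requests

# 'file' is the form field name; open the file in binary mode and close it afterwards
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post('http://httpbin.org/post', files=files)
print(r.text)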
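To see what `files=` actually produces without a network round trip, the sketch below prepares (does not send) an upload. It uses the `(filename, content, content_type)` tuple form, which sets the file name and MIME type explicitly; the bytes are a made-up stand-in for a real icon file.

```python
import requests

# (file name, file content, content type); the content bytes are made up
files = {'file': ('favicon.ico', b'\x00\x00\x01\x00', 'image/x-icon')}
req = requests.Request('POST', 'http://httpbin.org/post', files=files)
prepared = req.prepare()  # builds the request without sending it

content_type = prepared.headers['Content-Type']
print(content_type)  # multipart/form-data; boundary=<random boundary>
# The multipart body carries the field name, file name and content type
print(b'filename="favicon.ico"' in prepared.body)  # True
```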

  

5. Getting, setting and keeping login Cookies

## Get Cookies
import requests

r = requests.get('https://baidu.com')
print(r.cookies)  ## a RequestsCookieJar object
# Iterate over the cookies: items() turns the jar into a list of (name, value) tuples
for key, value in r.cookies.items():
    print(key + '=' + value)

 

## Keep the login state with Cookies
## Copy the cookie from a browser page where you are already logged in and put it in headers
import requests
 
headers = {
    'cookie':'_zap=bf241714-d6f9-4e5f-9608-fa7b85f32db6; _xsrf=79ff86e9-5e76-4fa6-a384-4f528af88eb9; d_c0="AHBWI_Li4RCPTpXmzvEr1EkNgFDaBMtY-nA=|1582816893"; __guid=74140564.2608088362801457700.1582816893817.983; _ga=GA1.2.1547480939.1582816896; _gid=GA1.2.1931834251.1582816896; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1582816896; capsion_ticket="2|1:0|10:1582818182|14:capsion_ticket|44:MDhiMmFkNmY0YjI1NGRkYzgxMGZkY2Q3Mzk3YWYxZjU=|5bfdca13743bf8cb5de50f1c152f7d51120a4bf811eb2bfafdfc1079d69ffa9d"; z_c0="2|1:0|10:1582818209|4:z_c0|92:Mi4xSU00SERnQUFBQUFBY0ZZajh1TGhFQ2NBQUFDRUFsVk5vSEJfWGdDU2JjQkRxS3JNdElMNmZ3UjIzUVZ1WThyWWFn|61d7ba8d2dca14b10c7004277e43687cc4ef25116720ae3649d656dcc8cfef26"; monitor_count=3; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1582818210; KLBRSID=e42bab774ac0012482937540873c03cf|1582818280|1582816893',
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
r = requests.get('https://www.zhihu.com/people/kuluma-59', headers=headers)
print(r.text)
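Instead of pasting the raw string into `headers`, the same cookies can be passed through the `cookies` parameter. A sketch, assuming a short cookie string in the browser's `name=value; name=value` format (the names echo the header above, the values are made up): it is split into a `RequestsCookieJar`, and the request is only prepared, not sent, so the resulting Cookie header can be checked.

```python
import requests

# Made-up values; in practice copy the string from a logged-in browser session
cookie_string = '_zap=abc123; z_c0=token456; monitor_count=3'

jar = requests.cookies.RequestsCookieJar()
for item in cookie_string.split(';'):
    name, value = item.strip().split('=', 1)  # split only on the first '='
    jar.set(name, value)

prepared = requests.Request('GET', 'https://www.zhihu.com/people/kuluma-59',
                            cookies=jar).prepare()
cookie_header = prepared.headers['Cookie']
print(cookie_header)  # _zap=abc123; z_c0=token456; monitor_count=3
```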

  

6. Maintaining a session: the Session object

 

Each call to get() or post() sends its request in a separate session, as if the pages were opened in two different browsers.

A Session object keeps one session across requests, so you do not have to manage cookies yourself.

It is typically used to simulate a login.

import requests
# First request: ask httpbin to set a cookie
requests.get('http://httpbin.org/cookies/set/number/123456789')
# Second request: a new session, so the cookie is not sent
r = requests.get('http://httpbin.org/cookies')
# Show the cookies the server received with the latest request
print(r.text)  # "cookies": {}


import requests
## A Session object keeps the same session across requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)  # "cookies": {"number": "123456789"}
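Besides cookies returned by the server, a Session also carries any headers and cookies you set on it into every request it prepares. The sketch below prepares (does not send) two requests to show the merge; the User-Agent string is hypothetical.

```python
import requests

with requests.Session() as s:
    # Anything set on the session is merged into every request it prepares
    s.headers.update({'user-agent': 'my-crawler/0.1'})  # hypothetical UA string
    s.cookies.set('number', '123456789')

    first = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
    # Per-request headers are merged on top of the session's headers
    second = s.prepare_request(requests.Request(
        'GET', 'http://httpbin.org/headers', headers={'Accept': 'application/json'}))

print(first.headers['Cookie'])       # number=123456789
print(second.headers['User-Agent'])  # my-crawler/0.1
print(second.headers['Accept'])      # application/json
```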

  

7. SSL certificate verification: the verify parameter

If a request fails with an SSLError, the website's certificate was not issued by an officially trusted CA.

Set the verify parameter (True by default) to False; certificate checking is then skipped and the request succeeds.

import requests
import urllib3

urllib3.disable_warnings()  ## suppress the InsecureRequestWarning printed when verification is off
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)
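`urllib3.disable_warnings()` silences warnings for the whole program. A narrower alternative, sketched below on the assumption that verification should be skipped only for specific calls, wraps the request in `warnings.catch_warnings()`. The sketch also shows that `verify` accepts a CA bundle path: `requests.certs.where()` returns the certifi bundle requests uses by default. The helper name `get_without_verification` is hypothetical, and the network calls are left as comments.

```python
import warnings

import requests
import urllib3
from requests import certs

def get_without_verification(url, **kwargs):
    """Hypothetical helper: GET url with certificate checks turned off.

    Silences InsecureRequestWarning only inside this call,
    instead of for the whole program.
    """
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, **kwargs)

# verify also accepts a path to a CA bundle file
ca_bundle = certs.where()
print(ca_bundle)

# Usage (needs network access):
# r = get_without_verification('https://www.12306.cn', timeout=10)
# r = requests.get('https://www.12306.cn', verify=ca_bundle, timeout=10)
```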

  

8. Proxy settings: the proxies parameter

Sending large numbers of frequent requests to a site may trigger a CAPTCHA, a redirect to the login page, or even an IP ban. Routing requests through a proxy helps avoid this.

※ HTTP proxy

import requests

proxies = {
    'http': 'http://user:password@host:port',
    'https': 'http://user:password@host:port'
}
requests.get('https://www.taobao.com', proxies=proxies)
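Which proxy gets used depends on the scheme of the *target* URL: requests looks up the URL's scheme (and optionally `scheme://host` or `all`) among the keys of `proxies`, so a dict with only an `'http'` entry leaves https:// requests unproxied. The sketch calls `requests.utils.select_proxy`, an internal helper rather than a documented public API, purely to show that lookup; the proxy addresses are placeholders.

```python
from requests.utils import select_proxy

# Placeholder proxy addresses, not real servers
only_http = {'http': 'http://user:password@10.10.1.10:3128'}
both = {
    'http': 'http://user:password@10.10.1.10:3128',
    'https': 'http://user:password@10.10.1.10:1080',
}

# The key is the scheme of the URL being requested, not of the proxy
print(select_proxy('https://www.taobao.com', only_http))  # None, so no proxy is used
print(select_proxy('https://www.taobao.com', both))       # http://user:password@10.10.1.10:1080
print(select_proxy('http://www.taobao.com', both))        # http://user:password@10.10.1.10:3128
```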

 

※ SOCKS proxy protocol

Install the SOCKS dependencies first: pip install 'requests[socks]'

 

import requests

proxies = {
    'http':'socks5://user:password@host:port',
    'https':'socks5://user:password@host:port'
}
requests.get('https://www.taobao.com', proxies=proxies)

  

9. Setting a timeout: the timeout parameter

If the server has not responded within the set time, requests raises an exception (requests.exceptions.Timeout). Without a timeout, requests waits indefinitely.
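A sketch of what happens when the limit is exceeded. It assumes a throwaway local server (standard-library `http.server`) that waits 2 seconds before answering, while the client allows only 0.5 seconds for the read, so requests gives up and raises `requests.exceptions.Timeout` (here its subclass `ReadTimeout`). `timeout` may be a single number or a `(connect, read)` tuple.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # simulate a server that takes 2 s to answer
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'too late')
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client has already given up

    def log_message(self, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

try:
    # timeout=(connect, read) in seconds; a single number applies to both
    requests.get(url, timeout=(3, 0.5))
    timed_out = False
except requests.exceptions.Timeout:  # ConnectTimeout and ReadTimeout subclass it
    timed_out = True
print('timed out:', timed_out)  # timed out: True

server.shutdown()  # waits for the slow handler to finish
server.server_close()
```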

 

10. Authentication

import requests

r = requests.get('http://localhost:5000', auth=('username', 'password'))
print(r.status_code)
## 200 if the username and password are correct, otherwise 401
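`auth=('username', 'password')` sends HTTP Basic authentication. To see the 200/401 behaviour without the `localhost:5000` service assumed above, the sketch starts its own stand-in server (standard-library `http.server`) that accepts only the hypothetical credentials `username`/`password`.

```python
import base64
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.auth import HTTPBasicAuth

# Header value the demo server accepts (hypothetical credentials)
EXPECTED = 'Basic ' + base64.b64encode(b'username:password').decode('ascii')

class BasicAuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get('Authorization') == EXPECTED:
            self.send_response(200)
        else:
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="demo"')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(('127.0.0.1', 0), BasicAuthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

# A (user, password) tuple is shorthand for HTTPBasicAuth(user, password)
ok = requests.get(url, auth=('username', 'password'), timeout=10)
explicit = requests.get(url, auth=HTTPBasicAuth('username', 'password'), timeout=10)
wrong = requests.get(url, auth=('username', 'wrong'), timeout=10)
print(ok.status_code, explicit.status_code, wrong.status_code)  # 200 200 401

server.shutdown()
server.server_close()
```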

  

11. Prepared Request

  Represents a request as a data structure (a PreparedRequest) that can be built and inspected first and sent later.
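A minimal sketch of that workflow: build a `requests.Request`, turn it into a `PreparedRequest` with `Session.prepare_request()`, inspect it, then send it with `Session.send()`. The send step needs network access, so it is left as a comment; the form field and shortened User-Agent are illustrative values.

```python
import requests

url = 'http://httpbin.org/post'
data = {'name': 'germey'}                # hypothetical form field
headers = {'User-Agent': 'Mozilla/5.0'}  # shortened UA for readability

req = requests.Request('POST', url, data=data, headers=headers)
with requests.Session() as s:
    # prepare_request merges session defaults (headers, cookies) into the request
    prepped = s.prepare_request(req)

    # The PreparedRequest is plain data that can be inspected or modified
    print(prepped.method)                   # POST
    print(prepped.url)                      # http://httpbin.org/post
    print(prepped.body)                     # name=germey
    print(prepped.headers['Content-Type'])  # application/x-www-form-urlencoded

    # To actually transmit it (needs network access):
    # r = s.send(prepped, timeout=10)
    # print(r.text)
```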

 


Source: www.cnblogs.com/motoharu/p/12442867.html