Day 3 of learning distributed web crawlers

Basic use of the requests library (a third-party library)

Although urllib in the Python standard library already covers most of the functionality we normally need, its API is awkward to use. Requests bills itself as "HTTP for Humans", meaning it is simpler and more pleasant to work with.
Requests is written in Python and built on top of urllib, but it is more convenient than urllib: it saves a lot of work and fully meets the needs of HTTP testing.
Installation: pip install requests
Sending a GET request

import requests
# Add headers and query parameters
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36',
}
# params accepts a dict or string of query parameters; a dict is URL-encoded automatically, so urlencode is not needed
kw = {
    'wd':'中国'
}
# Send the GET request
response = requests.get('http://www.baidu.com/s',headers=headers,params=kw)
print(response) #<Response [200]>
print(response.url)
print(response.text)  # response body as a string
print(response.content) # response body as raw bytes

Sending a POST request

resp = requests.post(url, data=data, headers=headers)  # data is a dict of form fields
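The line above is only a template. Below is a minimal runnable sketch, assuming http://www.httpbin.org/post as a stand-in endpoint (it simply echoes the submitted form back); the field names are placeholders, not part of the original tutorial.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36',
}
# data is the form payload; these field names are placeholders
data = {
    'username': 'test_user',
    'password': 'test_pass',
}
resp = requests.post('http://www.httpbin.org/post', data=data, headers=headers)
print(resp.status_code)
print(resp.json())  # httpbin echoes the submitted fields under "form"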

Using proxies
To use a proxy with the requests library, simply pass the proxies parameter to the request method (get, post).

# Without a proxy
# import requests
# url = 'http://www.httpbin.org/ip'
# resp = requests.get(url)
# print(resp.text)  #"origin": "111.29.161.238"

# With a proxy
import requests
proxy = {
    'http':'http://182.101.207.11:8080',
}
url = 'http://www.httpbin.org/ip'
resp = requests.get(url,proxies=proxy)
print(resp.text)   #"origin": "182.101.207.11"
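The proxies dict maps URL schemes to proxy addresses, so requests only routes matching URLs through the proxy. If the target site uses HTTPS, add an 'https' entry as well (the address below is a placeholder):

proxy = {
    'http': 'http://182.101.207.11:8080',   # used for http:// URLs
    'https': 'http://182.101.207.11:8080',  # used for https:// URLs (placeholder address)
}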

Handling cookies with the requests library
If a response contains cookies, you can use the cookies attribute to get the returned cookie values.

# Get the cookies from a response in requests
import requests
resp = requests.get('http://www.baidu.com')
print(resp.cookies)  # RequestsCookieJar object
print(resp.cookies.get_dict())  # cookies as a plain dict

Sharing cookies with a requests session
session: requests can also share cookies across requests by using the session object the library provides. Note: this is not the session from web development; here it is just a session (conversation) object.

# Use a session from the requests library to share cookies
import requests
# Login page URL
post_url = 'https://i.meishi.cc/login_t.php?redirect=https%3A%2F%2Fi.meishi.cc%2Fcook.php%3Fid%3D14417636%26session_id%3Da04cb701c24a3c81e9891fadc13e56ae'
post_data = {
    'username':'157*****414',
    'password':'*****'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'

}
sess = requests.session()  # Create a session object
sess.post(post_url, post_data, headers=headers)  # After logging in, sess already holds the login cookies

# Personal (profile) page URL
url = 'https://i.meishi.cc/cook.php?id=14417636'
resp = sess.get(url, headers=headers)  # Use the cookies shared by sess to crawl the personal page
print(resp.text)
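Because the example above depends on real (masked) meishi.cc credentials, here is a self-contained sketch of the same idea, assuming httpbin.org's cookie endpoints purely for demonstration: the cookie set by the first request is automatically carried along with the second.

import requests

sess = requests.session()
# This endpoint sets a cookie named "token" on the session
sess.get('http://www.httpbin.org/cookies/set/token/abc123')
# The shared cookie is sent automatically with the next request through the same session
resp = sess.get('http://www.httpbin.org/cookies')
print(resp.text)  # {"cookies": {"token": "abc123"}}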

Handling untrusted SSL certificates with the requests library
An SSL certificate is somewhat like a business license for a website.
Add verify=False to the request method.

import requests
url = 'https://inv-veri.chinatax.gov.cn/index.html'
resp = requests.get(url,verify=False)
print(resp.content.decode('utf-8'))
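With verify=False, requests still emits an InsecureRequestWarning for every call. If that noise is unwanted, it can be silenced through urllib3 before sending the request (an optional extra, not part of the original code):

import urllib3
# Optional: silence the InsecureRequestWarning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)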