Using Python's Requests Library
The Requests module
Requests is a powerful Python HTTP library for sending HTTP requests and getting response data. It simplifies interaction with HTTP resources, providing a concise and easy-to-use API.
Documentation: https://requests.readthedocs.io/projects/cn/zh_CN/latest/
To use the Requests module, you first need to install it. You can install the latest version with pip:
pip install requests
pip3 install requests
Basic usage
Once installed, you can import the Requests module in your Python code and use it.
Send a GET request
Send a GET request using Requests:
# import the module
import requests
# target url
url = 'https://www.baidu.com'
# send a GET request to the target url
response = requests.get(url)
# print the response content
print(response.text)
# decode the raw bytes to avoid garbled Chinese characters
print(response.content.decode())
Send a POST request
Use Requests to send POST requests:
# import the module
import requests
# request URL
url = 'http://127.0.0.1:8080/login'
# custom request headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
# POST request parameters
data = {
"username": "admin",
"password": "123456"
}
# send the request with the POST parameters
response = requests.post(url, headers=headers, data=data)
# get the HTML content of the response
html = response.content.decode("utf-8")
print(html)
Using the Response object
To get the response content you can use response.text or response.content.
response.text is the string obtained by decoding response.content (bytes).
response.content is the raw data received from the network, without any decoding; it is of type bytes.
Decoding requires an encoding. If the server does not specify one, requests guesses the encoding from the HTTP headers; when the headers give no charset, it falls back to "ISO-8859-1" (it does not read the page's <meta charset="utf-8"> tag). A wrong guess produces garbled characters, so for Chinese pages it is safer to decode explicitly with response.content.decode().
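The effect of picking the wrong codec can be reproduced offline with plain bytes (a minimal sketch; the byte string below stands in for response.content):

```python
# UTF-8 bytes for the Chinese greeting "你好", standing in for response.content
raw = "你好".encode("utf-8")

# decoding with the correct codec recovers the text
print(raw.decode("utf-8"))       # 你好

# decoding with ISO-8859-1 (requests' fallback) yields mojibake
print(raw.decode("iso-8859-1"))
```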
Common character encodings
utf-8
gbk
gb2312
ascii
iso-8859-1
response.content.decode()        # defaults to "utf-8"
response.content.decode("GBK")
Common attributes and methods
response = requests.get(url): response is the response object returned for the request
response.text, response.content: the response content
response.url: the URL of the response (after redirects it may differ from the requested URL)
response.status_code: the HTTP status code of the response
response.request.headers: the headers of the request that produced this response
response.headers: the response headers
response.request._cookies: the cookies sent with the request, as a CookieJar
response.cookies: the cookies carried in the response (set via Set-Cookie), as a CookieJar
response.json(): parses a JSON response body into a Python object (dict or list)
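These attributes can be illustrated offline by building a Response by hand (a sketch only: _content is an internal attribute that requests.get() normally fills in; real code should never set it):

```python
import requests

# hand-built Response, for illustration only; a real one comes from requests.get()
response = requests.models.Response()
response.status_code = 200
response._content = b'{"name": "requests", "ok": true}'  # internal attribute

print(response.status_code)   # 200
print(response.content)       # the raw bytes
print(response.json())        # parsed into a Python dict
```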
Setting request headers
The headers parameter can be used to set the request headers.
It is passed to the request method to send the request with custom headers.
The headers parameter takes a dictionary: header field names as keys, and their values as values.
import requests
# target url
url = 'https://www.baidu.com'
# request headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}
# send a GET request to the target url
response = requests.get(url, headers=headers)
# print the response content
print(response.content.decode())
# print the request headers that were sent
print(response.request.headers)
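Overriding User-Agent matters because of what requests sends by default; the library's requests.utils.default_headers() helper shows it without making a request:

```python
import requests

# by default requests identifies itself as "python-requests/<version>",
# which many sites treat differently from a real browser
default = requests.utils.default_headers()
print(default['User-Agent'])
```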
Handling Cookies
1. Carrying cookies in headers
Cookies can be carried in the headers parameter.
Copy the User-Agent and Cookie values from the browser.
The header fields and values in the headers parameter must match those in the browser.
The value of the Cookie key in the headers dictionary is a string.
import requests
# build the headers dictionary
headers = {
# User-Agent copied from the browser
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
# Cookie copied from the browser
'Cookie': 'BIDUPSID=DA34A47255629CF319B6868F08DC207F; PSTM=1658846527; BAIDUID=DA34A47255629CF32D59A4FD90F6BB95:SL=0:NR=10:FG=1;'
}
url = 'https://www.baidu.com/s'
# request parameters as a dictionary
params = {
'wd': 'java'}
# send a GET request to the target url
response = requests.get(url, headers=headers, params=params)
# print the response content
print(response.content.decode())
2. The cookies parameter
Cookies can be carried in the headers parameter, or passed through the dedicated cookies parameter. Cookies generally have an expiration time; once expired, they must be obtained again.
The cookies parameter takes a dictionary:
cookies = {
"cookie_name": "cookie_value"}
Using the cookies parameter:
# build the cookies dictionary from a cookie string copied from the browser
cookies_str = 'cookies string copied from the browser'
cookies_dict = {
cookie.split('=')[0]: cookie.split('=')[-1] for cookie in cookies_str.split('; ')}
# send the request with the cookies parameter
response = requests.get(url, headers=headers, cookies=cookies_dict)
3. The cookieJar object
The response object returned by requests has a cookies attribute whose value is of cookieJar type, containing the cookies set locally by the server.
Cookie operations
# returns a RequestsCookieJar object
cookies = response.cookies
# convert a RequestsCookieJar to a cookies dict
requests.utils.dict_from_cookiejar(cookies)
# convert a cookies dict to a RequestsCookieJar
requests.utils.cookiejar_from_dict()
# add a dict of cookies to an existing cookiejar
requests.utils.add_dict_to_cookiejar()
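These helpers can be exercised offline, without a server (a minimal sketch using a hand-built cookie dict):

```python
import requests

# a plain dict of cookies, as might be copied from a browser
cookie_dict = {"session_id": "abc123", "lang": "en"}

# dict -> RequestsCookieJar
jar = requests.utils.cookiejar_from_dict(cookie_dict)

# merge another dict of cookies into the existing jar
requests.utils.add_dict_to_cookiejar(jar, {"theme": "dark"})

# RequestsCookieJar -> dict
print(requests.utils.dict_from_cookiejar(jar))
```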
Setting a timeout
Use the timeout parameter to set the request timeout (in seconds).
import requests
url = 'url'
# set a timeout: raise an exception if no response is returned within 3 seconds
response = requests.get(url, timeout=3)
Sending a request with parameters
Carrying parameters in the URL
import requests
# target url with the query string included
url = 'https://www.baidu.com/s?wd=java'
# request headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}
# send a GET request to the target url
response = requests.get(url, headers=headers)
# print the response content
print(response.content.decode())
Carrying parameters through params
Build a dictionary of request parameters and pass it to the params argument when sending the request.
import requests
# request headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}
url = 'https://www.baidu.com/s'
# request parameters as a dictionary
params = {
'wd': 'java'}
# send a GET request to the target url
response = requests.get(url, headers=headers, params=params)
# print the response content
print(response.content.decode())
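How the params dict is merged into the URL can be checked offline with PreparedRequest, the object requests builds internally before sending:

```python
from requests.models import PreparedRequest

# prepare_url percent-encodes the params dict and appends it as a query string
req = PreparedRequest()
req.prepare_url('https://www.baidu.com/s', {'wd': 'java'})
print(req.url)   # https://www.baidu.com/s?wd=java
```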
Proxies
Overview
In the Requests module, HTTP requests can be sent through a proxy server. A proxy acts as a middleman between you and the target server, forwarding requests and responses along the way. Proxies serve a variety of purposes, such as hiding your real IP address or bypassing network restrictions.
By specifying a proxy IP, the proxy server can forward the request for you. This is called a forward proxy: the proxy, acting as a middleman between the client and the target server, receives the request from the client, forwards it to the target server, and returns the response to the client.
The difference between forward proxy and reverse proxy:
1. Forward proxy
Forwards requests on behalf of the party that sends them (the browser or client); the client knows the real address of the server that ultimately handles the request. Example: a VPN.
2. Reverse proxy
Forwards requests on behalf of the server that ultimately handles them, not on behalf of the client; the client does not know the real address of that server. Example: nginx.
Proxy classification
1. By degree of anonymity, proxy IPs fall into three categories:
1. Transparent Proxy:
A transparent proxy forwards your request through its own IP, but the target server can still find out who you are, because your real IP is passed along.
The request headers received by the target server are as follows:
REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = your IP
2. Anonymous Proxy:
Using an anonymous proxy, others can only know that you use a proxy, but cannot know who you are.
The request headers received by the target server are as follows:
REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = proxy IP
3. High Anonymity Proxy (Elite Proxy):
A high-anonymity proxy makes it impossible for the target server to detect that you are using a proxy at all, so it is the best choice and works best.
The request headers received by the target server are as follows:
REMOTE_ADDR = proxy IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined
2. Depending on the protocol used by the target website, a proxy service for the matching protocol is required.
Proxies can be categorized by the protocol they serve:
HTTP proxy: the target url uses the http protocol
HTTPS proxy: the target url uses the https protocol
SOCKS tunnel proxy, e.g. a SOCKS5 proxy:
a SOCKS proxy simply relays packets and does not care which application protocol is inside (FTP, HTTP, HTTPS, etc.)
a SOCKS proxy has less overhead than an HTTP or HTTPS proxy
a SOCKS proxy can forward both http and https requests
Using the proxies parameter
To keep the server from treating the requests as coming from a single client, and to avoid being blocked for requesting one domain too frequently, use a proxy IP:
# build the proxies dictionary
proxies = {
"http": "http://ip:port",
"https": "https://ip:port",
}
response = requests.get(url, proxies=proxies)
Note:
If the proxies dictionary contains multiple key-value pairs, the proxy matching the protocol of the url is selected when the request is sent.
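This scheme-based selection can be observed offline with requests.utils.select_proxy, the internal helper that performs it (the proxy addresses below are placeholders):

```python
import requests

# placeholder proxies dictionary with one entry per scheme
proxies = {
    "http": "http://10.0.0.1:8080",
    "https": "https://10.0.0.2:8080",
}

# the proxy is chosen by the scheme of the target url
print(requests.utils.select_proxy("http://example.com/", proxies))
print(requests.utils.select_proxy("https://example.com/", proxies))
```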
Other features
Ignoring the CA certificate
When visiting certain sites, the browser warns: "Your connection is not private". This happens because the site's CA certificate has not been signed by a trusted root certificate authority. Requesting such a site raises an exception such as ssl.CertificateError.
import requests
url = 'url'
# ignore the certificate: verify=False disables CA certificate verification
response = requests.get(url,verify=False)
Image download
When downloading an image, keep the same file extension as the requested URL, and the file must be saved from response.content (the raw bytes).
import requests
# URL of the image to download
url = "https://pic.netbian.com/uploads/allimg/180826/113958-153525479855be.jpg"
# send the request and get the response
response = requests.get(url)
# save the image, keeping the .jpg extension of the URL
with open('image.jpg', 'wb') as f:
    f.write(response.content)
Retry handling
The retrying module can monitor a function through a decorator; if the function raises an exception, a retry is triggered.
Install the retrying module:
pip install retrying
# import modules
import time
import requests
# third-party retrying module
from retrying import retry

# configure retries with the decorator
# stop_max_attempt_number sets the maximum number of attempts
@retry(stop_max_attempt_number=3)
def test():
    print("Test retry attempt")
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }
    url = "http://127.0.0.1:8888"
    # set the timeout parameter
    response = requests.get(url, headers=headers, timeout=1)
    return response.text

if __name__ == '__main__':
    try:
        html = test()
    except Exception as e:
        print(e)
        time.sleep(10)
Sample output:
Test retry attempt
Test retry attempt
Test retry attempt
HTTPConnectionPool(host='127.0.0.1', port=8888): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000001CF901742B0>, 'Connection to 127.0.0.1 timed out. (connect timeout=1)'))
Keeping session state
The Session class in the requests module automatically handles the cookies produced while sending requests and receiving responses, thereby keeping session state.
After a session instance requests a website, the cookies set by the server are stored in the session; the next request made with the session automatically carries them.
The get and post methods of a session object take exactly the same parameters as those of the requests module.
# instantiate a session object
session = requests.session()
# first request
response = session.get(url, headers=headers)
# subsequent request, carrying the cookies from the first
response = session.post(url, data=data)
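The cookie persistence can be seen offline by setting a cookie on the session by hand (a sketch: in practice the server sets it through a Set-Cookie header on the first response):

```python
import requests

# a session keeps cookies across requests in its .cookies jar
session = requests.Session()

# simulate a server's Set-Cookie by setting the cookie manually
session.cookies.set("token", "abc123")

# any later request made through this session carries the cookie automatically
print(session.cookies.get("token"))
```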