Using Python's Requests library

Requests module

Requests is a powerful Python HTTP library for sending HTTP requests and handling response data. It simplifies interaction with HTTP resources by providing a concise, easy-to-use API.

Documentation: https://requests.readthedocs.io/projects/cn/zh_CN/latest/

To use the Requests module, first install it. The latest version can be installed with pip:

pip install requests

pip3 install requests

Basic usage

Once installed, you can import the Requests module in your Python code and use it.

Send a GET request

Send a GET request using Requests:

# Import the module
import requests

# Target URL
url = 'https://www.baidu.com'

# Send a GET request to the target URL
response = requests.get(url)

# Print the response content
print(response.text)

# Decode the raw bytes to avoid garbled Chinese characters
print(response.content.decode())

Send a POST request

Use Requests to send POST requests:

# Import the module
import requests

# Request URL
url = 'http://127.0.0.1:8080/login'
# Custom request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
# POST request parameters
data = {
    "username": "admin",
    "password": "123456"
}

# Send the request with the POST parameters
response = requests.post(url, headers=headers, data=data)
# Get the HTML content of the response
html = response.content.decode("utf-8")
print(html)
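For the data parameter, requests encodes the dictionary into a form-encoded request body. A minimal sketch, preparing (but not sending) a request like the one above to inspect what would go over the wire (the login URL is the same local placeholder):

```python
import requests

# Prepare (but do not send) the POST request to inspect what would be sent
req = requests.Request('POST', 'http://127.0.0.1:8080/login',
                       data={'username': 'admin', 'password': '123456'})
prepared = req.prepare()

print(prepared.body)                     # username=admin&password=123456
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

Preparing a request this way is useful for debugging because no network connection is needed.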

Using the Response object

To get the response content you can use response.text or response.content.

response.text is the string obtained by decoding response.content (bytes).

response.content is the raw data fetched from the network, without any decoding; it is of type bytes.

Decoding requires an encoding. If the server does not specify one, requests guesses it from the HTTP headers; when the headers declare no charset (even if the page itself contains <meta charset="utf-8">), the default guess is "ISO-8859-1". A wrong guess produces garbled text when decoding, so use response.content.decode() to avoid garbled Chinese characters.

The decode() method fixes garbled Chinese characters

Common character encodings

utf-8
gbk
gb2312
ascii
iso-8859-1
response.content.decode()  # defaults to utf-8

response.content.decode("GBK")
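To see why a wrong encoding guess produces garbled text, a Response object can be built by hand. This is a contrived sketch for illustration only; real code would get the object from requests.get:

```python
import requests

# Simulate a server that returned UTF-8 bytes but whose headers made
# requests guess ISO-8859-1 (contrived: attributes are set by hand).
resp = requests.models.Response()
resp._content = '中文'.encode('utf-8')  # raw bytes, as fetched from the network
resp.encoding = 'ISO-8859-1'            # the (wrong) guessed encoding

print(resp.text)              # garbled: bytes decoded with the wrong charset
print(resp.content.decode())  # correct: explicitly decoded as UTF-8
```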

Common properties and methods

response = requests.get(url): response is the Response object returned for the request

response.text, response.content: get the response content

response.url: the URL of the response; it sometimes differs from the requested URL (e.g. after redirects)

response.status_code: the response status code

response.request.headers: the request headers of the request that produced this response

response.headers: the response headers

response.request._cookies: the cookies of the corresponding request, returned as a CookieJar

response.cookies: the cookies carried in the response (set via Set-Cookie), returned as a CookieJar

response.json(): automatically converts a JSON-string response body into a Python object (dict or list)
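A hand-built Response illustrates a few of these properties (a contrived sketch with a made-up URL and body; normally the object comes from requests.get):

```python
import requests

# Simulate a response for illustration; real code would use requests.get(url)
resp = requests.models.Response()
resp.status_code = 200
resp.url = 'https://example.com/api'      # made-up URL
resp._content = b'{"username": "admin"}'  # a JSON body

print(resp.status_code)  # 200
print(resp.url)          # https://example.com/api
print(resp.json())       # {'username': 'admin'}
```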

Setting request headers

The headers parameter can be used to set request headers.

The headers parameter carries custom request headers when sending a request.

It accepts the headers as a dictionary: header field names are the keys and the corresponding field values are the values.
import requests

# Target URL
url = 'https://www.baidu.com'

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}

# Send a GET request to the target URL
response = requests.get(url, headers=headers)

# Print the response content
print(response.content.decode())

# Print the request headers that were sent
print(response.request.headers)

Handling Cookies

1. Carry cookies

Carry cookies in the headers parameter.

Copy the User-Agent and Cookie from the browser.

The header fields and values in the headers parameter must match those in the browser exactly.

The value of the Cookie key in the headers dictionary is a string.
import requests

# Build the request-headers dictionary
headers = {
    # User-Agent copied from the browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
    # Cookie copied from the browser
    'Cookie': 'BIDUPSID=DA34A47255629CF319B6868F08DC207F; PSTM=1658846527; BAIDUID=DA34A47255629CF32D59A4FD90F6BB95:SL=0:NR=10:FG=1;'
}
url = 'https://www.baidu.com/s'

# Request parameters as a dictionary
params = {
    'wd': 'java'}

# Send a GET request to the target URL
response = requests.get(url, headers=headers, params=params)

# Print the response content
print(response.content.decode())

2. The cookies parameter

Cookies can be carried in the headers parameter, or passed via the dedicated cookies parameter. Cookies generally have an expiration time; once expired, they must be obtained again.

The cookies parameter is a dictionary:

cookies = {
    "cookie name": "cookie value"}

Using the cookies parameter:

# Build the cookies dictionary
cookies_str = 'cookies string copied from the browser'

cookies_dict = {
    cookie.split('=')[0]: cookie.split('=')[-1] for cookie in cookies_str.split('; ')}

# Carry the cookies dictionary via the cookies parameter
response = requests.get(url, headers=headers, cookies=cookies_dict)
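With a concrete sample string in the same "name=value; name=value" shape (the values here are made up for illustration), the dict comprehension works like this:

```python
# A made-up cookie string in the browser's "name=value; name=value" format
cookies_str = 'BIDUPSID=DA34A4725562; PSTM=1658846527'

# Split on '; ' to get each cookie, then split each on '=' into name and value
cookies_dict = {c.split('=')[0]: c.split('=')[-1] for c in cookies_str.split('; ')}

print(cookies_dict)  # {'BIDUPSID': 'DA34A4725562', 'PSTM': '1658846527'}
```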

3. The CookieJar object

The Response object obtained from requests has a cookies attribute whose value is a CookieJar, containing the cookies set locally by the server.

Cookie operations

# response.cookies returns a RequestsCookieJar object
cookies = response.cookies

# Convert a RequestsCookieJar into a cookies dictionary
requests.utils.dict_from_cookiejar(cookies)

# Convert a cookies dictionary into a RequestsCookieJar
requests.utils.cookiejar_from_dict()

# Add a dictionary of cookies to an existing CookieJar
requests.utils.add_dict_to_cookiejar()
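These three helpers can be exercised without a live response by starting from a plain dictionary (a sketch using made-up cookie names and values):

```python
import requests

# Dict -> RequestsCookieJar
jar = requests.utils.cookiejar_from_dict({'session_id': 'abc123'})

# Add more cookies from a dict to the existing jar
requests.utils.add_dict_to_cookiejar(jar, {'token': 'xyz'})

# RequestsCookieJar -> dict
cookies = requests.utils.dict_from_cookiejar(jar)
print(cookies)  # {'session_id': 'abc123', 'token': 'xyz'} (key order may vary)
```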

Setting a timeout

Use the timeout parameter to set the request timeout (in seconds).

import requests

url = 'url'
# Set the timeout: a response must arrive within 3 seconds, otherwise an exception is raised
response = requests.get(url, timeout=3)
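When the timeout elapses (or the connection fails), requests raises an exception from requests.exceptions that can be caught. A minimal sketch, assuming nothing is listening on local port 1 so the request fails immediately:

```python
import requests

failed = False
try:
    # Port 1 on localhost is assumed unused, so this fails fast with a
    # ConnectionError; a slow server would raise requests.exceptions.Timeout.
    requests.get('http://127.0.0.1:1', timeout=3)
except requests.exceptions.RequestException as e:
    failed = True
    print(type(e).__name__)

print(failed)
```

Catching requests.exceptions.RequestException covers Timeout, ConnectionError, and the other request-level failures.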

Send a request with parameters

Carrying parameters in the URL

import requests

# Target URL with the query string included
url = 'https://www.baidu.com/s?wd=java'

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}

# Send a GET request to the target URL
response = requests.get(url, headers=headers)

# Print the response content
print(response.content.decode())

Carrying parameters through params

Build a dictionary of request parameters and pass it via the params argument when sending the request.

import requests

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}

url = 'https://www.baidu.com/s'

# Request parameters as a dictionary
params = {
    'wd': 'java'}

# Send a GET request to the target URL
response = requests.get(url, headers=headers, params=params)

# Print the response content
print(response.content.decode())
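Both approaches produce the same request: requests simply encodes the params dictionary into the URL's query string. This can be checked offline by preparing a request without sending it:

```python
import requests

# Prepare (but do not send) a request to inspect the final URL
req = requests.Request('GET', 'https://www.baidu.com/s', params={'wd': 'java'})
prepared = req.prepare()

print(prepared.url)  # https://www.baidu.com/s?wd=java
```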

Proxies

overview

In the Requests module, a proxy server can be used to send HTTP requests. A proxy acts as a middleman between you and the target server, forwarding requests and responses along the way. Proxies serve a variety of purposes, such as hiding your real IP address or bypassing network restrictions.

By specifying a proxy IP, the proxy server forwards the outgoing request. This approach is called a forward proxy: the proxy acts as a middleman between the client and the target server, receiving the request from the client, forwarding it to the target server, and returning the response to the client.

The difference between forward proxy and reverse proxy:

1. Forward proxy

A forward proxy forwards requests on behalf of the party that sends them (the browser or client), which knows the real address of the server that ultimately handles the request; a VPN is an example.

2. Reverse proxy

A reverse proxy forwards requests on behalf of the server that ultimately handles them, not the party that sends them; the client does not know the real address of that server. nginx is an example.

Proxy classification

1. By degree of anonymity, proxy IPs fall into three categories:

1. Transparent Proxy:

A transparent proxy forwards your request without hiding your IP address: the target server can still find out who you are.

The request headers received by the target server are as follows:

REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Your IP

2. Anonymous Proxy:

With an anonymous proxy, others can only tell that you are using a proxy; they cannot tell who you are.

The request headers received by the target server are as follows:

REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = proxy IP

3. High Anonymity Proxy (Elite proxy or High Anonymity Proxy):

A high-anonymity proxy makes it impossible for others to detect that you are using a proxy at all, so it is the best choice.

The request headers received by the target server are as follows:

REMOTE_ADDR = Proxy IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined

2. Depending on the protocol used by the target website, a proxy service for the matching protocol is needed

By the protocol they carry, proxy services can be categorized as:

HTTP proxy: the target URL uses the http protocol

HTTPS proxy: the target URL uses the https protocol

SOCKS tunnel proxy, e.g. a SOCKS5 proxy:
	A SOCKS proxy simply relays packets and does not care about the application protocol (FTP, HTTP, HTTPS, etc.)
	A SOCKS proxy takes less time than an HTTP or HTTPS proxy
	A SOCKS proxy can forward both http and https requests
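Note that using a SOCKS proxy with requests requires the optional SOCKS dependency (PySocks), installed via the socks extra:

```shell
pip install requests[socks]
```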

Using the proxies parameter

To keep the server from recognizing that all requests come from the same client, and to avoid being blocked for sending frequent requests to one domain, proxy IPs are used.

# Build the proxies dictionary
proxies = {
    "http": "http://ip:port",
    "https": "https://ip:port",
}

response = requests.get(url, proxies=proxies)

Notice:

If the proxies dictionary contains multiple key-value pairs, the proxy IP is chosen according to the protocol (scheme) of the URL being requested.
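The scheme-based selection can be observed with the requests.utils.select_proxy helper (the proxy addresses below are made up for illustration):

```python
import requests

# Made-up proxy addresses for illustration
proxies = {
    "http": "http://10.0.0.1:8080",
    "https": "https://10.0.0.2:8080",
}

# The proxy is chosen by the scheme of the URL being requested
print(requests.utils.select_proxy('http://example.com/', proxies))   # http://10.0.0.1:8080
print(requests.utils.select_proxy('https://example.com/', proxies))  # https://10.0.0.2:8080
```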

Other features

Ignoring the CA certificate

When the browser visits certain sites it warns "Your connection is not private" because the site's CA certificate has not been issued by a trusted root certificate authority. Executing a request against such a site raises an exception containing ssl.CertificateError.

import requests

url = "url"
# Ignore the certificate: verify=False skips CA certificate verification
response = requests.get(url, verify=False)

Downloading images

When downloading an image, keep the same file extension as the requested URL, and save the file from response.content (bytes).

import requests

# URL of the image to download
url = "https://pic.netbian.com/uploads/allimg/180826/113958-153525479855be.jpg"
# Send the request and get the response
response = requests.get(url)
# Save the image, keeping the .jpg extension from the URL
with open('image.jpg', 'wb') as f:
    f.write(response.content)
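Keeping the saved suffix consistent with the request can be automated by deriving the file name from the URL itself (a small sketch using plain string operations):

```python
# Take the part after the last '/' as the file name, keeping its extension
url = "https://pic.netbian.com/uploads/allimg/180826/113958-153525479855be.jpg"
filename = url.rsplit('/', 1)[-1]

print(filename)  # 113958-153525479855be.jpg
```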

Retry handling

The retrying module monitors a function through a decorator; if the function raises an exception, a retry is triggered.

Install the retrying module

pip install retrying

# Import modules
import time

import requests
# Third-party retrying module
from retrying import retry


# Configure retries with the decorator
# stop_max_attempt_number is the maximum number of attempts
@retry(stop_max_attempt_number=3)
def test():
    print("Test retry attempt")
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }
    url = "http://127.0.0.1:8888"
    # Set the timeout parameter
    response = requests.get(url, headers=headers, timeout=1)

    return response.text


if __name__ == '__main__':
    try:
        html = test()
    except Exception as e:
        print(e)

    time.sleep(10)
Test retry attempt
Test retry attempt
Test retry attempt
HTTPConnectionPool(host='127.0.0.1', port=8888): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000001CF901742B0>, 'Connection to 127.0.0.1 timed out. (connect timeout=1)'))

Keeping session state

The Session class in the requests module can automatically process the cookies generated during the process of sending requests and obtaining responses, so as to achieve the purpose of state preservation.

After a session instance requests a website, the cookies set by the server are saved in the session; the next request made with the same session automatically carries them.

The get and post methods of a session object take exactly the same parameters as those of the requests module.

# Instantiate a session object
session = requests.session()
# First request
response = session.get(url, headers=headers)
# A later request with the same session automatically carries the cookies
response = session.post(url, headers=headers, data=data)
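The state a session holds can be inspected offline by preparing a request through it: headers and cookies stored on the session are merged into every outgoing request (the header and cookie values below are made up for illustration):

```python
import requests

session = requests.Session()
# State stored on the session itself (made-up values)
session.headers.update({'X-Token': 'abc123'})
session.cookies.set('session_id', 'xyz')

# Prepare (but do not send) a request through the session
prepared = session.prepare_request(requests.Request('GET', 'http://example.com/'))

print(prepared.headers['X-Token'])  # abc123
print(prepared.headers['Cookie'])   # session_id=xyz
```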

Origin blog.csdn.net/qq_38628046/article/details/129016909