Python requests crawler returns a 403 error? Fixing the case that persists even after copying every request header and adding proxies

1. Problem analysis

[Question]: A GET or POST request sent with Python's requests library returns a 403 status code, yet the same request sent from Postman succeeds with status code 200. Why? First, rule out the IP: if the IP were blocked, Postman would not be able to access the site either. Are the headers the problem? Comparing the two requests shows that the headers are the same, so they are not the cause. So what is different?

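[Answer]: In this situation, the server is most likely verifying the client's TLS/JA3 fingerprint. The fingerprint is derived from the ClientHello message the client sends during the TLS handshake (TLS version, cipher suites, extensions, and so on), so it is independent of the HTTP headers. Browsers and Postman each present their own fingerprint, which the server accepts here, while requests presents the fingerprint of Python's TLS stack, which does not look like a browser. This gives anti-crawling systems a way to distinguish real users from crawlers.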
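To make the fingerprint idea concrete, here is a minimal sketch of how a JA3 value is built: five ClientHello fields are joined with commas (values inside a field joined with hyphens), and the result is hashed with MD5. The function names and the numeric values below are hypothetical placeholders chosen for illustration, not the fingerprint of any real client.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 text: five fields joined by ',', values in a field by '-'."""
    fields = [
        str(tls_version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(ja3_text):
    """The JA3 fingerprint is the MD5 hex digest of the JA3 text."""
    return hashlib.md5(ja3_text.encode("ascii")).hexdigest()


# Hypothetical ClientHello values for two clients that differ only in cipher order.
client_a = ja3_string(771, [4865, 4866, 4867], [0, 11, 10], [29, 23], [0])
client_b = ja3_string(771, [4867, 4866, 4865], [0, 11, 10], [29, 23], [0])

print(client_a)
print(ja3_hash(client_a))
print(ja3_hash(client_b))  # differs from client_a although the headers could be identical
```

Because the order of the cipher suites and extensions is part of the input, two clients that send exactly the same HTTP headers can still produce different fingerprints.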

2. Problem solving

1. Use the pyhttpx library (recommended)

1.1. Installation

pip install pyhttpx

1.2. Code example

import pyhttpx

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}

# Create a pyhttpx session and send the request through it.
session = pyhttpx.HttpSession()
res = session.get(url='https://www.baidu.com/', headers=headers)
print(res.text)

2. Use the curl_cffi library (less commonly used)

2.1. Installation

pip install curl_cffi

2.2. Code examples

from curl_cffi import requests

# impersonate selects the browser whose TLS fingerprint curl_cffi mimics.
res = requests.get(url='https://www.baidu.com/', impersonate="chrome101")
print(res.text)

3. Use the httpx library (highly recommended)

3.1. Installation

pip install httpx

3.2. Code examples

import httpx

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}

# timeout is in seconds; verify=False disables TLS certificate verification.
res = httpx.get(url='https://www.baidu.com/', headers=headers, timeout=10, verify=False)
print(res.text)
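Since which library gets past the check can depend on the site, one option is to try the clients in order and keep the first response that is not a 403. The following is only a sketch of that idea: fetch_with_fallback and the (name, fetcher) pairs are hypothetical helpers, and each fetcher is assumed to be a callable that takes a URL and returns (status_code, text).

```python
def fetch_with_fallback(url, fetchers):
    """Try each (name, fetcher) pair in order.

    Returns (name, status, body) for the first fetcher whose status is not 403.
    If every fetcher is blocked or raises, returns the last attempt's result.
    Returns None when no fetchers are given.
    """
    last = None
    for name, fetch in fetchers:
        try:
            status, body = fetch(url)
        except Exception as exc:  # network error, missing library, etc.
            last = (name, None, repr(exc))
            continue
        if status != 403:
            return name, status, body
        last = (name, status, body)
    return last


# Example wiring (assumes the libraries above are installed):
#
#   from curl_cffi import requests as curl_requests
#   import httpx
#
#   def via_curl_cffi(u):
#       r = curl_requests.get(u, impersonate="chrome101")
#       return r.status_code, r.text
#
#   def via_httpx(u):
#       r = httpx.get(u, timeout=10)
#       return r.status_code, r.text
#
#   print(fetch_with_fallback("https://www.baidu.com/",
#                             [("curl_cffi", via_curl_cffi), ("httpx", via_httpx)]))
```

Passing the fetchers in as plain callables keeps the helper independent of any one library, so a client can be added or removed without touching the loop.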

Origin blog.csdn.net/SweetHeartHuaZai/article/details/130983179