What you need to know about proxies in Requests
Anyone who has written a crawler will be familiar with proxies. But is your proxy actually taking effect?
Proxies fall into several categories. For crawlers, the most common choice is a high-anonymity proxy.
Setting a proxy in Requests is very convenient: just pass a dictionary as the proxies parameter. The official example:
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)
Note one detail: the proxies dictionary has two keys, https and http. Why write both? Would a single key be enough?
Let's try it and find out.
Preparing the validation function
This function uses the proxy to visit two IP-checking sites, one over https and one over http.
import requests
from bs4 import BeautifulSoup


def validate(proxies):
    # One https site and one http site, each reporting the IP it sees
    https_url = 'https://ip.cn'
    http_url = 'http://ip111.cn/'
    headers = {'User-Agent': 'curl/7.29.0'}
    https_r = requests.get(https_url, headers=headers, proxies=proxies, timeout=10)
    http_r = requests.get(http_url, headers=headers, proxies=proxies, timeout=10)
    # The http site returns HTML; take the first line of the card body
    soup = BeautifulSoup(http_r.content, 'html.parser')
    result = soup.find(class_='card-body').get_text().strip().split('\n')[0]
    print(f"Current proxy: {proxies.values()}")
    print(f"Proxy used for https site: {https_r.json()}")
    print(f"Proxy used for http site: {result}")
Testing
Case 1
proxies = {
    'http': '222.189.244.56:48304',
    'https': '222.189.244.56:48304'
}
validate(proxies)
Output
Current proxy: dict_values(['222.189.244.56:48304', '222.189.244.56:48304'])
Proxy used for https site: {'ip': '222.189.244.56', 'country': 'Yangzhou, Jiangsu', 'city': 'China Telecom'}
Proxy used for http site: 222.189.244.56 China / Nanjing
Result: both sites are accessed through the proxy.
Case 2
proxies = {
    'http': '222.189.244.56:48304'
}
validate(proxies)
Output
Current proxy: dict_values(['222.189.244.56:48304'])
Proxy used for https site: {'ip': '118.24.234.46', 'country': 'Chongqing', 'city': 'Tencent'}
Proxy used for http site: 222.189.244.56 China / Nanjing
Result: only the http request goes through the proxy.
Case 3
proxies = {
    'https': '222.189.244.56:48304'
}
validate(proxies)
Output
Current proxy: dict_values(['222.189.244.56:48304'])
Proxy used for https site: {'ip': '222.189.244.56', 'country': 'Yangzhou, Jiangsu', 'city': 'China Telecom'}
Proxy used for http site: 118.24.234.46 China / Nanning
Result: only the https request goes through the proxy.
Other tests
A Wireshark capture shows that when the protocols do not match, no request is sent to the proxy server at all.
Testing with Postman gives results consistent with Requests: when the protocols differ, the proxy is not used.
My guess is that this is some convention or rule, similar to PAC? (If you know the answer, please let me know.)
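One possible explanation, offered as a sketch rather than a confirmed answer: the proxy seems to be looked up by the scheme of the target URL, so a dictionary with only an http key simply has no entry for an https:// URL. The function below imitates that kind of lookup using only the standard library. The name pick_proxy is hypothetical, and this is a simplified illustration, not the actual code inside Requests.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy registered for the URL's scheme, or None.

    Hypothetical helper: a simplified imitation of a scheme-keyed
    lookup, not the real Requests implementation.
    """
    scheme = urlparse(url).scheme  # 'http' or 'https'
    return (proxies or {}).get(scheme)


proxies = {'http': '222.189.244.56:48304'}
print(pick_proxy('http://ip111.cn/', proxies))  # 222.189.244.56:48304
print(pick_proxy('https://ip.cn', proxies))     # None
```

Under this assumption, the https request in Case 2 finds no matching key and connects directly, which would match what the capture shows.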
Conclusion
When using Requests, **a proxy only takes effect if its key in the proxies dictionary matches the protocol (http / https) of the target website.**
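In practice this means setting both keys whenever one proxy should handle all traffic. A minimal sketch follows, assuming the proxy itself speaks plain HTTP (as in the official example, where the https key also points to an http:// address). The helper name build_proxies is made up for illustration.

```python
def build_proxies(address):
    """Build a proxies dict that covers both http and https targets.

    Hypothetical helper. Assumes the proxy server accepts plain HTTP
    connections, so 'http://' is prepended when no scheme is given.
    """
    if '://' not in address:
        address = 'http://' + address
    # Same proxy for both schemes, so neither kind of request bypasses it
    return {'http': address, 'https': address}


proxies = build_proxies('222.189.244.56:48304')
print(proxies)
```

The resulting dictionary can be passed as the proxies argument to requests.get, or to the validate function above to confirm that both requests go through the proxy.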