What you need to know about proxies in Requests

Anyone who has written a web crawler will be familiar with proxies. But is your proxy actually taking effect?

Proxies fall into the following categories:

[Figure: proxy classification]

For crawlers, the most common choice is a high-anonymity proxy.

Setting a proxy in Requests is very convenient: just pass a dictionary as the proxies parameter. The official example:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

Note one detail: the proxies dictionary has two keys, https and http. Why write both? Would a single key be enough?

Try it and you will find out.

Preparing the verification function

This function uses the proxy to visit two IP-checking sites, one over https and one over http.

import requests
from bs4 import BeautifulSoup


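def validate(proxies):
    """Visit one https and one http IP-checking site and print the IP each one sees."""
    https_url = 'https://ip.cn'
    http_url = 'http://ip111.cn/'
    # ip.cn is expected to answer a curl User-Agent with JSON (parsed below).
    headers = {'User-Agent': 'curl/7.29.0'}
    https_r = requests.get(https_url, headers=headers, proxies=proxies, timeout=10)
    http_r = requests.get(http_url, headers=headers, proxies=proxies, timeout=10)
    # ip111.cn returns HTML; the first line of the 'card-body' element holds the IP.
    soup = BeautifulSoup(http_r.content, 'html.parser')
    result = soup.find(class_='card-body').get_text().strip().split('\n')[0]

    print(f"Proxy in use: {proxies.values()}")
    print(f"IP seen by the https site: {https_r.json()}")
    print(f"IP seen by the http site: {result}")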

Tests

  • Case 1

    proxies = {
        'http': '222.189.244.56:48304',
        'https': '222.189.244.56:48304'
    }
    validate(proxies)

    Output

    Proxy in use: dict_values(['222.189.244.56:48304', '222.189.244.56:48304'])
    IP seen by the https site: {'ip': '222.189.244.56', 'country': 'Yangzhou, Jiangsu Province', 'city': 'China Telecom'}
    IP seen by the http site: 222.189.244.56 China / Nanjing

    Result: both sites are reached through the proxy.

  • Case 2

    proxies = {
        'http': '222.189.244.56:48304'
    }
    validate(proxies)

    Output

    Proxy in use: dict_values(['222.189.244.56:48304'])
    IP seen by the https site: {'ip': '118.24.234.46', 'country': 'Chongqing', 'city': 'Tencent'}
    IP seen by the http site: 222.189.244.56 China / Nanjing

    Result: only the http request goes through the proxy.

  • Case 3

    proxies = {
        'https': '222.189.244.56:48304'
    }
    validate(proxies)

    Output

    Proxy in use: dict_values(['222.189.244.56:48304'])
    IP seen by the https site: {'ip': '222.189.244.56', 'country': 'Yangzhou, Jiangsu Province', 'city': 'China Telecom'}
    IP seen by the http site: 118.24.234.46 China / Nanning

    Result: only the https request goes through the proxy.

Other tests

A Wireshark capture shows that when the protocols do not match, no request is sent to the proxy server at all.

Testing with Postman gives results consistent with Requests: when the protocols differ, the proxy is not used.

My guess is that this is a convention or rule, similar to PAC? (If you know the answer, please let me know.)
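
One reading that fits the test results, and that matches how the Requests documentation describes the proxies mapping, is that the dictionary keys refer to the scheme of the target URL rather than to the proxy itself. The sketch below is only a simplified model of that lookup. The name pick_proxy is hypothetical and is not part of Requests, and the real library also accepts more specific keys such as scheme://hostname.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy configured for the scheme of `url`, or None.

    Simplified model (an assumption, not Requests' own code): only the
    bare scheme keys 'http' and 'https' are looked up.
    """
    scheme = urlparse(url).scheme.lower()
    return proxies.get(scheme)


only_http = {'http': '222.189.244.56:48304'}
print(pick_proxy('http://ip111.cn/', only_http))  # 222.189.244.56:48304
print(pick_proxy('https://ip.cn', only_http))     # None
```

Under this model, a None result means the request is sent directly, which would explain why the capture shows no traffic to the proxy when the protocols differ.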

Conclusion

When using Requests, a proxy takes effect only if its key in proxies has the same protocol (http / https) as the target website.
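
In practice this means that a crawler which visits both http and https sites needs both keys. A minimal sketch of a helper that builds such a dictionary from one proxy address follows. The name build_proxies and its scheme parameter are hypothetical, and the code assumes that an address without a scheme is a plain HTTP proxy.

```python
def build_proxies(address, scheme='http'):
    """Return a proxies dict that sends both http and https targets
    through the same proxy address."""
    # Assumption: an address without a scheme is a plain HTTP proxy.
    if '://' not in address:
        address = f'{scheme}://{address}'
    return {'http': address, 'https': address}


proxies = build_proxies('222.189.244.56:48304')
print(proxies)
```

The resulting dictionary can be passed as the proxies argument of requests.get, or to the validate function above.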

Origin www.cnblogs.com/ljz-2014/p/11387488.html