Correct use of HTTP proxies

The HTTP proxy is one of the most common tools for web crawlers, and in the big-data era it has become indispensable. Beyond crawling, HTTP proxies serve many businesses, such as ticket grabbing, data scraping, and order snatching. Yet many crawler developers do not actually know how to use an HTTP proxy properly. So how can we use an HTTP proxy correctly?

In the era of big data, websites impose restrictions on crawlers, on access frequency, and on individual IPs, which can make data collection impossible and severely affect crawler users. This is why crawler users turn to HTTP proxies. A web crawler needs to collect a large amount of data in a short period of time, so it routes its requests through HTTP proxy IPs to avoid a site's anti-crawling measures and per-IP limits. The crawler program simply sends its requests through the proxy and collects the data directly, as in the example below.

# -*- coding: utf-8 -*-

import requests

# The target page to visit
targetUrl = "http://ip.hahado.cn/ip"

# Proxy server (host only; the scheme is added in proxyMeta below)
proxyHost = "ip.hahado.cn"
proxyPort = "39010"

# Proxy tunnel authentication information
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# Route both HTTP and HTTPS requests through the proxy
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

resp = requests.get(targetUrl, proxies=proxies)

print(resp.status_code)
print(resp.text)
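Since a crawler that collects a large amount of data quickly will exhaust the rate limit of any single proxy IP, a common pattern is to rotate requests across a pool of proxies. The sketch below is a minimal illustration of that idea; the proxy endpoints in PROXY_POOL are hypothetical placeholders, not real servers, and would come from your proxy provider.

# -*- coding: utf-8 -*-

import random
import requests

# Hypothetical proxy pool; replace with endpoints from your proxy provider
PROXY_POOL = [
    "http://username:password@proxy1.example.com:39010",
    "http://username:password@proxy2.example.com:39010",
]

def fetch(url):
    """Fetch a URL through a randomly chosen proxy, retrying on failure."""
    for _ in range(len(PROXY_POOL)):
        proxy = random.choice(PROXY_POOL)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            continue  # this proxy failed; try another one from the pool
    raise RuntimeError("all proxies failed")

print(fetch("http://ip.hahado.cn/ip"))

Picking a proxy at random per request spreads traffic across the pool, so no single IP hits the target site often enough to trigger its anti-crawling limits.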
