Evading Anti-Crawler Measures: Setting IP Proxies

The previous chapter showed that rotating through several simulated-browser User-Agents helps when fetching pages, but every request still originates from a single IP address, which a site will eventually ban. Rotating through multiple proxy IPs is therefore a better strategy for evading anti-crawler measures: other IP addresses visit the site on your behalf.
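The example at the end of this article installs a single proxy mapping; to actually rotate IPs you would keep a pool of proxies and pick one per request. A minimal sketch, assuming a hypothetical PROXY_POOL list (the addresses below are placeholders, not live proxies):

import random

# Hypothetical pool; fill it with proxies you have collected and verified yourself.
PROXY_POOL = [
    {'http': '111.177.178.167:9999'},
    {'http': '114.249.118.221:9000'},
]

def pick_proxy():
    """Pick a random proxy mapping so successive requests come from different IPs."""
    return random.choice(PROXY_POOL)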

How do you get proxy IPs?

https://www.xicidaili.com/ (provided by the Xici proxy site)

How do you check that the proxy works?

http://httpbin.org/get
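http://httpbin.org/get echoes the request back as JSON, and its "origin" field holds the IP address the server saw. So a proxy can be verified by requesting this URL through it and confirming that "origin" shows the proxy's IP rather than your own. A minimal sketch, with a hypothetical helper name check_proxy (free proxy addresses go stale quickly, so expect failures):

import json
from urllib.request import ProxyHandler, build_opener

def check_proxy(proxy_ip, proxy_port):
    """Fetch httpbin.org/get through the proxy and see whose IP the server reports."""
    opener = build_opener(ProxyHandler({'http': '%s:%s' % (proxy_ip, proxy_port)}))
    resp = opener.open('http://httpbin.org/get', timeout=10)
    origin = json.loads(resp.read().decode('utf-8'))['origin']
    # A working proxy makes the server see the proxy's IP instead of yours;
    # some proxies report "client, proxy", so a substring check is safer.
    return proxy_ip in origin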

Example

from urllib.request import ProxyHandler, build_opener, install_opener, urlopen


def use_proxy(proxies, url):
    # 1. Create a urllib.request.ProxyHandler from the proxy mapping.
    proxy_support = ProxyHandler(proxies)
    # 2. Build an opener (an opener plays the same role as urlopen,
    #    but with the proxy handler attached).
    opener = build_opener(proxy_support)
    # Simulate a browser by sending a User-Agent header with every request.
    user_agent = 'Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3'
    opener.addheaders = [('User-Agent', user_agent)]
    # 3. Install the opener globally so that plain urlopen() calls go through the proxy.
    install_opener(opener)
    urlObj = urlopen(url)
    content = urlObj.read().decode('utf-8')
    return content

if __name__ == '__main__':
    url = 'http://httpbin.org/get'
    proxies = {'https': '111.177.178.167:9999', 'http': '114.249.118.221:9000'}
    print(use_proxy(proxies, url))
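Note that free proxies like these die frequently, so in practice the call is worth wrapping in error handling that falls back to the next candidate. A minimal sketch reusing the use_proxy function above, with a hypothetical candidates list:

from urllib.error import URLError

# Hypothetical candidate pool; replace with proxies you have verified.
candidates = [
    {'http': '111.177.178.167:9999'},
    {'http': '114.249.118.221:9000'},
]

for proxies in candidates:
    try:
        print(use_proxy(proxies, 'http://httpbin.org/get'))
        break  # stop at the first proxy that answers
    except OSError as exc:  # URLError subclasses OSError, so this also catches connection failures
        print('proxy failed, trying the next one:', proxies, exc)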

Reprinted from blog.csdn.net/qq_43279936/article/details/88134921