Python Web Scraping: User Agents and Proxy IPs

1. Usage with urllib2:
# Example 1:
# Exception handling, with a limit on the number of retries
# A time-based delay between retries could also be added (see the sketch below)
from __future__ import print_function
import urllib2

def download(url, num_retries=2):
    print("Downloading:", url)
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print("Download error:", e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, "code") and 500 <= e.code < 600:
                # Retry when the server responds with a 5XX status code
                return download(url, num_retries - 1)
    return html
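
The comment above mentions adding a time interval; here is a minimal sketch of that idea (the delay parameter and the time.sleep call are additions for illustration, not part of the original post):

# Sketch: the same retry logic, with a delay before each retry (illustrative).
from __future__ import print_function
import time
import urllib2

def download_with_delay(url, num_retries=2, delay=1):
    print("Downloading:", url)
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print("Download error:", e.reason)
        html = None
        if num_retries > 0 and hasattr(e, "code") and 500 <= e.code < 600:
            time.sleep(delay)  # wait before retrying a 5XX response
            return download_with_delay(url, num_retries - 1, delay)
    return html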

# Example 2: builds on Example 1 by adding a user agent
from __future__ import print_function
import urllib2

def download(url, user_agent="wswp", num_retries=2):
    print("Downloading:", url)
    headers = {"User-Agent": user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        html = urllib2.urlopen(request).read()
    except urllib2.URLError as e:
        print("Download error:", e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, "code") and 500 <= e.code < 600:
                # Retry when the server responds with a 5XX status code
                return download(url, user_agent, num_retries - 1)
    return html
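
Some sites block the default Python-urllib user agent, which is why the function accepts a custom string ("wswp" by default). A quick usage sketch that rotates among browser-like strings (the strings themselves and the random choice are illustrative additions):

# Usage sketch: pick a random user agent per request (illustrative).
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6)",
]

html = download("http://example.com", user_agent=random.choice(USER_AGENTS))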

# Example 3: proxy support
from __future__ import print_function
import urllib2
import urlparse

def download(url, user_agent="wswp", proxy=None, num_retries=2):
    print("Downloading:", url)
    headers = {"User-Agent": user_agent}
    request = urllib2.Request(url, headers=headers)
    opener = urllib2.build_opener()
    if proxy:
        # Map the URL's scheme (http/https) to the proxy address
        proxy_params = {urlparse.urlparse(url).scheme: proxy}
        opener.add_handler(urllib2.ProxyHandler(proxy_params))
    try:
        # Open through the opener so the proxy handler is actually used
        html = opener.open(request).read()
    except urllib2.URLError as e:
        print("Download error:", e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, "code") and 500 <= e.code < 600:
                # Retry when the server responds with a 5XX status code
                return download(url, user_agent, proxy, num_retries - 1)
    return html
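
Example usage (the proxy address is a placeholder):

# Route the request through a local HTTP proxy (address is illustrative).
html = download("http://example.com", user_agent="wswp", proxy="http://127.0.0.1:8080")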

2. Usage with requests:

import requests

# The user-agent string and proxy addresses below are examples
headers = {"User-Agent": "wswp"}
proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888",
}
response = requests.get("https://www.baidu.com", headers=headers, proxies=proxies)
print(response.text)
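
requests can also reproduce the retry-on-5XX behavior from section 1 by mounting urllib3's Retry on a Session. A minimal sketch (the retry counts, status list, and backoff value are illustrative choices):

# Sketch: retry 5XX responses in requests via urllib3's Retry (illustrative values).
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=2, status_forcelist=[500, 502, 503, 504], backoff_factor=1)
session.mount("http://", HTTPAdapter(max_retries=retry))
session.mount("https://", HTTPAdapter(max_retries=retry))
response = session.get("https://www.baidu.com", headers={"User-Agent": "wswp"})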

For more on using requests, see: https://www.cnblogs.com/zhaof/p/6915127.html

Reprinted from blog.csdn.net/weixin_41601173/article/details/80019778