Web scraping: a Python crawler hits an IP ban or "Max retries exceeded"


Other 2021-03-05 07:33:32

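(1) IP banned
Solution: User-Agent + IP proxies
Details:
Setting a User-Agent lowers the number of IP bans. The idea is to make each request look like a person clicking through in a browser; without it, requests identifies itself as python-requests/<version>.
How: add headers={'user-agent':'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.133 Safari/537.36'}
This string imitates Google Chrome. For other browsers, see the article "Python3网络爬虫(四)：使用User Agent和代理IP隐藏身份" (Python 3 Web Crawler, Part 4: Hiding Your Identity with a User Agent and Proxy IPs). There is no real difference between them; each one just imitates a browser.
Then pass it through the headers argument, so requests.get(url) becomes requests.get(url, headers=headers).
IP proxies deal with the ban itself. The idea is to make it harder for the site to detect that a single IP is requesting a page many times within a short period.
How: find a free or paid proxy provider, fetch IP addresses through its API, and put them in proxies = {} (for example {'http': 'http://ip:port', 'https': 'http://ip:port'}). Then pass proxies as well, so requests.get(url, headers=headers) becomes requests.get(url, headers=headers, proxies=proxies).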
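The two steps above can be combined into one small helper. This is a minimal sketch, not the original author's code: the names make_proxies and fetch, the 10-second timeout, and the address 203.0.113.10:8080 (a documentation-only IP) are assumptions for illustration. Substitute an address returned by your own proxy provider.

```python
import requests

# Browser-like User-Agent (a Chrome string) so the request does not
# announce itself as python-requests.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/63.0.3239.133 Safari/537.36"
    )
}


def make_proxies(address):
    """Build the proxies mapping requests expects from a 'host:port' string."""
    proxy_url = f"http://{address}"
    # The same HTTP proxy is used for both http:// and https:// targets.
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxy_address=None, timeout=10):
    """GET url with the browser-like headers, optionally through a proxy."""
    proxies = make_proxies(proxy_address) if proxy_address else None
    return requests.get(url, headers=HEADERS, proxies=proxies, timeout=timeout)
```

A call would then look like fetch(url, "203.0.113.10:8080"). Passing a timeout is a deliberate choice here, since requests waits indefinitely by default and a dead proxy would otherwise hang the crawler.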
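Notes:
Free proxy lists found online are mostly dead: out of 100+ IPs, at most 2 or so are usable, which wastes a great deal of time.
Some paid proxy services also hand out unusable IPs. Ask for a trial before buying, and purchase only if the results are acceptable.
If you are scraping a site hosted abroad, some providers offer overseas proxy IPs. However, if you have to connect through a VPN first and then through the proxy, the proxy is useless.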
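Because so many proxies turn out to be dead, it can help to filter a list before crawling with it. The following is a rough sketch under stated assumptions: proxy_works and filter_working are hypothetical helper names, https://httpbin.org/ip is just one possible test URL, and the 5-second timeout is arbitrary.

```python
import requests


def proxy_works(address, test_url="https://httpbin.org/ip", timeout=5):
    """Return True if a GET through the proxy succeeds within the timeout."""
    proxy_url = f"http://{address}"
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        response = requests.get(test_url, proxies=proxies, timeout=timeout)
        return response.ok
    except requests.RequestException:
        # Covers proxy errors, timeouts, refused connections, and so on.
        return False


def filter_working(addresses, check=proxy_works):
    """Keep only the addresses for which check(address) is true."""
    return [address for address in addresses if check(address)]
```

The check argument is injectable so that a stricter test (for example, requesting the actual target site) can be swapped in without changing the filter.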
(2) "Max retries exceeded" (port 443) errors
Solution: sleep + retries, a session, or IP proxies
Details:
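A session reduces the number of open HTTP connections. Add the following code before the request, then send the request with s.get(url, headers=headers) instead of requests.get(url, headers=headers):

import requests

s = requests.Session()
s.headers['Connection'] = 'close'  # ask the server not to keep the connection alive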
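A slightly fuller sketch of the session approach follows. The helper names make_session and fetch_statuses are hypothetical, and the timeout is an assumed value. The point is that every request goes through one Session, which is closed when the work is done.

```python
import requests


def make_session(headers=None):
    """Create a Session that asks the server to close each connection."""
    session = requests.Session()
    session.headers.update(headers or {})
    session.headers["Connection"] = "close"
    return session


def fetch_statuses(urls, headers=None, timeout=10):
    """Fetch every URL through one Session; return {url: status code}."""
    statuses = {}
    # The with-block closes the session and its pooled connections on exit.
    with make_session(headers) as session:
        for url in urls:
            statuses[url] = session.get(url, timeout=timeout).status_code
    return statuses
```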

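sleep + retries raises the chance of getting a connection:

import time

time.sleep(5)  # wait before requesting again
s.mount('https://', requests.adapters.HTTPAdapter(max_retries=5))  # s: the session created above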
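Sleeping and retrying can also be delegated to the transport layer. The sketch below mounts an HTTPAdapter configured with urllib3's Retry class, which waits progressively longer between attempts. The function name make_retrying_session, the backoff factor, and the list of status codes are assumptions chosen for illustration, not values from the original post.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=1.0):
    """Return a Session that retries failed requests with growing pauses."""
    retry = Retry(
        total=total,                    # overall cap on retry attempts
        backoff_factor=backoff_factor,  # scales the sleep between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Requests are then sent with session.get(url, headers=headers, timeout=10). If every attempt fails, requests still raises the same "Max retries exceeded" error, so the caller should be prepared to catch requests.exceptions.ConnectionError.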

IP proxy: switch to a different IP and keep scraping.
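Switching IPs after a failure can be sketched as a small rotation loop. Everything here is illustrative: fetch_with_rotation is a hypothetical helper, and the fetch argument is assumed to be any callable of the form fetch(url, proxy_address) that returns a response or raises, for instance a thin wrapper around requests.get with a proxies mapping.

```python
import itertools
import time


def fetch_with_rotation(url, proxy_addresses, fetch, max_attempts=5, pause=5.0):
    """Try url through successive proxies until fetch succeeds.

    fetch(url, proxy_address) must return a response or raise an exception.
    """
    if not proxy_addresses:
        raise ValueError("proxy_addresses must not be empty")
    last_error = None
    rotation = itertools.cycle(proxy_addresses)
    for attempt in range(max_attempts):
        address = next(rotation)
        try:
            return fetch(url, address)
        except Exception as error:  # e.g. requests.exceptions.ConnectionError
            last_error = error
            if attempt < max_attempts - 1:
                time.sleep(pause)  # wait before switching to the next proxy
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Catching the broad Exception keeps the sketch independent of any one HTTP library. In a real crawler it is usually better to catch only the network errors you expect.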


Reposted from blog.csdn.net/zeshen123/article/details/109639084
