Python: Hitting "Max retries exceeded with URL" When Scraping Zhihu Answers

Category: Other. Posted 2019-03-11 20:25:06

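I ran into this problem the other day while scraping images from Zhihu.

At first I thought my program logic was wrong, and I spent a long time chasing that. To scrape everything under a question's answers on Zhihu you now have to page through the results, and because the later pages are loaded asynchronously, you have to request them yourself rather than parse the initial HTML.

I then found the URL of the next page in the browser.

In that response, next is the URL of the next page, previous is the URL of the previous page, and total: 518 is the total number of answers under the question.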
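The paging fields described above can be followed in a loop. The sketch below is only an illustration: the post confirms the next, previous and total fields, but the surrounding JSON layout (a "paging" object and a "data" list) and the names iter_answers and fetch_json are assumptions made for this example.

```python
def iter_answers(first_url, fetch_json, max_pages=1000):
    """Yield answer items page by page by following each page's `next` URL.

    `fetch_json` is any callable that takes a URL and returns the decoded
    JSON body as a dict. The "data" and "paging" keys are assumed names.
    """
    url = first_url
    seen = set()  # guards against a `next` link that loops back
    for _ in range(max_pages):
        if not url or url in seen:
            break
        seen.add(url)
        page = fetch_json(url)
        items = page.get("data", [])
        if not items:
            # An empty page is treated as the end of the answers.
            break
        yield from items
        url = page.get("paging", {}).get("next")
```

Passing the fetch function in as an argument keeps the paging logic separate from the retry and proxy handling, so either part can be changed or tested without the other.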

My guess is that Zhihu limits access to these URLs. Even though I was going through proxies, I still hit the error.
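The retry code further down calls get_random_ip(ipList), but the post never shows that helper. The following is a guess at what such a helper might look like, not the author's code: it assumes the list holds "host:port" strings for plain HTTP proxies and returns the mapping that the proxies argument of requests accepts.

```python
import random


def get_random_ip(ip_list):
    """Pick one proxy at random from `ip_list`.

    Hypothetical reconstruction: assumes each entry is a "host:port"
    string for an HTTP proxy. Returns a dict keyed by URL scheme, which
    is the format requests expects for `proxies=`.
    """
    address = random.choice(ip_list)
    proxy_url = "http://{}".format(address)
    return {"http": proxy_url, "https": proxy_url}
```

Rotating proxies only spreads requests across addresses. If the site limits by account, cookie or request pattern, the error can still appear, which matches what the post reports.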

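The fix is as follows:

When fetching the HTML with the requests library, wrap the request in try-except inside a loop so that a failed request is retried, and use sleep to keep the request rate down.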

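    import requests
    from time import sleep

    # The original post showed only the function body; the name get_html and
    # its parameter list are assumed here so the snippet is complete.
    def get_html(url, headers, querystring, ipList):
        # Zhihu may refuse the request, so retry in a loop and sleep between
        # attempts to keep the request rate low.
        while True:
            try:
                proxies = get_random_ip(ipList)  # author's helper, not shown in the post
                print("Trying proxy: {}".format(proxies))
                # timeout added so a dead proxy cannot hang the loop forever
                r = requests.request("GET", url, headers=headers,
                                     params=querystring, proxies=proxies, timeout=10)
                r.encoding = 'utf-8'
                return r.text
            except requests.exceptions.RequestException:
                # Covers ConnectionError ("Max retries exceeded"), timeouts and proxy errors.
                print("Connection refused by the server..")
                print("Let me sleep for 5 seconds")
                print("ZZzzzz...")
                sleep(5)
                print("Was a nice sleep, now let me continue...")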
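The loop above retries forever with a fixed 5-second pause. As a variation, not something the post itself does, here is a minimal sketch of a bounded retry with exponential backoff. The function name retry_with_backoff and all of its parameters are made up for this example, and it uses only the standard library so it works with any callable.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call `func()` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...), capped
    at `max_delay`, plus up to 10% random jitter so that parallel workers
    do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

With requests, this could be called as retry_with_backoff(lambda: requests.get(url, timeout=10), exceptions=(requests.exceptions.RequestException,)). Giving up after a few attempts makes a permanently blocked proxy or URL visible instead of leaving the script sleeping indefinitely.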

That should take care of the problem.

Reference: Max retries exceed with URL (the page is only reachable through a VPN)

Reposted from blog.csdn.net/Morzker/article/details/77428051
