Solved: a crawler that keeps failing with "Traceback (most recent call last):"!

Problem description:

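When crawling a website at high speed, the script frequently died with a "Traceback (most recent call last):" error, which in this case means a connection failure. Note that "Traceback (most recent call last):" is only the generic header Python prints for any uncaught exception; the actual error, on the last line of the output, is requests.exceptions.ConnectionError. The main cause is that the network is unstable when many connections are made in quick succession, so I wrote a function that retries the connection several times before giving up.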
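
The retry idea itself does not depend on requests. Below is a minimal stdlib sketch, assuming a hypothetical helper named retry_call. Unlike the function in the solution, it waits between attempts (exponential backoff with random jitter, a swapped-in technique) so that a briefly unstable network has time to recover, and it re-raises the last error instead of silently giving up.

```python
import random
import time


def retry_call(func, max_tries=5, base_delay=0.5, exceptions=(Exception,)):
    """Hypothetical helper: call func() until it succeeds or max_tries attempts fail.

    Between attempts it sleeps base_delay * 2**attempt seconds plus random jitter.
    When the last attempt fails, the original exception is re-raised.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller see the real error
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))


# Usage: a simulated flaky connection that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = retry_call(flaky, base_delay=0.01)
print(result, calls["n"])  # ok 3
```

In a crawler, func would wrap the actual request, for example lambda: requests.get(url, timeout=60).text.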

Error output:

Traceback (most recent call last):
  File "E:/pycharm/PycharmProjects/爬虫/BG5.py", line 118, in <module>
    main(j)
  File "E:/pycharm/PycharmProjects/爬虫/BG5.py", line 84, in main
    response1 = getHTMLText(data[j][0])
  File "E:/pycharm/PycharmProjects/爬虫/BG5.py", line 54, in getHTMLText
    response = requests.get(url, headers=kv, timeout=60)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.wzfg.com', port=80): Max retries exceeded with url: /realweb/stat/ProjectListHouseAll.jsp?status=&projectid=9001708&permitNo=%E7%91%9E%E5%AE%89%E5%B8%82%E5%94%AE%E8%AE%B8%E5%AD%97(2017)%E7%AC%AC010%E5%8F%B7 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000D42E208>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.',))
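
Two details of this traceback are worth checking directly. The snippet below (offline, no request is sent) shows that ConnectionError and Timeout both derive from requests.exceptions.RequestException, so one except clause can catch both, and that a default requests HTTPAdapter is configured with zero retries. That is why "Max retries exceeded" appears even though only a single connection attempt was made.

```python
import requests
from requests.adapters import HTTPAdapter

# The exception at the bottom of the traceback, and the timeout exception,
# both share the RequestException base class.
conn_is_request_exc = issubclass(
    requests.exceptions.ConnectionError, requests.exceptions.RequestException
)
timeout_is_request_exc = issubclass(
    requests.exceptions.Timeout, requests.exceptions.RequestException
)

# A default adapter wraps urllib3's Retry with total=0: no retries at all,
# so the first failed connection already "exceeds" the retry budget.
default_retries = HTTPAdapter().max_retries.total

print(conn_is_request_exc, timeout_is_request_exc, default_retries)  # True True 0
```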

Solution: wrap the request in a loop that tries up to 20 times before giving up.

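import requests


def getHTMLText(url):
    """Fetch url and return the page text, retrying on network errors."""
    maxTryNum = 20
    kv = {"user-agent": "Mozilla/5.0"}  # browser-like User-Agent header
    for _ in range(maxTryNum):
        try:
            response = requests.get(url, headers=kv, timeout=60)
            return response.text
        except requests.exceptions.RequestException:
            # ConnectionError, Timeout, etc.: try again until attempts run out
            continue
    print("Has tried %d times to access url %s, all failed!" % (maxTryNum, url))
    return None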
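
An alternative worth considering (not the approach used above) is to let requests and urllib3 do the retrying through a transport adapter. The sketch below assumes a hypothetical helper make_retrying_session; HTTPAdapter, Session.mount and urllib3's Retry are real APIs. Connection errors and 500/502/503/504 responses are retried up to 5 times with an exponentially increasing sleep (backoff_factor) between attempts. Once retries are exhausted an exception is still raised, so callers should keep a try/except.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=1):
    """Hypothetical helper: a Session whose requests are retried inside urllib3."""
    retry = Retry(
        total=total,
        connect=total,
        read=total,
        backoff_factor=backoff_factor,          # exponential sleep between attempts
        status_forcelist=(500, 502, 503, 504),  # also retry these server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    session.headers["User-Agent"] = "Mozilla/5.0"
    return session


session = make_retrying_session()
# Usage (needs network access), keeping the original 60 s timeout:
# response = session.get(url, timeout=60)
adapter = session.get_adapter("http://www.wzfg.com/")
print(adapter.max_retries.total)  # 5
```

Reusing one Session also keeps connections alive between requests, which reduces the number of new connections a fast crawler has to open.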

Reposted from blog.csdn.net/weixin_40096730/article/details/89508665