Python: Capture urllib.request timeout exception of two methods


1. Background

  When using urllib.request.urlopen, frequent timeout exceptions cause the program to stop running. Having to restart the program after every stop hurts its robustness, so we want to catch urllib's timeout exception and handle the timeout ourselves.

from urllib import request

headers = {  # User-Agent header: pretend to be a browser visiting the site
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3941.4 Safari/537.36'
}


# Test whether the url is reachable
def test_url(url):
    r = request.Request(url, headers=headers)
    r1 = request.urlopen(r, timeout=0.1)
    print(r1.status)


if __name__ == '__main__':
    url1 = 'https://www.baidu.com/'
    url2 = 'http://httpbin.org/get'
    url3 = 'https://www.jianshu.com/p/5d6f1891354f'
    test_url(url2)

(Screenshots omitted: the timeout error and the timeout exception traceback raised by urlopen)

2. Methods

2.1 except Exception as e

from urllib import request

headers = {  # User-Agent header: pretend to be a browser visiting the site
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3941.4 Safari/537.36'
}


# Test whether the url is reachable
def test_url(url):
    try:
        r = request.Request(url, headers=headers)
        r1 = request.urlopen(r, timeout=0.1)
        print(r1.status)
    except Exception as e:  # catches all exceptions except program-exit ones such as SystemExit raised by sys.exit()
        print(e)


if __name__ == '__main__':
    url1 = 'https://www.baidu.com/'
    url2 = 'http://httpbin.org/get'
    url3 = 'https://www.jianshu.com/p/5d6f1891354f'
    test_url(url2)

(Screenshot omitted: the exception message printed by Method 1)

2.2 except error.URLError as e

from urllib import request, error
import socket

headers = {  # User-Agent header: pretend to be a browser visiting the site
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3941.4 Safari/537.36'
}


# Test whether the url is reachable
def test_url(url):
    try:
        r = request.Request(url, headers=headers)
        r1 = request.urlopen(r, timeout=0.1)
        print(r1.status)
    except error.HTTPError as e:
        print(str(e.code) + ':' + e.reason)
    except error.URLError as e:
        print(e.reason)
        if isinstance(e.reason, socket.timeout):
            print('request timed out')


if __name__ == '__main__':
    url1 = 'https://www.baidu.com/'
    url2 = 'http://httpbin.org/get'
    url3 = 'https://www.jianshu.com/p/5d6f1891354f'
    test_url(url2)

(Screenshot omitted: the output of Method 2)
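One detail worth noting in Method 2: `error.HTTPError` is a subclass of `error.URLError`, so the more specific `HTTPError` handler has to come first; if the `URLError` clause came first, it would also swallow HTTP errors and the `HTTPError` branch would never run. A quick check:

```python
from urllib import error

# HTTPError derives from URLError, which is why the except clauses in
# Method 2 must be ordered with HTTPError before URLError.
print(issubclass(error.HTTPError, error.URLError))
```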

3. Notes

  1. When testing url1, Baidu responds quickly, and the program usually runs without problems.
url1 = 'https://www.baidu.com/'

(Screenshot omitted: the run result for url1)

  2. When testing url2, both methods really do catch the timeout exception.
url2 = 'http://httpbin.org/get'

  Method 1 run results: (screenshot omitted)
  Method 2 run results: (screenshot omitted)

  3. When testing url3, Method 1 catches the timeout exception, but Method 2 exits with an error. The likely reason is that under a poor network connection the request asks for too much page content.
url3 = 'https://www.jianshu.com/p/5d6f1891354f'

  Method 1 run results: (screenshot omitted)
  Method 2 run results: (screenshot omitted)

4. Summary

  "except error.URLError as e" can only catch the timeout exceptions that urllib raises, whereas "except Exception as e" can catch every kind of timeout exception.
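Since the original motivation was to keep the program running instead of restarting it after every timeout, catching the exception combines naturally with a simple retry loop. A minimal sketch under that assumption (the function name `fetch_with_retry` and the retry/delay defaults are our choice, not part of the original post):

```python
from urllib import request, error
import socket
import time

HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_with_retry(url, retries=3, timeout=0.1, delay=0.5):
    """Try urlopen up to `retries` times; return the HTTP status,
    or None if every attempt times out."""
    for attempt in range(retries):
        try:
            req = request.Request(url, headers=HEADERS)
            with request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except error.URLError as e:
            if not isinstance(e.reason, socket.timeout):
                raise                  # a non-timeout error is not retried
        except socket.timeout:
            pass                       # bare read timeout: retry as well
        if attempt < retries - 1:
            time.sleep(delay)          # small pause before the next attempt
    return None
```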

Origin blog.csdn.net/qq_34801642/article/details/103887853