Python 3 crawlers: HTTPError, URLError, and using the two together

Two kinds of errors come up often when writing crawlers: URLError and HTTPError.

URL Error

URLError is very common. It is raised when the request cannot reach a server at all, for example because the URL is incorrect or the host does not exist.
Let's look at how this shows up in code.

# -*- coding: utf-8 -*-
"""
@ auth : carl_DJ
@ time : 2020-8-20
"""

from urllib import request
from urllib import error

# Define a URL whose host does not exist
url = 'https://www.baidu1.com'
# Build the request object
req = request.Request(url)

# Wrap the call in try/except to catch the error
try:
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)
except error.URLError as e:
    print(e.reason)

The result after execution:

[Errno 11001] getaddrinfo failed

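This result means the address lookup failed: getaddrinfo could not resolve the hostname to an IP address (11001 is the Windows socket error code for "host not found").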
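The reason attribute is not always a plain string. When the failure comes from the socket layer, it is typically an OSError such as socket.gaierror. As a minimal sketch (the helper name explain_url_error is hypothetical, and the exception is built by hand so the example runs without a network), the reason could be inspected like this:

```python
import socket
from urllib import error


def explain_url_error(exc):
    """Return a short description of a URLError (hypothetical helper)."""
    reason = exc.reason
    if isinstance(reason, socket.gaierror):
        # Name resolution failed: the hostname could not be looked up.
        return "DNS lookup failed: %s" % reason
    if isinstance(reason, OSError):
        # Other socket-level failure (refused connection, timeout, ...).
        return "Connection problem: %s" % reason
    # urllib may also set reason to a plain string.
    return "URL error: %s" % reason


# Simulate the error from the run above instead of hitting the network.
simulated = error.URLError(socket.gaierror(11001, "getaddrinfo failed"))
print(explain_url_error(simulated))
```

In a real crawler, the same helper could be called inside the except error.URLError block in place of printing e.reason directly.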

HTTP Error

# -*- coding: utf-8 -*-
"""
@ auth : carl_DJ
@ time : 2020-8-20
"""

from urllib import request
from urllib import error

# Define the URL to request (HTTPError is only raised when a server answers)
url = 'https://www.baidu1.com'
# Build the request object
req = request.Request(url)

# Wrap the call in try/except to catch the error
try:
    response = request.urlopen(req)
    # html = response.read()

except error.HTTPError as e:
    print("HTTP Error is :", e.code)

Similarly, look at the running results:

HTTP Error is : 403

# A code attribute indicates an HTTPError
# A reason attribute indicates a URLError
# 403 ⇒ the server refused the request

If you are unsure what an HTTP status code means, see Xiaoyu's article "HTTP Status Code".
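For a quick lookup, Python's standard library can also translate a status code into its standard phrase through http.HTTPStatus. The sketch below is only an illustration; describe_status is a hypothetical helper name, not something from the original code.

```python
from http import HTTPStatus


def describe_status(code):
    """Return 'code phrase' for an HTTP status code (hypothetical helper)."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return "%d (unknown status code)" % code
    return "%d %s" % (status.value, status.phrase)


print(describe_status(403))  # 403 Forbidden
print(describe_status(404))  # 404 Not Found
```

Inside an except error.HTTPError as e block, describe_status(e.code) could be printed instead of the bare number.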

Mixed use of HTTP Error and URL Error

Next, let's see what happens when both are used together.
As usual, here is the code:

# -*- coding: utf-8 -*-
# @ auth : carl_DJ
# @ time : 2020-8-20

"""
When HTTPError and URLError are used together, HTTPError must be
handled before URLError, because HTTPError is a subclass of URLError.

The hasattr function can be used to check which attributes the caught URLError has:
a code attribute indicates an HTTPError
a reason attribute indicates a URLError

"""

from urllib import error
from urllib import request

# Define a URL address that does not exist
url = "http://www.douyu.com/Jack_Cui.html"

req = request.Request(url)

print("----URLError error message-------")
try:
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)
except error.URLError as e:
    print("URLError:%s" % e.reason)
    print("\n")


print("----HTTPError error message-------")
try:
    response = request.urlopen(req)
except error.HTTPError as a:
    print("HTTPError:%s" % a.code)
    print("\n")

print("----URLError and HTTPError used together-------")
try:
    response = request.urlopen(url)
except error.URLError as s:
    # Check for code first: an HTTPError has both code and reason
    if hasattr(s, 'code'):
        print("HTTPError")
        print(s.code)
    elif hasattr(s, 'reason'):
        print("URLError")
        print(s.reason)

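The docstring's advice about ordering can also be followed with two separate except clauses, which avoids the hasattr checks. The sketch below is an illustration under stated assumptions: fetch, its opener parameter, and the two fake openers are hypothetical names, introduced so that both error paths can be exercised without network access.

```python
import io
from email.message import Message
from urllib import error, request


def fetch(url, opener=request.urlopen):
    """Return (kind, detail) for a request; opener is injectable for testing."""
    try:
        with opener(url) as response:
            return ("ok", response.read())
    except error.HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return ("HTTPError", e.code)
    except error.URLError as e:
        return ("URLError", e.reason)


def fake_http_error(url):
    # Stand-in opener that simulates a server answering 404.
    raise error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO(b""))


def fake_url_error(url):
    # Stand-in opener that simulates a failed address lookup.
    raise error.URLError("getaddrinfo failed")


print(issubclass(error.HTTPError, error.URLError))        # True
print(fetch("http://example.invalid/", fake_http_error))  # ('HTTPError', 404)
print(fetch("http://example.invalid/", fake_url_error))   # ('URLError', 'getaddrinfo failed')
```

If the two except clauses were swapped, the URLError clause would catch every HTTPError as well, and the HTTPError branch would never run.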

Origin blog.csdn.net/wuyoudeyuer/article/details/108127295