Two kinds of errors often appear in crawlers
URL Error
A URLError is very common: it is raised when the URL is incorrect or invalid, for example when the host name cannot be resolved.
Let’s take a look at how the code is implemented.
# -*- coding: utf-8 -*-
"""
@ auth : carl_DJ
@ time : 2020-8-20
"""
from urllib import request
from urllib import error
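# Define a URL address that does not exist
url = 'https://www.baidu1.com'
# Build the request object
req = request.Request(url)
# Wrap the request in exception handling
try:
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)
except error.URLError as e:
    # reason describes why the request failed
    print(e.reason)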
The result after execution:
[Errno 11001] getaddrinfo failed
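This result means that the address lookup failed: getaddrinfo could not resolve the host name to an IP address, so no connection to a server was ever made.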
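As a minimal sketch of what `e.reason` holds in this case, the helper below checks whether the reason is a `socket.gaierror`, the exception type Python raises when `getaddrinfo` fails. The name `describe_url_error` is hypothetical and not part of `urllib`, and the error is built by hand with assumed values so the example runs without a network connection. In a crawler the exception would come from the `except error.URLError as e` branch instead.

```python
import socket
from urllib import error


def describe_url_error(exc):
    """Return a short description of a URLError (hypothetical helper)."""
    reason = exc.reason
    # When name resolution fails, reason is usually a socket.gaierror instance
    if isinstance(reason, socket.gaierror):
        return "DNS lookup failed: %s" % reason
    # reason may also be a plain string or another kind of OSError
    return "Request failed: %s" % reason


# Build the error by hand so the sketch runs offline (assumed values)
simulated = error.URLError(socket.gaierror(11001, "getaddrinfo failed"))
print(describe_url_error(simulated))
```

Separating DNS failures from other causes can help a crawler decide whether a URL is worth retrying.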
HTTP Error
# -*- coding: utf-8 -*-
"""
@ auth : carl_DJ
@ time : 2020-8-20
"""
from urllib import request
from urllib import error
# Define the URL to request
url = 'https://www.baidu1.com'
# Build the request object
req = request.Request(url)
# Wrap the request in exception handling
try:
    response = request.urlopen(req)
    # html = response.read()
except error.HTTPError as e:
    # code holds the HTTP status code returned by the server
    print("HTTP Error is :", e.code)
Similarly, look at the running results:
HTTP Error is : 403
# The code attribute exists only on HTTPError and holds the HTTP status code
# The reason attribute is defined on URLError (HTTPError exposes it as well)
# 403 ⇒ the request was rejected by the server
If you are not familiar with HTTP status codes, you can read Xiaoyu's "HTTP Status Code".
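To make the two attributes concrete, here is a minimal sketch that inspects an `HTTPError` and turns its numeric code into the standard phrase using `http.HTTPStatus` from the standard library. The helper name `explain_status`, the `example.com` URL, and the simulated 403 error are assumptions for illustration; the error is constructed by hand so the example runs offline.

```python
import io
from http import HTTPStatus
from urllib import error


def explain_status(code):
    """Map a numeric status code to its standard phrase (hypothetical helper)."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Servers may return codes that are not in the standard table
        return "Unknown status"


# Simulated 403 response; the URL and message are assumed values
simulated = error.HTTPError("https://example.com/", 403, "Forbidden", None, io.BytesIO())

print(simulated.code)                  # numeric status code
print(simulated.reason)                # message text supplied with the error
print(explain_status(simulated.code))  # standard phrase for the code
```

Because `HTTPError` also exposes `reason`, checking for `code` is the more reliable way to tell the two error types apart.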
Mixed use of HTTP Error and URL Error
Next, let's look at what happens when both are used together.
As usual, here is the code:
# -*- coding: utf-8 -*-
# @ auth : carl_DJ
# @ time : 2020-8-20
"""
If HTTPError and URLError are used together, HTTPError must be placed before URLError,
because HTTPError is a subclass of URLError.
The hasattr function can be used to check which attributes the URLError carries:
the code attribute indicates an HTTPError
the reason attribute indicates a URLError
"""
from urllib import error
from urllib import request
# Define a URL address that does not exist
url = "http://www.douyu.com/Jack_Cui.html"
req = request.Request(url)
print("----URLError error message-------")
try:
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)
except error.URLError as e:
    print("URLError:%s" % e.reason)
print("\n")
print("----HTTPError error message-------")
try:
    response = request.urlopen(req)
except error.HTTPError as a:
    print("HTTPError:%s" % a.code)
print("\n")
print("----URLError and HTTPError used together-------")
try:
    response = request.urlopen(url)
except error.URLError as s:
    # An HTTPError carries a code attribute; a plain URLError only has reason
    if hasattr(s, 'code'):
        print("HTTPError")
        print(s.code)
    elif hasattr(s, 'reason'):
        print("URLError")
        print(s.reason)
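The ordering rule stated in the docstring can also be sketched with a single `try` block that has two `except` clauses. This is only an illustration under stated assumptions: the helper name `classify`, the `example.com` URL, and the hand-built errors are hypothetical and chosen so the sketch runs without a network connection.

```python
import io
from urllib import error


def classify(exc):
    """Return a label for exc, checking the subclass first (hypothetical helper)."""
    try:
        raise exc
    except error.HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError
        return "HTTPError: %s" % e.code
    except error.URLError as e:
        # Reached only for errors that are not HTTPError
        return "URLError: %s" % e.reason


# Errors built by hand so the sketch runs offline (assumed values)
http_err = error.HTTPError("http://example.com/missing", 404, "Not Found", None, io.BytesIO())
url_err = error.URLError("getaddrinfo failed")

print(classify(http_err))
print(classify(url_err))
```

If the two `except` clauses were swapped, the `URLError` branch would catch both errors and the `HTTPError` branch would never run. With that in mind, back to the three-part script above.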
Take a look at the results: