Hello everyone. This section covers URLError and HTTPError, and how to handle them.
1.URLError
First, the possible causes of a URLError:
 · There is no network connection, that is, the machine cannot access the Internet
 · A specific server cannot be reached
 · The server does not exist
In the code, we wrap the call in a try-except statement and catch the corresponding exception. Here is an example to get a first feel for it:
Python
import urllib2

# Build the request, then catch URLError if the URL cannot be opened
request = urllib2.Request('http://www.zhimaruanjian.com')
try:
    urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.reason
We used the urlopen method to access a non-existent URL, and the result is as follows:
Python
[Errno 11004] getaddrinfo failed
The output shows that the error number is 11004 and the cause is "getaddrinfo failed", that is, the hostname could not be resolved.
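For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same try-except pattern might look like the sketch below. The helper name fetch_reason and the stub failing_opener are made up for illustration. The stub only simulates the failed hostname lookup shown above, so the example runs without a network.

```python
import socket
import urllib.error
import urllib.request


def fetch_reason(url, opener=urllib.request.urlopen):
    """Return None on success, or the URLError reason as a string.

    Hypothetical helper for illustration, not part of urllib.
    """
    try:
        opener(url)
    except urllib.error.URLError as e:
        return str(e.reason)
    return None


def failing_opener(url):
    """Stand-in for urlopen that simulates a failed hostname lookup."""
    raise urllib.error.URLError(socket.gaierror(11004, "getaddrinfo failed"))


print(fetch_reason("http://www.zhimaruanjian.com", opener=failing_opener))
```

Leaving out the opener argument makes fetch_reason call the real urlopen, which is what you would do in a crawler.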
2.HTTPError
HTTPError is a subclass of URLError. When you make a request with the urlopen method, the server returns a response, which contains a numeric "status code". For example, if the response is a "redirect", the document has to be fetched from another address, and urllib2 handles this for you.
For responses it cannot handle, urlopen raises an HTTPError that carries the corresponding status. The HTTP status code indicates the status of the response returned over the HTTP protocol. The status codes are summarized as follows:
100: Continue. The client should continue sending the remainder of the request, or ignore this response if the request has already completed.
101: Switching Protocols. After sending the blank line that ends this response, the server switches to the protocols defined in the Upgrade header. The switch should only be made when the new protocol is more beneficial.
102: Processing. A status code added by the WebDAV extension (RFC 2518), indicating that processing is still in progress.
200: The request is successful. Processing method: get the content of the response and process it
201: The request completed, resulting in the creation of a new resource. The URI of the newly created resource is available in the entity of the response. Processing method: not encountered in a crawler
202: The request is accepted, but the processing has not yet been completed. Processing method: block and wait
204: The request has been fulfilled by the server, but no new information has been returned. If the client is a user agent, it does not need to update its own document view for this. Processing method: discard
300: There are multiple resources available for the request. This status code is not used directly by HTTP/1.0 applications, only as the default interpretation for 3XX responses. Processing method: if the program can handle it, process it further; if not, discard
301: The requested resource has been assigned a permanent URL, so that the resource can be accessed through that URL in the future. Processing method: redirect to the assigned URL
302: The requested resource is temporarily located at a different URL. Processing method: redirect to the temporary URL
304: The requested resource has not been updated. Processing method: discard
400: Illegal request. Processing method: discard
401: Unauthorized. Processing method: discard
403: Forbidden. Processing method: discard
404: Not found. Processing method: discard
500: Internal Server Error. The server encountered an unexpected condition that prevented it from completing the request. Generally, this happens when there is an error in the server-side source code.
501: Not Implemented. The server does not support a function required by the current request, for example when it does not recognize the request method and cannot support it for any resource.
502: Bad Gateway. A server working as a gateway or proxy received an invalid response from an upstream server while attempting to perform the request.
503: Service Unavailable. The server is currently unable to process the request due to temporary maintenance or overload. The condition is temporary, and service resumes after a period of time.
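The handling notes in the list can be turned into a small lookup table. The sketch below uses Python 3's standard http.HTTPStatus enum to get the official phrase for a code. The HANDLING table and the describe_status helper are hypothetical names for illustration, and codes for which the list states no handling are reported as "unspecified" rather than guessed.

```python
from http import HTTPStatus

# Handling taken from the list above. Codes without a stated
# handling (for example the 5xx codes) are deliberately left out.
HANDLING = {
    200: "process",
    202: "wait",
    204: "discard",
    301: "redirect",
    302: "redirect",
    304: "discard",
    400: "discard",
    401: "discard",
    403: "discard",
    404: "discard",
}


def describe_status(code):
    """Return 'code phrase -> handling' for a known HTTP status code.

    Hypothetical helper. HTTPStatus(code) raises ValueError for
    codes that are not in the enum.
    """
    phrase = HTTPStatus(code).phrase
    action = HANDLING.get(code, "unspecified")
    return "%d %s -> %s" % (code, phrase, action)


print(describe_status(200))
print(describe_status(403))
print(describe_status(503))
```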
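Once an HTTPError instance is raised, it has a code attribute, which is the error number sent by the server.
Because urllib2 handles redirects for you, that is, codes starting with 3 are taken care of, and codes in the 100-299 range indicate success, you will only see error numbers in the 400-599 range.
Let's write an example to get a feel for it. The exception we catch is HTTPError, which carries a code attribute, the error number. We also print the reason attribute, which belongs to its parent class URLError.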
Python
import urllib2

req = urllib2.Request('http://www.zhimaruanjian.com')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.code
    print e.reason
The result is as follows:
Python
403
Forbidden
The error number is 403 and the reason is Forbidden, which means the server refuses access.
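To look at the code and reason attributes without depending on a server that happens to answer 403, you can build an HTTPError by hand. This is a minimal sketch for Python 3, where the class lives in urllib.error. The empty headers and empty body passed to the constructor are placeholders, not what a real server would send.

```python
import email.message
import io
import urllib.error

# Build the exception by hand instead of contacting a server.
err = urllib.error.HTTPError(
    "http://www.zhimaruanjian.com",  # url
    403,                             # code
    "Forbidden",                     # msg, exposed as .reason
    email.message.Message(),         # placeholder: empty headers
    io.BytesIO(),                    # placeholder: empty response body
)

print(err.code)
print(err.reason)
# HTTPError is a subclass of URLError
print(isinstance(err, urllib.error.URLError))
```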
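We know that the parent class of HTTPError is URLError. As a matter of programming practice, the except clause for the parent class should be written after the one for the subclass, so that anything the subclass clause does not catch can still be caught by the parent clause. The code above can therefore be rewritten like this: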
Python
import urllib2

req = urllib2.Request('http://www.zhimaruanjian.com')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:
    # Subclass first: HTTP status errors
    print e.code
except urllib2.URLError, e:
    # Parent class second: everything else, such as network failures
    print e.reason
else:
    print "OK"
If an HTTPError is caught, the code is printed and the URLError clause is not run. If the exception is not an HTTPError, the URLError clause catches it and prints the reason.
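The branch selection can be checked without a network by feeding the same try-except structure with stub openers. The sketch below is for Python 3 (urllib.error). The names handle, raise_http and raise_url are hypothetical, and the stubs just raise the exception that a real urlopen call might raise.

```python
import email.message
import io
import urllib.error


def handle(opener, url):
    """Return a label for the branch that ran. Hypothetical helper."""
    try:
        opener(url)
    except urllib.error.HTTPError as e:   # subclass first
        return "HTTPError %d" % e.code
    except urllib.error.URLError as e:    # parent class second
        return "URLError %s" % e.reason
    else:
        return "OK"


def raise_http(url):
    """Stub opener: the server answered with 403."""
    raise urllib.error.HTTPError(url, 403, "Forbidden",
                                 email.message.Message(), io.BytesIO())


def raise_url(url):
    """Stub opener: the request never reached a server."""
    raise urllib.error.URLError("getaddrinfo failed")


url = "http://www.zhimaruanjian.com"
print(handle(raise_http, url))
print(handle(raise_url, url))
print(handle(lambda u: None, url))
```

If the two except clauses were swapped, the URLError clause would also catch every HTTPError, and the HTTPError clause would never run.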
Alternatively, you can use hasattr to check for the attributes first. The code is rewritten as follows:
Python
import urllib2

req = urllib2.Request('http://www.zhimaruanjian.com')
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    # Check that each attribute exists before printing it
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"
The attributes of the exception are checked first, to avoid an error from printing an attribute that does not exist.
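The hasattr check can be tried on hand-built exceptions to see which attributes each class actually carries. This is a small sketch for Python 3 (urllib.error). The helper describe is a made-up name, and the exceptions are constructed directly rather than coming from a real request.

```python
import email.message
import io
import urllib.error


def describe(exc):
    """List the attributes that are present. Hypothetical helper."""
    parts = []
    if hasattr(exc, "code"):
        parts.append("code=%s" % exc.code)
    if hasattr(exc, "reason"):
        parts.append("reason=%s" % exc.reason)
    return ", ".join(parts)


# A plain URLError has a reason but no code
plain = urllib.error.URLError("getaddrinfo failed")
# An HTTPError has both
http = urllib.error.HTTPError("http://www.zhimaruanjian.com", 403,
                              "Forbidden", email.message.Message(),
                              io.BytesIO())

print(describe(plain))
print(describe(http))
```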
That covers URLError and HTTPError and the corresponding ways to handle the errors. Keep it up, everyone!