Basics of Python crawlers: URLError exception handling

1. URLError

First, the possible causes of a URLError:

  • No network connection, i.e. the machine cannot access the Internet
  • Unable to connect to the specific server
  • The server does not exist

In code, we need to wrap the call in a try-except statement to catch the corresponding exception.


request = urllib2.Request('http://www.xxxxx.com')
try:
    urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.reason

# Output
[Errno 11004] getaddrinfo failed

2. HTTPError

  • HTTPError is a subclass of URLError. When you issue a request with the urlopen method, the server returns a response object, which carries a numeric "status code". For example, if the response is a redirect, the document must be fetched from another address, and urllib2 handles this for you.
  • For cases urlopen cannot handle itself, it raises an HTTPError carrying the corresponding status. An HTTP status code indicates the status of the response returned by the HTTP protocol.
  • An HTTPError instance has a code attribute, which is the error number sent by the server.
  • Because urllib2 handles redirects for you (codes starting with 3), and codes in the range 100-299 indicate success, you will only see error codes in the range 400-599.

Let's write an example to get a feel for it. The caught exception is HTTPError, which has a code attribute holding the error code. In addition, we print the reason attribute, which is inherited from its parent class URLError.

We know that the parent class of HTTPError is URLError. By convention, the handler for a parent-class exception should be written after the handler for the subclass: if the subclass handler does not match, the parent-class handler still can. So the code above can be rewritten like this.

If an HTTPError is caught, its code is printed and the URLError branch is skipped. If the exception is not an HTTPError, the URLError branch catches it and prints the reason for the error.

In addition, hasattr can be used to check for the attribute before using it. The code becomes:

import urllib2

req = urllib2.Request('http://blog.csdn.net/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    # check that the attribute exists before using it
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"
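For reference, urllib2 was split in Python 3 into urllib.request and urllib.error. A minimal Python 3 sketch of the same pattern (the fetch_status helper name is mine, not from the original post):

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Open url and describe the outcome, checking HTTPError before URLError."""
    req = urllib.request.Request(url)
    try:
        urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        # subclass first: has a numeric status code
        return "HTTP error %d" % e.code
    except urllib.error.URLError as e:
        # parent class second: only has a reason
        return "URL error: %s" % e.reason
    else:
        return "OK"
```

For example, a URL with an unknown scheme raises a plain URLError rather than an HTTPError, so `fetch_status('htp://example.com')` returns a "URL error: ..." string.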

Reprinted from Cui Qingcai's blog https://cuiqingcai.com/961.html
