Repost: [Python3 Web Crawler Development in Practice] 3.1.2 Handling Exceptions

In the previous section we saw how to send a request. But what happens when the network is poor or something else goes wrong? If these exceptions are not handled, the program is likely to terminate with an error, so exception handling is essential.

The error module of urllib defines the exceptions raised by the request module. If a problem occurs, the request module raises an exception defined in the error module.

1. URLError

The URLError class comes from the error module of the urllib library. It inherits from OSError and is the base class of the exceptions in the error module, so any exception raised by the request module can be handled by catching this class.

It has one attribute, reason, which returns the cause of the error.
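As a quick offline check of these two points, the minimal sketch below builds a URLError by hand instead of making a request. The reason string is a made-up value used only for illustration.

```python
from urllib import error

# URLError derives from OSError, so an `except OSError` clause would also catch it.
print(issubclass(error.URLError, OSError))  # True

# Build an instance by hand (no network needed) to inspect `reason`.
exc = error.URLError("connection refused")  # made-up reason, for illustration only
print(exc.reason)  # connection refused
print(str(exc))    # <urlopen error connection refused>
```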

Let's look at an example:


  
  
    from urllib import request, error

    try:
        # This page does not exist, so the request fails.
        response = request.urlopen('http://cuiqingcai.com/index.htm')
    except error.URLError as e:
        print(e.reason)

We open a page that does not exist, which would normally cause an error, but here we catch the URLError exception. The result is as follows:

    Not Found
  
  

The program does not stop with a traceback; it prints the message above instead. Handling the exception this way keeps the program from terminating abnormally and deals with the error properly.

2. HTTPError

HTTPError is a subclass of URLError dedicated to HTTP request errors, such as a failed authentication request. It has the following three attributes.

code: the HTTP status code, for example 404 when the page does not exist or 500 for an internal server error.

reason: as in the parent class, the cause of the error.

headers: the response headers sent back by the server.
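The three attributes can also be inspected without touching the network. The following is a minimal sketch that constructs an HTTPError by hand; the URL and the header value are hypothetical and chosen only for illustration.

```python
from email.message import Message
from urllib import error

# Hypothetical response headers, built by hand for illustration only.
hdrs = Message()
hdrs["Content-Type"] = "text/html; charset=UTF-8"

# HTTPError(url, code, msg, hdrs, fp); fp is the response body stream, None here.
exc = error.HTTPError("http://example.com/missing", 404, "Not Found", hdrs, None)

print(exc.code)                         # 404
print(exc.reason)                       # Not Found
print(exc.headers["Content-Type"])      # text/html; charset=UTF-8
print(isinstance(exc, error.URLError))  # True
```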

Let's look at an example:


  
  
    from urllib import request, error

    try:
        response = request.urlopen('http://cuiqingcai.com/index.htm')
    except error.HTTPError as e:
        # Print the cause, the status code, and the response headers, one per line.
        print(e.reason, e.code, e.headers, sep='\n')

The results are as follows:


  
  
    Not Found
    404
    Server: nginx/1.4.6 (Ubuntu)
    Date: Wed, 03 Aug 2016 08:54:22 GMT
    Content-Type: text/html; charset=UTF-8
    Transfer-Encoding: chunked
    Connection: close
    X-Powered-By: PHP/5.5.9-1ubuntu4.14
    Vary: Cookie
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Link: <http://cuiqingcai.com/wp-json/>; rel="https://api.w.org/"

The URL is the same as before, but this time we catch the HTTPError exception and print its reason, code, and headers attributes.

Because URLError is the parent class of HTTPError, we can catch the subclass first and then fall back to the parent class. A better way to write the code above is as follows:


  
  
    from urllib import request, error

    try:
        response = request.urlopen('http://cuiqingcai.com/index.htm')
    except error.HTTPError as e:
        # Subclass first: the server replied with an error status.
        print(e.reason, e.code, e.headers, sep='\n')
    except error.URLError as e:
        # Parent class second: any other request failure.
        print(e.reason)
    else:
        print('Request Successfully')

This way we first try to catch HTTPError and obtain its status code, reason, headers, and other information. If the exception is not an HTTPError, the URLError clause catches it and prints the cause of the error. Finally, the else clause holds the logic for the normal case. This is a sound way to write exception handling.
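The same ordering can be wrapped in a small reusable function. The sketch below is one possible shape, not a definitive implementation: fetch_status, the opener parameter, and the two fake openers are hypothetical names introduced here so the branches can be exercised offline.

```python
from urllib import error, request


def fetch_status(url, opener=request.urlopen, timeout=10):
    """Return a (kind, detail) tuple describing how the request ended.

    `fetch_status` and `opener` are hypothetical names for this sketch;
    `opener` exists only so the function can be tried without a network.
    """
    try:
        response = opener(url, timeout=timeout)
    except error.HTTPError as e:
        # Subclass first: the server answered with an error status.
        return ("http_error", e.code)
    except error.URLError as e:
        # Parent class second: the request failed for another reason.
        return ("url_error", e.reason)
    else:
        # Reached only when no exception was raised.
        with response:
            return ("ok", response.status)


# Hypothetical stand-ins for urlopen that always fail in a known way.
def fake_404(url, timeout=None):
    raise error.HTTPError(url, 404, "Not Found", None, None)


def fake_unreachable(url, timeout=None):
    raise error.URLError("name resolution failed")


print(fetch_status("http://example.com/a", opener=fake_404))
# ('http_error', 404)
print(fetch_status("http://example.com/a", opener=fake_unreachable))
# ('url_error', 'name resolution failed')
```

If the two except clauses were swapped, the URLError clause would also match HTTPError and the status code branch would never run.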

Sometimes the reason attribute is not a string; it can also be an object. Look at the following example:


  
  
    import socket
    import urllib.request
    import urllib.error

    try:
        # A 0.01 s timeout is short enough to make the request time out.
        response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
    except urllib.error.URLError as e:
        print(type(e.reason))
        if isinstance(e.reason, socket.timeout):
            print('TIME OUT')

Here we set a very short timeout to force a timeout exception.

The results are as follows:


  
  
    <class 'socket.timeout'>
    TIME OUT

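As we can see, the reason attribute here is an instance of the socket.timeout class, so we can use the isinstance() function to check its type and make a more fine-grained decision about the exception. On Python 3.10 and later, socket.timeout is an alias of the built-in TimeoutError, so the first line prints <class 'TimeoutError'> instead.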
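This type check can be extended to sort failures into a few categories. The following is a minimal sketch, assuming only three categories are of interest; describe_reason is a hypothetical helper, and the exceptions are constructed by hand so that no network access is needed.

```python
import socket
from urllib import error


def describe_reason(exc):
    """Return a short label for a URLError (hypothetical helper)."""
    if isinstance(exc.reason, socket.timeout):
        return "timeout"
    if isinstance(exc.reason, OSError):
        # Other low-level failures, e.g. a refused connection.
        return "os error: " + type(exc.reason).__name__
    # Otherwise `reason` is typically a plain string.
    return "other: " + str(exc.reason)


print(describe_reason(error.URLError(socket.timeout("timed out"))))
# timeout
print(describe_reason(error.URLError(ConnectionRefusedError(111, "Connection refused"))))
# os error: ConnectionRefusedError
print(describe_reason(error.URLError("unknown url type")))
# other: unknown url type
```

The timeout check comes first because socket.timeout is itself a subclass of OSError.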

In this section we covered the error module. Catching its exceptions appropriately lets us identify failures more precisely and makes the program more robust.

Source: Huawei Cloud Community  Author: Cui Qingcai | Jingmi

This article transferred from the old ape: https://blog.csdn.net/devcloud/article/details/94552903

Origin blog.csdn.net/LaoYuanPython/article/details/95305084