Exception handling in crawlers (018)

One: Common status codes and their meanings

301 Moved Permanently: The resource has moved permanently to a new URL

302 Found: The resource is temporarily available at a different URL; the redirect is not permanent

304 Not Modified: The requested resource has not been updated

400 Bad Request: The request is malformed

401 Unauthorized: The request requires authentication

403 Forbidden: The server refuses to fulfil the request

404 Not Found: The corresponding page was not found

500 Internal Server Error: An error occurred inside the server

501 Not Implemented: The server does not support the functionality required to implement the request
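
The reason phrases in this list can also be looked up from Python's standard library. The following is a minimal sketch, assuming Python 3 and its `http.HTTPStatus` enum; the helper name `describe_status` is made up for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # The code is not one the standard library knows about
        return "Unknown status code"


# Print the phrase for each status code listed in this section
for code in (301, 302, 304, 400, 401, 403, 404, 500, 501):
    print(code, describe_status(code))
```

A crawler could use a lookup like this to log a readable message next to the numeric code.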

Two: URLError and HTTPError in exception handling

URLError and HTTPError are both exception classes in urllib.error, and HTTPError is a subclass of URLError. An HTTPError carries both a status code (code) and a reason (reason), while a plain URLError carries only a reason. A handler written for URLError therefore also catches HTTPError, but it cannot read the status code unconditionally: first check that the exception actually has a code attribute.
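
The relationship between the two classes can be checked without sending any request. This is a sketch under assumptions: the exceptions are built by hand, the URL is a placeholder, and `describe_error` is a hypothetical helper name.

```python
import io
import urllib.error
from email.message import Message


def describe_error(e):
    """Return (status code or None, reason) for a URLError or HTTPError."""
    # Only HTTPError has a code attribute; both classes have a reason
    return getattr(e, "code", None), e.reason


# Exceptions constructed by hand for illustration; nothing is fetched
plain_error = urllib.error.URLError("no route to host")
http_error = urllib.error.HTTPError(
    "http://example.invalid/", 404, "Not Found", Message(), io.BytesIO()
)

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(describe_error(plain_error))
print(describe_error(http_error))
```

Using `getattr` with a default is equivalent to the `hasattr` check described in this section, just written in one step.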

Three: Hands-on practice

A URLError occurs when: 1: The server cannot be reached. 2: The remote URL does not exist. 3: There is no local network connection. 4: An HTTPError, its subclass, is raised.



In this case the network disconnection error is caught and printed, so the crawler does not crash. If the exception were not caught, the output would appear in red as a traceback and the crawler would crash.
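
The effect of catching the exception can be sketched as follows. This is an illustration under assumptions: `safe_fetch` and its `opener` parameter are hypothetical names, and `fake_opener` simulates a disconnected network so that the example runs offline.

```python
import urllib.error
import urllib.request


def safe_fetch(url, opener=urllib.request.urlopen):
    """Return the response body, or None if the request fails."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as e:
        # The error is reported and the crawler keeps running
        print("failed:", url, e.reason)
        return None


def fake_opener(url):
    # Stand-in for urlopen that simulates a disconnected network
    raise urllib.error.URLError("network is unreachable")


# Both requests fail, yet the loop completes instead of crashing
results = [
    safe_fetch(u, opener=fake_opener)
    for u in ("http://example.invalid/a", "http://example.invalid/b")
]
print(results)
```

With the default `opener`, the same function would send real requests through `urllib.request.urlopen`.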

Four: The code for the above is:

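import urllib.error
import urllib.request

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.URLError as e:
    # Only HTTPError (a subclass of URLError) carries a status code
    if hasattr(e, "code"):
        print(e.code)
    # Both URLError and HTTPError carry a reason
    if hasattr(e, "reason"):
        print(e.reason)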
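
An alternative to the `hasattr` checks is to catch the two classes in separate clauses, with HTTPError first because it is the subclass. The sketch below is not the code from this article: `classify`, `raise_http` and `raise_url` are hypothetical names, and the two fake openers raise hand-built exceptions so that the example runs without a network.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def classify(url, opener=urllib.request.urlopen):
    """Return a (kind, detail) pair describing the outcome of a request."""
    try:
        with opener(url) as response:
            return "ok", response.status
    except urllib.error.HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError
        return "http error", e.code
    except urllib.error.URLError as e:
        return "url error", e.reason


def raise_http(url):
    # Simulates a server that answers with 403 Forbidden
    raise urllib.error.HTTPError(url, 403, "Forbidden", Message(), io.BytesIO())


def raise_url(url):
    # Simulates a missing network connection
    raise urllib.error.URLError("no network")


print(classify("http://example.invalid/", opener=raise_http))
print(classify("http://example.invalid/", opener=raise_url))
```

If the clauses were written in the opposite order, the URLError clause would catch every HTTPError and the second clause would never run.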

