One: Common status codes and their meanings
301 Moved Permanently: The resource has moved permanently to a new URL
302 Found: The resource is temporarily available at a different URL
304 Not Modified: The requested resource has not changed since the client's cached copy
400 Bad Request: The request is malformed and the server cannot process it
401 Unauthorized: The request lacks valid authentication credentials
403 Forbidden: The server understood the request but refuses to authorize it
404 Not Found: The requested page was not found
500 Internal Server Error: An error occurred inside the server
501 Not Implemented: The server does not support the functionality required to fulfill the request
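The codes listed above do not have to be hard-coded: Python's standard library exposes them through http.HTTPStatus. The following is a minimal sketch of looking up the standard phrase for a code; the helper name describe_status is made up for this illustration and is not part of any library.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return 'code phrase' for a known status code, or flag it as unknown."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return f"{code} (unknown status code)"
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in (301, 302, 304, 400, 401, 403, 404, 500, 501):
        print(describe_status(code))
```

A crawler could use such a helper to log a readable message next to the numeric code it receives.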
Two: URLError and HTTPError in exception handling
Both are exception classes in urllib.error, and HTTPError is a subclass of URLError. HTTPError carries both a status code (code) and a reason (reason), while a plain URLError carries only a reason and no status code. An except clause for URLError therefore also catches HTTPError, but it cannot assume that the code attribute exists. If you use a single URLError handler for both cases, check whether the exception has a code attribute before reading it.
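The relationship between the two classes can be checked without touching the network. The sketch below builds both exceptions by hand (the URL, messages, and the helper name describe are invented for illustration) and shows that only HTTPError has a code attribute.

```python
import io
import urllib.error
from email.message import Message


def describe(exc: urllib.error.URLError) -> str:
    # HTTPError carries a status code; a plain URLError only has a reason.
    if hasattr(exc, "code"):
        return f"HTTP error {exc.code}: {exc.reason}"
    return f"URL error: {exc.reason}"


# Exceptions constructed manually so the sketch runs offline.
plain = urllib.error.URLError("no route to host")
http_err = urllib.error.HTTPError(
    "http://example.com/missing", 404, "Not Found", Message(), io.BytesIO(b"")
)

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(describe(plain))
print(describe(http_err))
```

Because HTTPError is a subclass, a single handler written against URLError receives both kinds of exception, which is why the attribute check is needed.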
Three: Hands-on practice
A URLError is raised when: 1: The server cannot be reached. 2: The remote URL does not exist. 3: There is no local network connection. 4: Its subclass HTTPError is raised.
When the exception is caught, the network disconnection error is printed and the crawler does not crash. If an uncaught traceback appears instead (shown in red), the crawler has crashed.
Four: The code for the above is:
import urllib.error
import urllib.request

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.URLError as e:
    # HTTPError (a subclass of URLError) has a status code; a plain URLError does not
    if hasattr(e, "code"):
        print(e.code)
    if hasattr(e, "reason"):
        print(e.reason)
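As an alternative to the hasattr checks, the two exceptions can be caught in separate except clauses, with HTTPError listed first because it is the subclass. The following is only a sketch under stated assumptions: the function fetch, its opener parameter, and the two fake openers are hypothetical names introduced here so that the failure cases can be simulated without a real network.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch(url, opener=urllib.request.urlopen):
    """Return (body, error_message); exactly one of the two is None."""
    try:
        with opener(url) as response:
            return response.read(), None
    except urllib.error.HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return None, f"HTTP {e.code}: {e.reason}"
    except urllib.error.URLError as e:
        return None, f"connection failed: {e.reason}"


def fake_404(url):
    # Simulates a server that answers with 404.
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO(b""))


def fake_offline(url):
    # Simulates a missing network connection.
    raise urllib.error.URLError("network unreachable")


print(fetch("http://example.invalid/page", opener=fake_404))
print(fetch("http://example.invalid/page", opener=fake_offline))
```

In both simulated cases the function returns an error message instead of raising, so a crawler built on it keeps running when a request fails.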