Python Crawler with requests (5): Common Exception Handling

Garbled text on the webpage

Garbled text appears when the response is decoded with the wrong character set: requests was never told how the page is encoded, so it fell back to a default. Tell it to use the encoding detected from the response body:

response.encoding = response.apparent_encoding
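Why this fixes the mojibake can be shown offline. The sketch below (the sample string is made up for illustration) encodes text as GBK, the charset many Chinese sites use, then decodes it with the wrong and the right charset:

```python
# A page body encoded as GBK, the charset many Chinese novel sites use.
raw = "笔趣阁小说".encode("gbk")

# Decoding with the wrong charset produces mojibake / replacement characters.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the detected charset recovers the original text.
correct = raw.decode("gbk")

print(garbled)
print(correct)
```

`response.apparent_encoding` does the detection step for you by sniffing the raw bytes, so assigning it to `response.encoding` makes `response.text` decode correctly.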


Request header parameters

InvalidHeader: Invalid return character or leading space in header: User-Agent

import requests

headers = {
    # Note the extra space before 'Mozilla' -- it triggers InvalidHeader
    'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36'
}
response = requests.get('http://www.shuquge.com/txt/8659/index.html',
                        headers=headers)
response.encoding = response.apparent_encoding
html = response.text
print(html)

This one is hard to spot by eye: there is an extra space before 'Mozilla' in the User-Agent value. Delete the space and the request works.
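You can reproduce this validation without sending anything over the network, because requests checks headers while preparing the request. This is a sketch against requests' internal preparation step, so the exact behavior is version-dependent:

```python
from requests.exceptions import InvalidHeader
from requests.models import PreparedRequest

req = PreparedRequest()
rejected = False
try:
    # The leading space in the value makes requests reject the header.
    req.prepare_headers({'User-Agent': ' Mozilla/5.0'})
except InvalidHeader as e:
    rejected = True
    print('header rejected:', e)
```

Checking headers this way, before a request goes out, makes the stray space obvious instead of leaving you to squint at a traceback.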

No data & parameter error

import requests

headers = {
  'Host': 'www.guazi.com',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36',
}
response = requests.get('https://www.guazi.com/cs/20e17311773b1706x.htm',
                        headers=headers)
response.encoding = response.apparent_encoding
print(response.text)

When the returned data is not what you expected, some request parameter is almost certainly wrong. Check whether a parameter is missing, misspelled, or given the wrong value.
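One way to audit the parameters you are actually sending, before staring at the response, is to build the request without sending it and inspect the prepared URL. The host and query values below are made up for illustration:

```python
import requests

# Build the request but do not send it yet.
req = requests.Request(
    'GET',
    'https://www.example.com/search',
    params={'city': 'cs', 'page': 1},
    headers={'User-Agent': 'Mozilla/5.0'},
)
prepared = req.prepare()

# The final URL shows exactly which parameters will go over the wire.
print(prepared.url)
print(prepared.headers)
```

Comparing `prepared.url` against a working request captured in the browser's developer tools usually reveals the missing or mistyped parameter quickly.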

The target machine actively refuses the connection

import requests

proxy_response = requests.get('http://134.175.188.27:5010/get')
proxy = proxy_response.json()
print(proxy)

error

requests.exceptions.ConnectionError: 
HTTPConnectionPool(host='134.175.188.27', port=5010): 
Max retries exceeded with url: /get (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000023AB83AC828>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
  • The crawler was detected and blocked
  • The URL was entered incorrectly
  • The server has stopped providing the service
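To handle a refused connection gracefully instead of crashing, catch `requests.exceptions.ConnectionError`. In this sketch, 127.0.0.1 port 1 is used only because nothing normally listens there, so the connection is refused immediately:

```python
import requests

refused = False
try:
    # Almost nothing listens on port 1, so this connection is refused.
    requests.get('http://127.0.0.1:1/get', timeout=3)
except requests.exceptions.ConnectionError as e:
    refused = True
    print('connection failed:', e)
```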

Connection timeout

import requests

proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
proxy = proxy_response.json()
print(proxy)

error

requests.exceptions.ConnectTimeout: 
HTTPConnectionPool(host='134.175.188.27', port=5010): 
Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000002045EF9B8D0>, 'Connection to 134.175.188.27 timed out. (connect timeout=0.0001)'))

Exception handling

import requests

try:
  proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
  proxy = proxy_response.json()
  print(proxy)
except requests.exceptions.RequestException as e:
  print('request failed:', e)
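Rather than silencing every error, a common pattern is a small retry helper that catches only request-related exceptions. This is a sketch, not part of requests itself; `get_json` and the retry count are hypothetical choices:

```python
import requests


def get_json(url, retries=3, timeout=5):
    """Fetch a URL and return its JSON body, retrying on request errors."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            return response.json()
        except requests.exceptions.RequestException as e:
            last_error = e
            print(f'attempt {attempt} failed: {e}')
    # Re-raise the last error so the caller still sees a persistent failure.
    raise last_error
```

Catching `RequestException` covers `ConnectionError`, `Timeout`, and HTTP errors from `raise_for_status()`, while genuine programming bugs still propagate normally.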


Origin blog.csdn.net/m0_48405781/article/details/115249380