Python crawler: skipping errors with exception handling

Recently I needed to crawl some images, but I kept running into errors like these:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.xxxxxx.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0334D290>: Failed to establish a new connection: [WinError 10060]
// or this one:
urllib.error.HTTPError: HTTP Error 403: Forbidden
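
These two messages come from different libraries. The first is raised by requests, the second by the standard library's urllib.request. WinError 10060 is Windows' connection-timed-out error, meaning no response arrived at all, whereas HTTP Error 403 means the server did answer but refused the request. As a minimal sketch, assuming a crawler built on urllib, the standard-library check below tells these kinds of failure apart. The function name is_retryable and the set of status codes treated as retryable are illustrative assumptions, not part of the original method. Note that requests' own ConnectionError is not a subclass of the built-in ConnectionError, so a requests-based crawler would check requests.exceptions.RequestException instead.

```python
import socket
import urllib.error

# Assumption for illustration: statuses treated as temporary. 403 is included
# only because some sites return it briefly while throttling crawlers.
RETRYABLE_STATUS = {403, 429, 500, 502, 503, 504}


def is_retryable(exc: BaseException) -> bool:
    """Hypothetical policy: decide whether a urllib error is worth retrying."""
    # HTTPError must be checked first, because it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, but with an error status such as 403 Forbidden.
        return exc.code in RETRYABLE_STATUS
    if isinstance(exc, (urllib.error.URLError, ConnectionError,
                        TimeoutError, socket.timeout)):
        # No usable answer at all: DNS failure, refused or timed-out connection.
        return True
    # Anything else (e.g. a bug in our own parsing code) should not be retried.
    return False


forbidden = urllib.error.HTTPError('https://www.xxxx.com', 403,
                                   'Forbidden', None, None)
print(is_retryable(forbidden))                           # True
print(is_retryable(urllib.error.URLError('timed out')))  # True
print(is_retryable(ValueError('bad JSON')))              # False
```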

Approach

In practice, an unstable network connection, or the website forcibly closing the connection because it treats the crawler as an attack, can make the crawler fail partway through. Below is a method I implemented myself that uses exception handling so the program keeps running after an error:

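import requests
import time
import json
from tqdm import tqdm

url = 'https://www.xxxx.com'  # the URL to crawl
items = []  # the list of things to crawl (fill in your own)

# tqdm wraps the loop in a simple progress bar; if you don't want one,
# just write: for item in items:
for item in tqdm(items):
    From_data = {'text': item}  # form data passed to requests
    while True:  # keep requesting until one succeeds, then break
        try:
            # If this request fails it raises an exception, so `break` is
            # skipped and the code in the except block runs instead.
            response = requests.post(url, data=From_data, timeout=10)
            response.raise_for_status()  # also treat 4xx/5xx (e.g. 403) as a failure
            break  # reaching this line means the request succeeded, so leave the loop
        except requests.exceptions.RequestException:
            time.sleep(1)  # pause for 1 second, then go back to `while True` and retry
            print('---------sleep----------')
    content = json.loads(response.text)  # process the data once out of the loop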

In short: inside while True, a failed request raises an exception, so execution never reaches break and the loop sends the request again, repeating until it succeeds. Because the exception is caught, the error no longer stops the program. And because the script only sleeps for 1 second after a failure instead of before every request, it should in theory run faster than sleeping unconditionally.
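
The control flow can be checked without any network access. In this minimal sketch, the hypothetical function flaky_request stands in for requests.post: it raises the built-in ConnectionError on its first two calls and succeeds on the third. The sleep is shortened to 0.1 seconds so the demo runs quickly.

```python
import time

attempts = 0  # how many times the fake request has been called


def flaky_request() -> str:
    """Hypothetical stand-in for requests.post: fails twice, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError('simulated network failure')
    return 'ok'


while True:
    try:
        result = flaky_request()  # on failure this raises, so `break` is skipped
        break                     # only reached after a successful call
    except ConnectionError:
        time.sleep(0.1)           # short pause, then back to the top of the loop
        print('---------sleep----------')

print(result, attempts)  # prints "ok 3" after two "sleep" lines
```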

You can make the same kind of change to your own crawler. The general structure is below; just replace the request and the data processing with your own:

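for item in items:
    while True:
        try:
            response = requests.post(url, data=From_data, timeout=10)  # your request goes here
            break  # only reached if the request succeeded
        except requests.exceptions.RequestException:
            time.sleep(1)  # wait, then retry
            print('---------sleep----------')
    content = json.loads(response.text)  # process the returned data here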
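
One limitation of while True is that if a site blocks the crawler permanently, the loop never ends. A possible refinement, sketched below under stated assumptions, wraps the same try/except/sleep pattern in a reusable function that gives up after a fixed number of attempts and doubles the pause after each failure (exponential backoff). The name retry_call and its parameters are hypothetical and do not come from any library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry_call(func: Callable[[], T],
               attempts: int = 5,
               delay: float = 1.0,
               backoff: float = 2.0,
               exceptions: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call func() until it succeeds or `attempts` tries are used up.

    The pause starts at `delay` seconds and is multiplied by `backoff` after
    every failure. After the last failed try the exception is re-raised.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of tries: let the caller see the real error
            print(f'attempt {attempt} failed ({exc!r}), sleeping {wait:.1f}s')
            time.sleep(wait)
            wait *= backoff
    raise RuntimeError('not reached: the loop either returns or re-raises')


calls = 0


def sometimes_fails() -> str:
    """Hypothetical request that fails on its first call only."""
    global calls
    calls += 1
    if calls == 1:
        raise ConnectionError('simulated timeout')
    return 'data'


print(retry_call(sometimes_fails, delay=0.1))  # one failure line, then "data"
```

In the crawler above, the request could then become response = retry_call(lambda: requests.post(url, data=From_data), exceptions=(requests.exceptions.RequestException,)), with the data processing left unchanged.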

Origin blog.csdn.net/qq_45551930/article/details/119512630