from selenium import webdriver
from selenium.common.exceptions import TimeoutException
import os
import re


class GetPage:
    def __init__(self, url_path):
        self.url_path = url_path
        self.driver = webdriver.Chrome()
        self.urls = {}
        self.url_flag = False
        # Give up on page loads and scripts after 1 second.
        self.driver.set_page_load_timeout(1)
        self.driver.set_script_timeout(1)

    def get_url(self):
        if os.path.exists(self.url_path):
            with open(self.url_path, 'r') as f:
                url = f.read()
            # The file holds a comma-separated list of URLs.
            self.urls = re.split(',', url)
            print(self.urls)
            if len(self.urls):
                self.url_flag = True
        else:
            print(self.url_path + " does not exist")

    def close(self):
        self.driver.quit()

    def get_page(self):
        self.get_url()
        if self.url_flag:
            for url in self.urls:
                try:
                    self.driver.get(url)
                except TimeoutException:
                    print(url + " timeout")
                    # Approach 1: restart the browser so the next get() works.
                    self.driver.quit()
                    self.driver = webdriver.Chrome()
        self.close()


if __name__ == "__main__":
    get_url_list = GetPage("E:\\1.txt")
    get_url_list.get_page()

Original post: https://blog.csdn.net/weixin_31315135/article/details/91039752
In Selenium, when crawling many URLs in one run, a get() that exceeds the page-load timeout raises an exception. After catching it you still need to get() the remaining URLs, but calling get() again directly on the same stuck session raises another exception. There are two workarounds: restart the browser, or keep a second tab open in the browser and switch to it when a timeout occurs (note: a spare blank tab loads almost instantly).
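The code below only demonstrates the first workaround (restarting Chrome). Here is a minimal sketch of the second, tab-switching approach; the helper name safe_get and the tab bookkeeping are my own illustration, not from the original article, and the TimeoutException fallback is only there so the sketch can be exercised without Selenium installed:

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:  # fallback so the sketch runs without selenium installed
    class TimeoutException(Exception):
        pass


def safe_get(driver, url, main_tab=None, spare_tab=None):
    """Try to load ``url``.

    On timeout, instead of restarting Chrome, switch to an already-loaded
    spare tab and back, which leaves the session usable for the next get().
    Returns True on success, False on timeout.
    """
    try:
        driver.get(url)
        return True
    except TimeoutException:
        print(url + " timeout")
        if main_tab is not None and spare_tab is not None:
            driver.switch_to.window(spare_tab)  # escape the stuck load
            driver.switch_to.window(main_tab)   # resume on the main tab
        return False
```

With a real driver you would first open the spare tab once, e.g. `driver.execute_script("window.open('about:blank');")`, then read both handles from `driver.window_handles` and pass them to every safe_get call.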
python3 + selenium: stop loading on timeout, catch the exception, and move on to the next URL [personally verified]
Reposted from www.cnblogs.com/stvadv/p/11653406.html