A first thought is to drive Selenium from the process_request method of a downloader middleware, as in the following code.
middleware.py
from selenium import webdriver
from scrapy.http import HtmlResponse


class TestMiddleware(object):
    def __init__(self):
        # One browser instance shared by every request
        self.driver = webdriver.Chrome()
        super().__init__()

    def process_request(self, request, spider):
        # Render the requested page in the real browser, then hand the
        # resulting HTML back to Scrapy instead of downloading it again
        self.driver.get(request.url)
        return HtmlResponse(url=self.driver.current_url,
                            body=self.driver.page_source,
                            encoding='utf-8')
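For either middleware in this post to take effect, it has to be registered in the project's settings. A minimal sketch, assuming the project package is called myproject (both the module path and the priority 543 are placeholders to adapt):

settings.py
# Register the downloader middleware (module path is a placeholder)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middleware.TestMiddleware': 543,
}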
But there is a problem with this approach: the browser that Selenium opens never gets closed. The middleware's __init__ creates the driver, but nothing ever calls driver.quit() when crawling finishes.
Second, consider putting the driver in the spider.
The benefits are the following:
1. Not every spider needs Selenium to download pages, so only the spiders that actually need a browser have to create one.
2. With multiple spiders running, opening Selenium in the middleware amounts to opening multiple browser processes, even for spiders that do not need one.
It looks like this. At present, the official recommendation is to bind the signal to the crawler through the from_crawler class method; the example below uses the older dispatcher style, and a from_crawler sketch follows it.
spider.py
import scrapy
from scrapy import signals
from pydispatch import dispatcher  # older Scrapy: from scrapy.xlib.pydispatch import dispatcher
from selenium import webdriver


class YunqiSpider(scrapy.Spider):
    name = 'yunqi'

    def __init__(self):
        self.driver = webdriver.Chrome()
        super().__init__()
        # Quit the browser when this spider is closed
        dispatcher.connect(self.close_spider, signal=signals.spider_closed)

    def close_spider(self, spider):
        self.driver.quit()
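For reference, the from_crawler style mentioned above would look roughly like this: a sketch of the same spider using crawler.signals.connect, the signal API Scrapy's docs recommend, instead of the global dispatcher.

import scrapy
from scrapy import signals
from selenium import webdriver


class YunqiSpider(scrapy.Spider):
    name = 'yunqi'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome()

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # crawler.signals is Scrapy's SignalManager; this replaces dispatcher.connect
        crawler.signals.connect(spider.close_spider, signal=signals.spider_closed)
        return spider

    def close_spider(self, spider):
        # quit() shuts down both the browser window and the driver process
        self.driver.quit()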
middleware.py
from scrapy.http import HtmlResponse


class TestMiddleware(object):
    def process_request(self, request, spider):
        # Use the spider's own browser to render the page
        spider.driver.get(request.url)
        return HtmlResponse(url=spider.driver.current_url,
                            body=spider.driver.page_source,
                            encoding='utf-8')
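Benefit 1 above suggests the middleware should step aside for spiders that do not carry a driver. A minimal sketch of that idea, assuming Selenium-backed spiders expose a driver attribute as in spider.py above:

from scrapy.http import HtmlResponse


class TestMiddleware(object):
    def process_request(self, request, spider):
        driver = getattr(spider, 'driver', None)
        if driver is None:
            # No driver on this spider: returning None lets Scrapy
            # download the request through its normal downloader
            return None
        driver.get(request.url)
        return HtmlResponse(url=driver.current_url,
                            body=driver.page_source,
                            encoding='utf-8')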