Crawl Tencent Comics

Summary

  The page loads asynchronously, and the URL of each image appears only once the page has been scrolled to it, so Selenium is recommended. Selenium also has to execute JavaScript to produce the scrolling effect, using the window.scrollTo() method.
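
  The scrolling step can be sketched roughly as follows. This is a minimal sketch, not the original author's code: the function names, the step size, the pause, and the placeholder URL are all assumptions, and it assumes the whole window scrolls rather than an inner container.

```python
import time


def scroll_through_page(driver, step=800, pause=0.5, max_scrolls=200):
    """Scroll down in increments so lazily loaded images get their URLs.

    `driver` is a Selenium WebDriver; only execute_script() is used.
    step, pause and max_scrolls are illustrative defaults (assumptions).
    Returns the number of window.scrollTo() calls that were made.
    """
    position = 0
    scrolls = 0
    while scrolls < max_scrolls:
        # Re-read the height each time: it may grow as content loads.
        height = driver.execute_script("return document.body.scrollHeight")
        if position >= height:
            break
        position = min(position + step, height)
        driver.execute_script("window.scrollTo(0, arguments[0]);", position)
        scrolls += 1
        time.sleep(pause)  # give the page time to load the new images
    return scrolls


def fetch_rendered_html(url):
    """Open `url` in Chrome, scroll through it, and return the final HTML."""
    # Imported here so the scrolling helper can be used without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_through_page(driver)
        return driver.page_source
    finally:
        driver.quit()
```

  Calling fetch_rendered_html() with the URL of a chapter page would return the HTML after scrolling, which can then be parsed for image URLs. Scrolling in small steps rather than jumping straight to the bottom is a design choice made here on the assumption that images load only when they come into view.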

  In the Scrapy framework, not every request needs to go through Selenium. Only the home page of a specific comic has this requirement: fetch the data with Selenium and return it as the response. Put this logic in a downloader middleware and add a condition so that only those requests are handled by Selenium.
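
  A downloader middleware along these lines might look like the sketch below. It is one possible arrangement, not the referenced article's implementation: the class name, the "use_selenium" meta key, and the one-second pause are assumptions made for illustration.

```python
import time


class SeleniumMiddleware:
    """Downloader middleware that renders only flagged requests in a browser."""

    def __init__(self):
        self._driver = None  # created lazily, on the first flagged request

    @classmethod
    def from_crawler(cls, crawler):
        from scrapy import signals

        middleware = cls()
        # Close the browser when the spider finishes.
        crawler.signals.connect(
            middleware.spider_closed, signal=signals.spider_closed
        )
        return middleware

    def process_request(self, request, spider):
        # Conditional judgment: "use_selenium" is a hypothetical meta key
        # that the spider sets on requests needing a real browser.
        if not request.meta.get("use_selenium"):
            return None  # let Scrapy download this request as usual

        # Imported lazily so unflagged requests need neither package loaded.
        from scrapy.http import HtmlResponse
        from selenium import webdriver

        if self._driver is None:
            self._driver = webdriver.Chrome()
        self._driver.get(request.url)
        self._driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);"
        )
        time.sleep(1)  # assumed pause so lazily loaded images can appear
        # Returning a response here short-circuits Scrapy's own download.
        return HtmlResponse(
            url=self._driver.current_url,
            body=self._driver.page_source,
            encoding="utf-8",
            request=request,
        )

    def spider_closed(self, spider):
        if self._driver is not None:
            self._driver.quit()
```

  To use it, the class would be registered under DOWNLOADER_MIDDLEWARES in settings.py, and the spider would pass meta={"use_selenium": True} on the requests for the comic's page. All other requests skip the browser and keep Scrapy's normal speed.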

  For details, see: https://jiayi.space/post/scrapy-phantomjs-seleniumdong-tai-pa-chong
