Preface
The Scrapy framework implements asynchronous crawling, which makes crawlers considerably more efficient.
I. What is the Scrapy framework?
Scrapy is an asynchronous crawling framework for Python. It was created to crawl many URLs efficiently: instead of waiting for each response before sending the next request, it keeps multiple requests in flight at the same time.
II. The principle of the Scrapy asynchronous framework
1. The concepts of synchronous and asynchronous execution
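The difference between the two styles can be illustrated with a small sketch. It uses Python's standard asyncio module purely to show the concept; Scrapy itself is built on Twisted, so this is not how Scrapy is implemented. The function names (fetch, crawl_sequentially, crawl_concurrently, timed), the example.com URLs, and the 0.1 second delay that stands in for network latency are all assumptions made for illustration.

```python
import asyncio
import time


async def fetch(url: str, delay: float = 0.1) -> str:
    """Pretend to download `url`; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def crawl_sequentially(urls):
    # Synchronous style: each request waits for the previous one to finish.
    return [await fetch(url) for url in urls]


async def crawl_concurrently(urls):
    # Asynchronous style: all requests are in flight at the same time.
    return await asyncio.gather(*(fetch(url) for url in urls))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    _, sequential_time = timed(crawl_sequentially(urls))
    _, concurrent_time = timed(crawl_concurrently(urls))
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With five simulated pages, the sequential version takes roughly the sum of all delays, while the concurrent version takes roughly the time of a single delay. This is the kind of gain an asynchronous crawler aims for when most of the time is spent waiting on the network.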
2. Principle of the Scrapy asynchronous framework
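ScrapyEngine: the Scrapy engine, which coordinates the data flow between all the other components.
Spiders: the crawler files in a Scrapy project. They define the initial requests and parse the responses into items and follow-up requests.
Scheduler: the scheduler, which receives requests from the spiders, de-duplicates them, keeps them in a URL queue, and hands them to the downloader in a unified manner.
Downloader: the downloader, which receives requests from the scheduler, fetches the pages, and returns response objects to the spiders.
DownloaderMiddlewares: downloader middleware, hooks that sit between the engine and the downloader and can modify requests and responses (for example, to set headers or a proxy).
ItemPipeline: the item pipeline, which handles I/O persistence of the scraped items, such as writing them to a file or a database.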
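The way these components pass data to each other can be sketched as a toy model in plain Python. This is not Scrapy's code or API: the FAKE_SITE data and the names Scheduler, downloader, spider_parse, ItemPipeline, and engine are hypothetical stand-ins, the loop is sequential for readability, and downloader middleware is left out. It only shows the roles: the scheduler queues and de-duplicates, the downloader returns responses, the spider parses them, and the pipeline stores items.

```python
from collections import deque

# A fake "web": URL -> (page title, links found on that page). Hypothetical data.
FAKE_SITE = {
    "/": ("Home", ["/a", "/b"]),
    "/a": ("Page A", ["/b", "/"]),
    "/b": ("Page B", ["/a"]),
}


class Scheduler:
    """Queues requests and drops URLs that were already seen."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()

    def enqueue(self, url):
        if url in self.seen:
            return False  # duplicate request, ignored
        self.seen.add(url)
        self.queue.append(url)
        return True

    def next_request(self):
        return self.queue.popleft() if self.queue else None


def downloader(url):
    """Stand-in for the downloader: returns a 'response' dict for a URL."""
    title, links = FAKE_SITE[url]
    return {"url": url, "title": title, "links": links}


def spider_parse(response):
    """Stand-in for a spider callback: yields one item, then follow-up URLs."""
    yield {"url": response["url"], "title": response["title"]}
    for link in response["links"]:
        yield link


class ItemPipeline:
    """Stand-in for the item pipeline: 'persists' items to an in-memory list."""

    def __init__(self):
        self.stored = []

    def process_item(self, item):
        self.stored.append(item)


def engine(start_url):
    """Stand-in for the engine: moves data between the other components."""
    scheduler, pipeline = Scheduler(), ItemPipeline()
    scheduler.enqueue(start_url)
    while (url := scheduler.next_request()) is not None:
        response = downloader(url)
        for result in spider_parse(response):
            if isinstance(result, dict):
                pipeline.process_item(result)  # items go to the pipeline
            else:
                scheduler.enqueue(result)  # new requests go back to the scheduler
    return pipeline.stored


if __name__ == "__main__":
    for item in engine("/"):
        print(item)
```

Although the three fake pages link to each other in a cycle, each one is downloaded only once, because the scheduler refuses URLs it has already seen.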
Summary
This article only briefly introduces the principles of the Scrapy framework; follow-up articles will cover hands-on projects in much more detail.