Core Principles of Scrapy Asynchronous Framework


Preface

The Scrapy framework implements asynchronous crawlers, which greatly improves crawling efficiency.

1. What is the Scrapy framework?

Scrapy is an asynchronous crawler framework. It was created to crawl many URL addresses concurrently, rather than fetching them one at a time, which makes large-scale crawling much faster.

2. The principle of the Scrapy asynchronous framework

1. The concept of synchronization and asynchrony

[Figure: comparison of synchronous and asynchronous request handling]
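The contrast the figure draws can be sketched in plain Python. Here each "request" is simulated by a 0.2-second wait (an illustrative assumption): synchronous code waits for each request in turn, while asynchronous code overlaps the waits.

```python
# Sketch contrasting synchronous vs. asynchronous execution. Each simulated
# request takes 0.2 s; synchronously, n requests take ~n * 0.2 s, while
# asynchronously they overlap and finish in roughly 0.2 s total.
import asyncio
import time


def fetch_sync(n: int) -> float:
    """Simulate n blocking requests, one after another."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(0.2)  # stand-in for a blocking network call
    return time.perf_counter() - start


async def fetch_async(n: int) -> float:
    """Simulate n non-blocking requests running concurrently."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.sleep(0.2) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sync:  {fetch_sync(5):.2f} s")               # ~1.0 s
    print(f"async: {asyncio.run(fetch_async(5)):.2f} s")  # ~0.2 s
```

Scrapy itself achieves the same overlap with the Twisted event loop rather than `asyncio`, but the principle is identical: never sit idle waiting for one response while others could be downloading.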

2. Principle of Scrapy asynchronous framework

[Figure: the Scrapy architecture diagram]

Scrapy Engine: the engine, which coordinates the data flow between all of the other components.
Spiders: the crawler files created by the user; they parse responses and yield items and new requests.
Scheduler: the scheduler, which receives requests from the spiders (via the engine), de-duplicates them, maintains the URL queue, and dispatches them to the downloader.
Downloader: the downloader, which receives requests from the scheduler, fetches the pages, and returns the response objects to the spiders.
DownloaderMiddlewares: downloader middleware, hooks between the engine and the downloader that can process requests and responses as they pass through.
ItemPipeline: the item pipeline, which performs the IO persistence of the scraped items.
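As an example of the last component, here is a sketch of an item pipeline that persists items to a JSON-lines file. The class and file names are illustrative assumptions; a real pipeline would be enabled through the `ITEM_PIPELINES` setting in `settings.py`.

```python
# Sketch of an item pipeline performing IO persistence: every item yielded
# by a spider passes through process_item, which appends it to a file.
# Class name and output file name are illustrative.
import json


class JsonLinesPipeline:
    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open("items.jl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item: persist it, then pass it on so that any
        # later pipelines can also process it.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        self.file.close()
```

The three method names (`open_spider`, `process_item`, `close_spider`) are the hooks Scrapy calls on a pipeline during a crawl.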

Summary

This article has only briefly introduced the principles of the Scrapy framework; follow-up posts will cover real project work in detail.

Source: blog.csdn.net/weixin_42961082/article/details/109854410