How the Scrapy crawler framework works

The official documentation describes the components as follows (translated):

Scrapy Engine: the core of the framework, responsible for the communication, signals, and data transfer between the Spider, Item Pipeline, Downloader, and Scheduler.
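The engine's coordinating role can be sketched as a simple crawl loop. This is a conceptual, pure-Python sketch of the data flow, not Scrapy's actual implementation (which is asynchronous and built on Twisted); the function names `crawl`, `download`, and `parse` are illustrative stand-ins for the real components.

```python
from collections import deque

def crawl(start_urls, download, parse):
    """Conceptual engine loop: Scheduler -> Downloader -> Spider -> Scheduler.

    `download(url)` stands in for the Downloader, and `parse(response)` for
    the Spider; `parse` yields either item dicts or follow-up URL strings.
    """
    scheduler = deque(start_urls)   # the Scheduler's request queue
    seen = set(start_urls)          # duplicate-request filtering
    items = []
    while scheduler:
        url = scheduler.popleft()        # engine asks the Scheduler for a request
        response = download(url)         # Downloader fetches it
        for result in parse(response):   # Spider processes the Response
            if isinstance(result, dict):
                items.append(result)     # items would go to the Item Pipeline
            elif result not in seen:
                seen.add(result)         # follow-up URLs re-enter the Scheduler
                scheduler.append(result)
    return items
```

The loop ends when the Scheduler is empty, which is also how a real crawl finishes: no pending requests means nothing left to download.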

Scheduler: accepts the Requests sent by the engine, orders them according to some strategy, places them in a queue, and returns the next Request to the engine when the engine asks for it.
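The Scheduler's contract ("accept, order, hand back on demand") can be sketched with a priority queue and a duplicate filter. The method names `enqueue_request` and `next_request` mirror Scrapy's scheduler interface, but this is an illustrative sketch working on plain URL strings, not Scrapy's implementation.

```python
import heapq
from itertools import count

class Scheduler:
    """Sketch of the Scheduler: accepts requests, orders them by priority,
    drops duplicates, and returns the next one when asked."""

    def __init__(self):
        self._heap = []
        self._tiebreak = count()  # preserves FIFO order among equal priorities
        self._seen = set()        # duplicate-request filter

    def enqueue_request(self, url, priority=0):
        if url in self._seen:
            return False          # duplicate: silently dropped
        self._seen.add(url)
        # heapq is a min-heap, so negate priority to pop highest first
        heapq.heappush(self._heap, (-priority, next(self._tiebreak), url))
        return True

    def next_request(self):
        # called by the engine when it is ready for the next request
        return heapq.heappop(self._heap)[2] if self._heap else None
```

Higher-priority requests come back first; requests already seen are never queued twice.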

Downloader: downloads all the Requests sent by the Scrapy Engine and returns the resulting Responses to the engine, which hands them to the Spider for processing.

Spider: processes all Responses, parsing them and extracting the data needed to fill the Item fields, and submits any URLs that need to be followed up back to the engine, which passes them to the Scheduler again.
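The Spider's job, as described above, is a generator over a Response: yield extracted items, yield follow-up URLs. The sketch below mirrors that shape; in a real project the class would subclass `scrapy.Spider` and use selectors such as `response.css()`, but here the class, the dict-based response, and the field names are all illustrative so the example stays dependency-free.

```python
class QuotesSpider:
    """Sketch of the Spider contract: parse() receives a response and
    yields item dicts and follow-up URLs back to the engine."""

    name = "quotes"  # every Scrapy spider has a unique name

    def parse(self, response):
        # extract the data the Item fields require
        for quote in response["quotes"]:
            yield {"text": quote}
        # submit a follow-up URL, which re-enters the Scheduler
        if response.get("next_page"):
            yield response["next_page"]
```

The engine consumes this generator, routing dicts to the Item Pipeline and URLs to the Scheduler.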

Item Pipeline: processes the Items extracted by the Spider and performs post-processing (detailed analysis, filtering, storage, etc.).
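A pipeline is a plain class exposing `process_item(item, spider)`, which either returns the (possibly modified) item or drops it. In Scrapy, dropping is done by raising `scrapy.exceptions.DropItem`; a stand-in exception is defined below so the sketch runs without Scrapy. The `price` field is an illustrative example.

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem, raised to discard an item."""

class PricePipeline:
    """Sketch of an Item Pipeline: normalize items, filter out bad ones."""

    def process_item(self, item, spider):
        if item.get("price") is None:
            # filtering: items without a price are discarded
            raise DropItem("missing price")
        # post-processing: normalize the field before storage
        item["price"] = float(item["price"])
        return item
```

Pipelines are chained: each one receives the item returned by the previous one, so several small pipelines (validate, deduplicate, store) can be composed.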

Downloader Middlewares: components that let you customize and extend the download functionality.

Spider Middlewares: components that let you customize and extend the communication between the engine and the Spider, i.e. the Responses going into the Spider and the Requests and Items coming out of it.
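Because it sits between the Spider and the engine, a spider middleware can filter or modify whatever the Spider yields. The hook name `process_spider_output` matches Scrapy's spider-middleware interface; the class itself, the string-vs-dict convention for URLs and items, and the domain check are illustrative.

```python
class OffsiteFilterMiddleware:
    """Sketch of a spider middleware that drops follow-up URLs pointing
    outside an allowed domain, while letting items (dicts) pass through."""

    def __init__(self, allowed_domain):
        self.allowed_domain = allowed_domain

    def process_spider_output(self, response, result, spider):
        # `result` is everything the Spider yielded for this response
        for r in result:
            if isinstance(r, str) and self.allowed_domain not in r:
                continue  # off-site URL: do not forward it to the engine
            yield r
```

Scrapy ships a real middleware with this purpose (the offsite filter); the sketch only shows where in the pipeline such filtering lives.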


Origin: blog.csdn.net/weixin_46310452/article/details/126035686