What does Scrapy do? What is its architecture?

  Python offers many frameworks, and Scrapy is one of the best known. It is a fast, high-level screen-scraping and web-crawling framework for Python, used to crawl websites and extract structured data from web pages. It has a wide range of uses, so do you know what Scrapy does and how it is architected?

  Scrapy is an application framework for crawling website data and extracting structured data, and it can be applied in a wide range of fields. Scrapy is often used in programs for data mining, information processing, and archiving historical data. With the Scrapy framework it is usually very simple to implement a crawler that fetches the content or images of a specified website.

  What is the architecture of Scrapy?

  Scrapy Engine: Responsible for the communication and the transfer of signals and data among the Spider, Item Pipeline, Downloader, and Scheduler;

  Scheduler: Responsible for receiving the Requests sent by the engine, sorting and enqueuing them in a certain order, and returning them to the engine when needed;

  Downloader: Responsible for downloading all Requests sent by the Scrapy Engine and returning the Responses it obtains to the Scrapy Engine, which hands them to the Spider for processing;

  Spider: Responsible for processing Responses, analyzing and extracting data from them to populate Item fields, and submitting URLs that need to be followed back to the engine, where they enter the Scheduler again;

  Item Pipeline: Responsible for post-processing the Items obtained from the Spider, such as validation, deduplication, and storage;

  Downloader Middlewares: Components for customizing and extending the download functionality;

  Spider Middlewares: Components for customizing and extending the communication between the engine and the Spider.
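The request/response cycle among these components can be sketched with plain Python. This is not Scrapy's actual implementation; every class and function name below is invented for illustration, and the downloader and parser are toy stand-ins:

```python
from collections import deque


class Scheduler:
    """Holds pending requests and returns them to the engine in order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, request):
        self._queue.append(request)

    def next_request(self):
        return self._queue.popleft() if self._queue else None


def downloader(request):
    """Toy downloader: pretend to fetch a URL and return a response."""
    return {"url": request, "body": f"<html>page at {request}</html>"}


def spider_parse(response):
    """Toy spider: extract an item, plus follow-up URLs for page1 only."""
    item = {"page": response["url"], "length": len(response["body"])}
    follow = ["http://example.com/page2"] if response["url"].endswith("/page1") else []
    return item, follow


def pipeline(item, store):
    """Toy item pipeline: post-process the item (here, just store it)."""
    store.append(item)


def engine(start_urls):
    """Drive the loop: Scheduler -> Downloader -> Spider -> Item Pipeline."""
    scheduler, items = Scheduler(), []
    for url in start_urls:
        scheduler.enqueue(url)
    while (request := scheduler.next_request()) is not None:
        response = downloader(request)              # Downloader fetches
        item, follow_urls = spider_parse(response)  # Spider parses the Response
        pipeline(item, items)                       # Item Pipeline post-processes
        for url in follow_urls:                     # Follow-up URLs re-enter the Scheduler
            scheduler.enqueue(url)
    return items
```

Calling `engine(["http://example.com/page1"])` crawls the start URL, discovers one follow-up URL, and yields two items, mirroring how the real engine shuttles Requests and Responses between the components above.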

Origin blog.51cto.com/15052541/2643378