Scrapy workflow: module functions and data transfer
2019-05-30 02:08:06
Scrapy workflow and the role of each module
The process can be described as follows:
- The spider builds request objects from the start URLs -> spider middleware -> engine -> scheduler
- The scheduler hands a request -> engine -> downloader middleware -> downloader
- The downloader sends the request and obtains a response -> downloader middleware -> engine -> spider middleware -> spider
- The spider extracts URLs and assembles them into request objects -> spider middleware -> engine -> scheduler, repeating step 2
- The spider extracts data -> engine -> pipeline, which processes and stores the data
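The flow above can be sketched as a toy loop in plain Python. Every class and function name here is illustrative only, not Scrapy's actual API; the point is to show how requests, responses, and items circulate between the modules:

```python
from collections import deque

# Toy stand-ins for Scrapy's Request/Response objects (illustrative only).
class Request:
    def __init__(self, url):
        self.url = url

class Response:
    def __init__(self, url, body):
        self.url = url
        self.body = body

def downloader(request):
    # Pretend to fetch the page; a real downloader performs HTTP I/O.
    return Response(request.url, body=f"<html>{request.url}</html>")

def spider_parse(response):
    # Extract one item and, for the start URL only, one follow-up request.
    item = {"url": response.url, "length": len(response.body)}
    new_requests = [Request(response.url + "/page2")] if response.url.endswith(".com") else []
    return item, new_requests

def pipeline(item, store):
    # Process and store the extracted data.
    store.append(item)

def run(start_url):
    scheduler = deque([Request(start_url)])  # step 1: start URL -> scheduler
    store = []
    while scheduler:
        request = scheduler.popleft()                 # step 2: scheduler -> engine
        response = downloader(request)                # step 3: downloader fetches a response
        item, new_requests = spider_parse(response)   # spider extracts data and URLs
        scheduler.extend(new_requests)                # step 4: new requests back to scheduler
        pipeline(item, store)                         # step 5: items -> pipeline
    return store

items = run("http://example.com")
```

In real Scrapy the engine mediates every arrow and the middleware layers sit between the modules, but the queue-driven loop is the same shape.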
Note:
- Spider middleware and downloader middleware differ only in where they sit in the flow; their logic is similar, e.g. both can replace the UA (user agent)
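For example, UA replacement is usually done in a downloader middleware. The `process_request(self, request, spider)` hook name below matches Scrapy's downloader middleware API, but the `FakeRequest` stub and the user-agent strings are stand-ins so the sketch runs outside a Scrapy project:

```python
import random

# Candidate user-agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

class RandomUserAgentMiddleware:
    def process_request(self, request, spider):
        # Swap in a random User-Agent before the request reaches the downloader.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # returning None lets Scrapy continue processing the request

class FakeRequest:
    """Minimal stand-in for scrapy.Request, for demonstration only."""
    def __init__(self):
        self.headers = {}

request = FakeRequest()
RandomUserAgentMiddleware().process_request(request, spider=None)
```

A spider middleware could apply the same logic; only its position in the flow differs, which is the point of the note above.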
Summary
- Scrapy concept: Scrapy is an application framework written for crawling websites and extracting structured data
- Scrapy framework data transfer process and operating procedure:
  - The spider builds request objects from the start URLs -> spider middleware -> engine -> scheduler
  - The scheduler hands a request -> engine -> downloader middleware -> downloader
  - The downloader sends the request and obtains a response -> downloader middleware -> engine -> spider middleware -> spider
  - The spider extracts URLs and assembles them into request objects -> spider middleware -> engine -> scheduler, repeating step 2
  - The spider extracts data -> engine -> pipeline, which processes and stores the data
- The value of the Scrapy framework: fast crawling with a small amount of code
- The role of each Scrapy module:
  - Engine (engine): responsible for passing data and signals between the different modules
  - Scheduler (scheduler): implements a request queue; stores the request objects sent over by the engine
  - Downloader (downloader): sends the requests handed over by the engine, obtains the responses, and returns them to the engine
  - Spider (spider): processes the responses sent over by the engine, extracts data, extracts URLs, and hands them to the engine
  - Pipeline (pipeline): processes the data handed over by the engine, e.g. storage
  - Downloader middleware (downloader middleware): customizable download extensions, e.g. setting a proxy IP
  - Spider middleware (spider middleware): can customize requests and filter responses; its role overlaps with that of the downloader middleware
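As a final illustration of the pipeline module, here is a minimal item pipeline sketch. The `open_spider`, `process_item`, and `close_spider` hook names are the ones Scrapy calls during a crawl, but here we drive them by hand instead of running a crawler, and the class itself is a hypothetical example:

```python
import json

class JsonLinesPipeline:
    """Sketch of a pipeline that collects items as JSON lines."""

    def open_spider(self, spider):
        # Called once when the spider starts; set up storage.
        self.lines = []

    def process_item(self, item, spider):
        # Called for every item the engine hands over.
        self.lines.append(json.dumps(item))
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider closes; a real pipeline
        # would flush self.lines to a file or database here.
        pass

pipeline = JsonLinesPipeline()
pipeline.open_spider(None)
pipeline.process_item({"title": "demo"}, None)
pipeline.close_spider(None)
```

In a real project the class would be registered under `ITEM_PIPELINES` in the project settings so the engine routes every extracted item through it.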
Source: www.cnblogs.com/jamnoble/p/10945598.html