Scrapy data flow and the role of each module (repost)

Scrapy's workflow and the specific role of each module

 

The data flow can be described as follows (a minimal spider sketch follows this list):
  1. The spider builds request objects from the start URLs -> spider middleware -> engine -> scheduler
  2. The scheduler hands a request back -> engine -> downloader middleware -> downloader
  3. The downloader sends the request and obtains a response -> downloader middleware -> engine -> spider middleware -> spider
  4. The spider extracts URLs from the response and assembles them into request objects -> spider middleware -> engine -> scheduler, then step 2 repeats
  5. The spider extracts data -> engine -> pipeline, which processes and stores it
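As a concrete illustration of steps 1, 4, and 5, here is a minimal spider sketch. The target site and CSS selectors come from Scrapy's own tutorial example (quotes.toscrape.com); the spider name is arbitrary.

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"  # arbitrary name for this sketch
    # Step 1: the start URLs become the initial request objects
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Step 5: extracted data is yielded; the engine routes it to the pipeline
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Step 4: extract the next URL, assemble a request object, and yield it;
        # the engine routes it to the scheduler and the cycle repeats from step 2
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```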

Note:
  • Spider middleware and downloader middleware differ only in where their logic runs in the data flow; their functions can overlap, e.g. replacing the User-Agent (a sketch follows)
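For example, a downloader middleware can swap in a different User-Agent on every outgoing request. Below is a hypothetical middleware sketch (the class name and UA strings are placeholders); it would be activated through the project's DOWNLOADER_MIDDLEWARES setting.

```python
import random


class RandomUserAgentMiddleware:
    # Placeholder User-Agent strings for illustration
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ]

    def process_request(self, request, spider):
        # Runs between the engine and the downloader (step 2), so every
        # request leaves with a randomly chosen User-Agent header
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        return None  # returning None lets the request continue as normal
```

A spider middleware could set the same header from its position between the engine and the spider, which is what the note means by the two middlewares' overlapping roles.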

Summary

  1. The concept of Scrapy: Scrapy is an application framework written for crawling website data and extracting structured data
  2. Scrapy's data flow and operating sequence:
    1. The spider builds request objects from the start URLs -> spider middleware -> engine -> scheduler
    2. The scheduler hands a request back -> engine -> downloader middleware -> downloader
    3. The downloader sends the request and obtains a response -> downloader middleware -> engine -> spider middleware -> spider
    4. The spider extracts URLs from the response and assembles them into request objects -> spider middleware -> engine -> scheduler, then step 2 repeats
    5. The spider extracts data -> engine -> pipeline, which processes and stores it
  3. The value of the Scrapy framework: fast crawling with a small amount of code
  4. The role of each Scrapy module:
    • engine (Engine): responsible for passing data and signals between the other modules
    • scheduler (Scheduler): implements a queue that stores the request objects sent over by the engine
    • downloader (Downloader): sends the requests handed over by the engine, obtains the responses, and passes the responses back to the engine
    • spider (Spider): processes the responses sent by the engine, extracts data and URLs, and hands them to the engine
    • pipeline (Pipeline): processes the data passed over by the engine, e.g. storing it (a pipeline sketch follows this list)
    • downloader middleware (Downloader Middleware): customizable download extensions, e.g. setting a proxy IP
    • spider middleware (Spider Middleware): can customize request sending and response filtering; its role overlaps with the downloader middleware's
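To make the pipeline's role concrete, here is a minimal sketch of a pipeline that writes items out as JSON lines, following the common pattern from Scrapy's documentation; the output filename is an assumption for illustration.

```python
import json


class JsonWriterPipeline:
    def open_spider(self, spider):
        # Hypothetical output file for this sketch
        self.file = open("items.jl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Receives every item the engine routes over from the spider (step 5)
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # pass the item on to any later pipeline
```

To activate it, the pipeline would be registered in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.JsonWriterPipeline": 300}, where "myproject" is a hypothetical project name and the number sets the pipeline's position in the chain.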


Source: www.cnblogs.com/jamnoble/p/10945598.html