Commonly used Scrapy middleware for Python crawlers

I. Overview

  1. The role of middleware

     Middleware lets you hook into certain steps of the Scrapy framework while it runs and adapt them to your own project.

     For example, Scrapy's built-in HttpErrorMiddleware handles responses to HTTP requests that came back with an error status.

  2. Using middleware

     Middleware is enabled through settings.py. See the Scrapy documentation: https://doc.scrapy.org
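
     As a rough illustration of the settings.py configuration mentioned above, the fragment below registers one spider middleware and one downloader middleware. The module path myproject.middlewares and both class names are hypothetical; the numbers are ordering priorities, and the value 543 is only an example.

```python
# settings.py (fragment)
# NOTE: "myproject" and the two class names are placeholders, not real modules.

# Spider middlewares: maps the class path to an ordering number.
SPIDER_MIDDLEWARES = {
    "myproject.middlewares.MySpiderMiddleware": 543,
}

# Downloader middlewares: same idea, for the download side.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.MyDownloaderMiddleware": 543,
    # Mapping a built-in middleware to None disables it.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```

     Setting a built-in entry to None, as in the last line, is the usual way to switch it off when a project supplies its own replacement.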

 

II. Middleware classification

  In theory Scrapy has three kinds of middleware (Scheduler Middleware, Spider Middleware, Downloader Middleware). In practice, the following two are the ones generally used.

  1. Spider Middleware

     Its main job is to do some processing while the spider is running.

  2. Downloader Middleware

     Its main job is to do some processing when a page is downloaded, after the request for that page has been issued.

 

III. Usage

  1. Spider Middleware manages the following methods:

     - process_spider_input: receives a response object and processes it.

       Position: Downloader -> process_spider_input -> Spiders (Downloader and Spiders are components in the official Scrapy architecture diagram)

     - process_spider_exception: called when the spider raises an exception.

     - process_spider_output: called when the spider returns its results after processing a response.

     - process_start_requests: called when the spider issues its start requests.

       Position: Spiders -> process_start_requests -> Scrapy Engine (Scrapy Engine is a component in the official Scrapy architecture diagram)
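
     The four hooks above could be sketched as a plain Python class like the one below. The class name and the counters are hypothetical, and the method signatures follow the classic documented form; newer Scrapy releases may change or deprecate some of them (process_start_requests in particular), so check the documentation for your installed version.

```python
class CountingSpiderMiddleware:
    """Hypothetical spider middleware; the name and counters are illustrative only."""

    def __init__(self):
        self.responses_seen = 0
        self.spider_errors = 0

    def process_spider_input(self, response, spider):
        # Runs for each response on its way from the downloader to the spider.
        self.responses_seen += 1
        return None  # None means: keep processing this response

    def process_spider_output(self, response, result, spider):
        # Runs on whatever the spider produced for this response.
        for item_or_request in result:
            yield item_or_request  # pass everything through unchanged

    def process_spider_exception(self, response, exception, spider):
        # Runs when the spider callback raises an exception.
        self.spider_errors += 1
        return None  # None means: let other middlewares handle the exception

    def process_start_requests(self, start_requests, spider):
        # Runs on the spider's initial requests before they reach the engine.
        for request in start_requests:
            yield request
```

     The class needs no Scrapy base class; it only has to define the hooks it cares about and be listed in SPIDER_MIDDLEWARES.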

  2. Downloader Middleware manages the following methods:

     - process_request: called when a request passes through the downloader middleware.

     - process_response: called to process the download result as it passes back through the middleware.

     - process_exception: called when an exception occurs during the download.
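
     A minimal sketch of these three hooks follows. The class name, the default User-Agent string, and the failed_urls list are all hypothetical; the code assumes only that a request exposes url and a dict-like headers attribute, which is how Scrapy's Request objects behave.

```python
class DefaultHeaderDownloaderMiddleware:
    """Hypothetical downloader middleware; names and values are illustrative only."""

    def __init__(self, user_agent="example-crawler/0.1"):
        self.user_agent = user_agent
        self.failed_urls = []

    def process_request(self, request, spider):
        # Fill in a User-Agent header only if the request does not already set one.
        request.headers.setdefault("User-Agent", self.user_agent)
        return None  # None means: continue on to the downloader

    def process_response(self, request, response, spider):
        # Must hand back a response (or a new request); here it passes through unchanged.
        return response

    def process_exception(self, request, exception, spider):
        # Remember which URL failed, then let other middlewares deal with the error.
        self.failed_urls.append(request.url)
        return None
```

     Returning None from process_request and process_exception leaves the normal flow untouched, which makes this shape a reasonable starting point before adding retry or proxy logic.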

  When writing middleware, think about which stage of processing is the most appropriate place for the feature you want, and implement the corresponding method.

  Middleware can be used to process requests and results, or be combined with signals to coordinate with other methods, and so on. It can also add features to an existing crawler to adapt it to the project. The same goal can be reached by writing an extension; in fact, extensions are more loosely coupled, so an extension is the recommended choice.


Origin blog.csdn.net/sinat_38682860/article/details/93522766