Request and Response objects in Scrapy

Preface:

  If the components of the Scrapy framework are likened to the organs of the human body, then Request and Response objects are the blood, and Item is the product of metabolism.

 

Request object:

  It describes an HTTP request and is constructed with the following parameters:

  1. url
    1. The URL to request
  2. callback
    1. The callback function that processes the response once the page has been downloaded
  3. method
    1. The default is GET
  4. headers
    1. Dictionary Type
  5. body
  6. cookies
    1. Dictionary Type
  7. meta
    1. The metadata dictionary of the request, dict type, used to pass information to other parts of the framework, such as middlewares and spider callbacks. Those components read the dictionary through the meta attribute of the request object
  8. encoding
  9. priority
    1. The default priority is 0; requests with a higher priority value are downloaded first
  10. dont_filter
    1. The default is False. When requests for the same URL are submitted several times, the later ones are dropped by the duplicate filter (to avoid downloading the same resource again). Setting this parameter to True exempts the request from filtering and forces the download
  11. errback
    1. The callback function invoked when the request raises an exception or returns an HTTP error

  Although there are many parameters, all of them except url are optional and have default values. When constructing a Request object we usually pass only url and callback and leave the rest at their defaults.

 

Response object:

  It describes an HTTP response. Response is only a base class; depending on the content of the response, it has the subclasses TextResponse, HtmlResponse and XmlResponse.

  When a page has finished downloading, the downloader creates an object of one of the Response subclasses according to the Content-Type header of the HTTP response. The pages we crawl are usually HTML, so an HtmlResponse is created. HtmlResponse and XmlResponse are both subclasses of TextResponse, and the three subclasses differ only slightly.

  HtmlResponse object attributes and methods:

    • url
      • url address of the HTTP response, str type
    • status
      • Status code of the HTTP response, int type
    • headers
      • Headers of the HTTP response, a dict-like type; values can be read with the get or getlist method
    • body
    • text
      • The HTTP response as text, str type, obtained by decoding response.body with response.encoding
    • encoding
    • request
      • The Request object that produced this HTTP response
    • meta
      • Shortcut for response.request.meta. Data passed through the meta parameter when constructing the Request object reaches the callback function, which can read it through response.meta while processing the response
    • selector
      • Selector object used to extract information from the response
    • xpath
    • css
    • urljoin
      • Builds an absolute URL: when the url argument passed in is a relative address, the corresponding absolute URL is computed from response.url

 


Origin www.cnblogs.com/tulintao/p/11697844.html