1. Scrapy selectors
Overview:
Scrapy provides its own extraction mechanism, built on top of the lxml library; these are called selectors because they "select" the parts of an HTML document specified by an XPath expression or a CSS expression.
The Scrapy selector API is very small and simple.
A selector is an instance of the scrapy.Selector class, constructed by passing it a TextResponse object or a text string.

Using Selector objects
Selector provides two methods for extracting tags:
- xpath() — selects nodes using XPath syntax rules
- css() — selects nodes using CSS selector grammar

Shortcuts on the response object:
selector = response.xpath('')
selector = response.css('')

Both return a list of selector objects. To extract the text:
- selector.extract() returns a list of the matched text
- selector.extract_first() returns the text of the first match, or None if there is no match; a default value can also be set
Sometimes we chain several calls to the selection methods (.xpath() or .css()), for example:
response.css('img').xpath('@src')
Selector also has a .re() method, which uses a regular expression to extract data and returns a list of strings. It is generally used after xpath() or css() to filter the extracted text.
re_first() returns the first matching string.
For example:
response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
Here contains() performs a fuzzy (substring) match on the href attribute.
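Scrapy may not be installed in every environment, so the sketch below imitates the two ideas above — chained selection (response.css('img').xpath('@src')) and regex filtering with .re() — using only the Python standard library. The HTML snippet and the "Name:" labels are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# A made-up document in the spirit of the examples above.
html = """<html><body>
<a href="image1.html">Name: My image 1<img src="image1_thumb.jpg"/></a>
<a href="image2.html">Name: My image 2<img src="image2_thumb.jpg"/></a>
</body></html>"""

root = ET.fromstring(html)

# Analogue of response.css('img').xpath('@src'):
# first select the tags, then pull out one attribute of each.
srcs = [img.get("src") for img in root.findall(".//img")]
# → ['image1_thumb.jpg', 'image2_thumb.jpg']

# Analogue of .re(r'Name:\s*(.*)') applied after text extraction:
# the regex filters and captures part of each extracted text node.
names = []
for a in root.findall(".//a"):
    m = re.search(r"Name:\s*(.*)", a.text or "")
    if m:
        names.append(m.group(1))
# → ['My image 1', 'My image 2']
```

In real Scrapy code the same chaining is done directly on the response object, and extract()/extract_first() replace the list comprehensions.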
2. The scrapy shell debugging tool
Description: a command-line tool for debugging the code of a Scrapy project.
Starting the shell
The command syntax to start the Scrapy shell is:
scrapy shell [option] [url|file]
Note: when analyzing a local file, be sure to include the file path (prefix a relative path with ./), because scrapy shell otherwise treats the argument as a URL by default.
Using the shell
The Scrapy shell is essentially an ordinary Python shell that additionally provides some ready-made objects and shortcuts to make debugging quicker.
Shortcuts:
- shelp() — print a help list of the available objects and shortcuts
- fetch(url[, redirect=True]) — download the given URL and update the shell's objects
- fetch(request) — download the given request
- view(response) — open the response in a local browser
Scrapy objects: crawler, spider, request, response, settings
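An illustrative session tying the pieces above together; the URL is a placeholder and the shell-side commands are shown as comments since they run at the interactive prompt, not in the system shell.

```shell
# Start the shell against a page; quote the URL so the system shell
# does not interpret any special characters in it.
scrapy shell 'http://quotes.toscrape.com'

# Inside the shell (a regular Python prompt with extras):
#   shelp()                       # list the available objects and shortcuts
#   fetch('http://example.com')   # download another page into `response`
#   response.css('title::text').extract_first()
#   view(response)                # open the downloaded page in a browser
```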
3. scrapy.Spider
| Spider class attribute / method | Description |
|---|---|
| name attribute | The spider's name |
| start_urls attribute | The list of URLs the spider starts crawling from |
| custom_settings attribute | Custom per-spider settings that override the project settings |
| start_requests() method | Generates the spider's initial requests before crawling starts |
| parse(self, response) | The default callback used to process downloaded responses |
| from_crawler | Class method used to create the spider instance |
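A real spider subclasses scrapy.Spider. To keep the sketch below runnable without Scrapy installed, the class only imitates the shape of that API; the class name, URL, and settings are made-up examples, and the real `scrapy.Request` calls are noted in comments.

```python
# Hypothetical stand-in for a Scrapy spider: in a real project this would be
# `class QuotesSpider(scrapy.Spider):` and start_requests() would yield
# scrapy.Request objects instead of bare URLs.
class QuotesSpider:
    name = "quotes"                               # the spider's name
    start_urls = ["http://quotes.toscrape.com/"]  # where crawling starts
    custom_settings = {"DOWNLOAD_DELAY": 1}       # per-spider overrides

    def start_requests(self):
        # Real code: yield scrapy.Request(url, callback=self.parse)
        for url in self.start_urls:
            yield url

    def parse(self, response):
        # Default callback: extract data from / follow links in `response`.
        return {"url": getattr(response, "url", None)}


spider = QuotesSpider()
first_requests = list(spider.start_requests())
```

If start_requests() is not overridden, Scrapy's base class generates the initial requests from start_urls automatically, with parse() as the default callback.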