- Every item pipeline must implement the process_item() method; the other methods below are optional.
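- process_item(self, item, spider): does the actual work on each item, such as filtering or cleaning the data. It typically returns the item (possibly modified) so later pipelines receive it, or raises DropItem to discard it.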
- open_spider(self, spider): called when the spider is opened.
- close_spider(self, spider): called when the spider is closed.
- from_crawler(cls, crawler): if present, this class method is called to create a pipeline instance from a Crawler, and it must return a new instance of the pipeline. The Crawler object provides access to all Scrapy core components, such as settings and signals; this is how a pipeline can reach them and hook its functionality into Scrapy.

To activate an Item Pipeline component, add its class to the ITEM_PIPELINES setting, as in the following example:

    ITEM_PIPELINES = {
        'myproject.pipelines.PricePipeline': 300,
        'myproject.pipelines.JsonWriterPipeline': 800,
    }

The integer values determine the order in which the pipelines run: items pass through lower-valued classes before higher-valued ones, and these numbers are customarily kept in the 0-1000 range.
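The four methods above can be combined in a single class. The following is a minimal sketch, not taken from the original notes: a hypothetical JsonLinesWriterPipeline that opens a file in open_spider, writes each item as one JSON line in process_item, and closes the file in close_spider. from_crawler reads the output path from the project settings. JSONL_OUTPUT_PATH is a hypothetical setting name (not a built-in Scrapy setting), and items are assumed to be dicts or scrapy.Item instances, since both can be converted with dict(). Because a pipeline is a plain Python class, the sketch needs no Scrapy import.

```python
import json


class JsonLinesWriterPipeline:
    """Hypothetical pipeline: writes every item as one JSON line to a file."""

    def __init__(self, output_path):
        self.output_path = output_path
        self.file = None
        self.items_written = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Build the instance from the project settings. JSONL_OUTPUT_PATH is a
        # hypothetical setting name, not a built-in Scrapy setting.
        return cls(output_path=crawler.settings.get("JSONL_OUTPUT_PATH", "items.jsonl"))

    def open_spider(self, spider):
        # Called once when the spider is opened: acquire resources here.
        self.file = open(self.output_path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider is closed: release resources here.
        self.file.close()

    def process_item(self, item, spider):
        # Called for every item. dict(item) assumes the item is a dict
        # or a scrapy.Item (both behave as mappings).
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        self.items_written += 1
        # Return the item so the next pipeline in ITEM_PIPELINES receives it.
        return item
```

To enable it, the class would be listed in ITEM_PIPELINES, for example 'myproject.pipelines.JsonLinesWriterPipeline': 800, assuming it lives in myproject/pipelines.py. Opening the file in open_spider rather than __init__ ties the resource to the spider's lifetime, and close_spider guarantees it is released when the crawl ends.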