Scrapy custom extensions

1. Create a new extension file and define a class in it; the class must provide a from_crawler classmethod:

from scrapy import signals


class MyExtend:

    def __init__(self, crawler):
        self.crawler = crawler
        # hook the handler onto the engine_started signal
        crawler.signals.connect(self.start, signals.engine_started)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def start(self):
        # custom action to run when the engine starts
        print('signals.engine_started')
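
A slightly fuller sketch (the class name SpiderLifecycleExtension and the MYEXT_ENABLED setting are illustrative, not from the original post): handlers can be connected to several signals, and Scrapy passes signal-specific keyword arguments such as spider and reason to them.

from scrapy import signals
from scrapy.exceptions import NotConfigured


class SpiderLifecycleExtension:

    def __init__(self, crawler):
        self.crawler = crawler
        # hook handlers onto the spider lifecycle signals
        crawler.signals.connect(self.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(self.spider_closed, signal=signals.spider_closed)

    @classmethod
    def from_crawler(cls, crawler):
        # optionally honour a toggle in settings; raising NotConfigured disables the extension
        if not crawler.settings.getbool('MYEXT_ENABLED', True):
            raise NotConfigured
        return cls(crawler)

    def spider_opened(self, spider):
        # this signal delivers the spider instance as a keyword argument
        spider.logger.info('spider opened: %s', spider.name)

    def spider_closed(self, spider, reason):
        # spider_closed also delivers the close reason ('finished', 'cancelled', ...)
        spider.logger.info('spider closed: %s (%s)', spider.name, reason)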

2. Register the extension in the project's settings.py:

EXTENSIONS = {
    'day96.extensions.MyExtend': 300,
}
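
The number (300 here) is the extension's order value; extensions generally do not depend on each other, so any value works. Per Scrapy's documented convention, a built-in extension can also be disabled by mapping its path to None, for example:

EXTENSIONS = {
    'day96.extensions.MyExtend': 300,
    # disable Scrapy's built-in telnet console extension (illustrative)
    'scrapy.extensions.telnet.TelnetConsole': None,
}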

3. Signals you can hook into (as defined in scrapy.signals); see the counting sketch after this list for an example of using the item signals:

# sent when the engine starts running
engine_started = object()
# sent when the engine stops running
engine_stopped = object()

# spider lifecycle and error signals
spider_opened = object()
spider_idle = object()
spider_closed = object()
spider_error = object()
# request / response signals
request_scheduled = object()
request_dropped = object()
response_received = object()
response_downloaded = object()

# sent when an item has been scraped (yielded and passed through all pipelines)
item_scraped = object()
# sent when an item is dropped by a pipeline
item_dropped = object()
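
As a small illustrative use of the item signals above (the class name ItemCountExtension is hypothetical), an extension can count scraped and dropped items and report when the engine stops; the handler signatures match the arguments each signal sends. Register it in EXTENSIONS the same way as in step 2.

from scrapy import signals


class ItemCountExtension:

    def __init__(self, crawler):
        self.scraped = 0
        self.dropped = 0
        crawler.signals.connect(self.item_scraped, signal=signals.item_scraped)
        crawler.signals.connect(self.item_dropped, signal=signals.item_dropped)
        crawler.signals.connect(self.engine_stopped, signal=signals.engine_stopped)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def item_scraped(self, item, response, spider):
        # fired after an item has passed through all item pipelines
        self.scraped += 1

    def item_dropped(self, item, response, exception, spider):
        # fired when a pipeline raises DropItem
        self.dropped += 1

    def engine_stopped(self):
        # engine_stopped sends no arguments
        print('items scraped: %d, dropped: %d' % (self.scraped, self.dropped))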


Reposted from www.cnblogs.com/trunkslisa/p/9814764.html