Deduplication with the scrapy-redis component

1. Installation

pip3 install -i https://pypi.douban.com/simple scrapy-redis

2. Configuration file

Scrapy deduplication

DUPEFILTER_KEY = 'dupefilter:%(timestamp)s'
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
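The key setting is an ordinary Python %-format template. The sketch below only illustrates how the %(timestamp)s placeholder is filled in; to my knowledge the stock RFPDupeFilter substitutes the current Unix time, which would give a new Redis key on every run. The helper build_key and the name 'my_spider' are hypothetical and are not part of scrapy-redis.

```python
import time

# Same template string as the DUPEFILTER_KEY setting above.
DUPEFILTER_KEY = 'dupefilter:%(timestamp)s'


def build_key(value):
    """Fill the %(timestamp)s placeholder with an arbitrary value."""
    return DUPEFILTER_KEY % {'timestamp': value}


# A time-based value yields a different Redis key for each run ...
per_run_key = build_key(int(time.time()))

# ... while a constant yields the same key every time, so fingerprints
# stored under it can be shared across runs ('my_spider' is a made-up name).
fixed_key = build_key('my_spider')
```

With a per-run key, each crawl starts from an empty fingerprint set; with a fixed key, requests seen in earlier crawls are still treated as duplicates.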

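Connecting Scrapy to Redis

REDIS_HOST = 'ip'                                # Redis server address
REDIS_PORT = port                                # port number (an integer)
REDIS_PARAMS = {'password': 'password'}          # extra connection parameters
REDIS_ENCODING = "utf-8"
# REDIS_URL = 'redis://user:password@ip:port'   (takes precedence over the settings above)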
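If you prefer the single-URL form, a password containing characters such as @ or : should be percent-encoded before it is placed in the URL. The helper below is a minimal sketch for building such a string; build_redis_url is a hypothetical name, not a scrapy-redis function, and it assumes the Redis client decodes percent-encoded credentials (recent redis-py versions do, as far as I know).

```python
from urllib.parse import quote


def build_redis_url(host, port, password=None, user=''):
    """Compose a redis:// URL, percent-encoding the credentials."""
    auth = ''
    if password is not None:
        # safe='' forces characters like '@', ':' and '/' to be encoded too.
        auth = '%s:%s@' % (quote(user, safe=''), quote(password, safe=''))
    return 'redis://%s%s:%d' % (auth, host, port)


# Example values are placeholders, not real credentials.
url = build_redis_url('127.0.0.1', 6379, password='p@ss')
```

The resulting string can then be assigned to REDIS_URL in the settings file.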

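3. Custom class

Subclass RFPDupeFilter and override its from_settings method so that the dedup key is a fixed value instead of one derived from the current time.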

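from scrapy_redis import defaults
from scrapy_redis.connection import get_redis_from_settings
from scrapy_redis.dupefilter import RFPDupeFilter


class RedisDupeFilter(RFPDupeFilter):
    @classmethod
    def from_settings(cls, settings):
        server = get_redis_from_settings(settings)
        # Substitute a constant for the timestamp so every run uses the same Redis key
        key = defaults.DUPEFILTER_KEY % {'timestamp': 'fixed_key'}
        debug = settings.getbool('DUPEFILTER_DEBUG')
        return cls(server, key=key, debug=debug)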

Then change the DUPEFILTER_CLASS path in the configuration file so that it points to this class.
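As a rough illustration, the setting might look like the fragment below. The module path myproject.dupefilters is an assumption made for this example; use the dotted path of whichever module actually contains RedisDupeFilter in your project.

```python
# settings.py (sketch)
# 'myproject.dupefilters' is a hypothetical module path, not from the original post.
DUPEFILTER_CLASS = 'myproject.dupefilters.RedisDupeFilter'
```

Scrapy loads the class from this dotted path, so the module has to be importable from the project root.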


Reposted from www.cnblogs.com/wt7018/p/11756393.html