Data collection: making scrapy-redis support start_urls priority

By default, scrapy-redis only supports the Redis list and set data structures for start_urls. Once a crawler serves several business needs, task priority has to be taken into account. For example, suppose three business lines currently share one crawler and the three lines are not equally important. There are several options:

  • Run three separate spiders (not recommended)
  • Add a scheduler that handles priority (adds complexity)
  • Make scrapy-redis's start_urls support priority

I faced this problem too and first handled it with an extra scheduling layer. Later I took the time to add start_urls priority support to scrapy-redis. It is enabled through parameters in settings.py, and the tests pass. The author is perhaps too busy with the project, since there has been no response to the PR so far.
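One way to pop start URLs by priority is to store them in a Redis sorted set and, on each batch, take the members with the highest scores. The following is a minimal sketch of that idea, not necessarily the code in the fork: it assumes `server` is a redis-py client (`redis.Redis`), and the function name `pop_highest_priority` is hypothetical.

```python
def pop_highest_priority(server, redis_key, batch_size):
    """Take up to batch_size members with the highest scores from a sorted set.

    Assumes `server` is a redis-py client. Its pipeline() is transactional by
    default (MULTI/EXEC), so reading and removing the batch happen atomically
    and two spider processes cannot pop the same start URL.
    """
    with server.pipeline() as pipe:
        # Descending order: ranks 0..batch_size-1 are the highest scores
        pipe.zrevrange(redis_key, 0, batch_size - 1)
        # The same members sit at the top of the ascending ranking, so remove them there
        pipe.zremrangebyrank(redis_key, -batch_size, -1)
        members, _removed = pipe.execute()
    return members
```

Usage would look like `pop_highest_priority(redis.Redis.from_url("redis://127.0.0.1:6379"), "mysite:start_urls", 16)`. Removing by ascending rank (`-batch_size` to `-1`) targets exactly the members that `ZREVRANGE` returned, because descending order is the exact reverse of ascending order, including ties.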

Project address

https://github.com/qshine/scrapy-redis

Instructions

git clone https://github.com/qshine/scrapy-redis.git
cd scrapy-redis
python setup.py install

Set the following parameters in settings.py; for the other parameters, refer to the README.

# settings.py
......

REDIS_URL = 'redis://:@127.0.0.1:6379'
REDIS_START_URLS_KEY = '%(name)s:start_urls'
REDIS_START_URLS_AS_SET = False

......
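REDIS_START_URLS_KEY is a %-style template: scrapy-redis fills the `%(name)s` placeholder with the spider's name, so a spider named `mysite` reads its start URLs from the Redis key `mysite:start_urls`. The sketch below only illustrates that substitution; the helper `expand_start_urls_key` is hypothetical and not part of scrapy-redis.

```python
def expand_start_urls_key(template, spider_name):
    # Same %-formatting substitution scrapy-redis applies to REDIS_START_URLS_KEY
    return template % {"name": spider_name}


key = expand_start_urls_key("%(name)s:start_urls", "mysite")
print(key)  # mysite:start_urls
```

Whatever key this produces is the one the tasks must be written to in Redis.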

The test spider is as follows:

# -*- coding: utf-8 -*-

from scrapy_redis.spiders import RedisSpider


class MysiteSpider(RedisSpider):
    name = 'mysite'

    def parse(self, response):
        # Print each crawled URL so the order in which start URLs are consumed is visible
        print(response.url)

Add three tasks with different priorities to Redis:

zadd mysite:start_urls 0 'a' 10 'b' 5 'c'
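The same tasks can be seeded from Python. This is a minimal sketch assuming redis-py 3.0 or later, where `zadd` takes a `{member: score}` mapping; the helper names `seed_start_urls` and `expected_pop_order` are hypothetical. It also assumes the highest score is consumed first, which is the `ZREVRANGE` ordering: score descending, with ties broken by member in reverse lexicographic order.

```python
def seed_start_urls(client, key, priorities):
    """Equivalent of `zadd key score member ...`.

    Assumes `client` is a redis-py (>= 3.0) client, whose zadd takes a
    {member: score} mapping.
    """
    return client.zadd(key, priorities)


def expected_pop_order(priorities):
    # Assumption: highest score first, ties by member in reverse lexicographic order
    return sorted(priorities, key=lambda m: (priorities[m], m), reverse=True)


priorities = {"a": 0, "b": 10, "c": 5}
print(expected_pop_order(priorities))  # ['b', 'c', 'a']
```

With a live server this would be called as `seed_start_urls(redis.Redis.from_url("redis://127.0.0.1:6379"), "mysite:start_urls", priorities)`. Under these assumptions, `b` (score 10) is crawled first and `a` (score 0) last.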

Start the spider; the log is as follows:

http://www.sina.com
2019-07-03 23:54:34 [mysite] DEBUG: Request not made from data: b'http://www.sina.com'
http://www.163.com
2019-07-03 23:54:34 [mysite] DEBUG: Request not made from data: b'http://www.163.com'
http://www.baidu.com
2019-07-03 23:54:34 [mysite] DEBUG: Request not made from data: b'http://www.baidu.com'

Epilogue

This priority feature is my most recent submission, and I personally think it is quite practical. If you find shortcomings, feel free to share them; if it helps you meet your requirements quickly, a star is welcome too.


Source: www.cnblogs.com/zlone/p/11129953.html