Controlling the Redis data type of the URL queue in scrapy-redis (zset, list)

Description: while crawling, scrapy-redis automatically stores request URLs in Redis. Which Redis data structure holds them depends on the queue class selected in the settings file.

Note: if the Redis type a key is stored as does not match the command used to read it, Redis returns an error.

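For example, reading a key stored as a zset with a list command (or a list with a zset command) fails. The commands for each type are:

list: lpush key value (write), lrange key start end (read)

zset: zadd key score member (write), zrange key start end (read)

Using a command of the wrong type returns: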

(error) WRONGTYPE Operation against a key holding the wrong kind of value
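To avoid this error, a client can first ask Redis for the key's type (the TYPE command) and then pick the matching read command. This is a minimal sketch, assuming a redis-py-style client object; the helper name `read_queue` is hypothetical and not part of scrapy-redis.

```python
def read_queue(client, key, start=0, end=-1):
    """Hypothetical helper: return the members stored at `key`, using the
    read command that matches the key's Redis type."""
    key_type = client.type(key)
    # redis-py returns bytes unless the client uses decode_responses=True
    if isinstance(key_type, bytes):
        key_type = key_type.decode()
    if key_type == "list":
        return client.lrange(key, start, end)
    if key_type == "zset":
        return client.zrange(key, start, end)
    if key_type == "set":
        # sets are unordered, so start/end do not apply
        return list(client.smembers(key))
    if key_type == "none":
        # the key does not exist
        return []
    raise TypeError(f"unsupported Redis type {key_type!r} for key {key!r}")
```

With a real server this could be called as `read_queue(redis.Redis(), "myspider:requests")`, where `myspider:requests` follows scrapy-redis's default `<spider name>:requests` scheduler key. Note that the scheduler stores serialized requests there, not plain URL strings.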

First, the three queue classes:

# Queue class used to order the URLs to be crawled.
# Default: ordered by priority (Scrapy's default), implemented with a sorted set; neither FIFO nor LIFO.
# SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'
# Optional: first in, first out (FIFO)
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
# Optional: last in, first out (LIFO)
# SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderStack'

Second, the Redis data type each queue class stores URLs in:

Default priority queue (SpiderPriorityQueue): zset

First in, first out (SpiderQueue, FIFO): list

Last in, first out (SpiderStack, LIFO): list
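The mapping above can be written down as a lookup table, for example to check which read command a crawl's queue key needs before inspecting it by hand. A minimal sketch; `QUEUE_REDIS_TYPES`, `READ_COMMANDS`, `expected_redis_type` and `read_command_for` are hypothetical names, and the class paths are the ones from the settings above.

```python
# Redis data type used by each scrapy-redis queue class (per the list above)
QUEUE_REDIS_TYPES = {
    "scrapy_redis.queue.SpiderPriorityQueue": "zset",  # default, ordered by priority
    "scrapy_redis.queue.SpiderQueue": "list",          # FIFO
    "scrapy_redis.queue.SpiderStack": "list",          # LIFO
}

# Read command that matches each type, so WRONGTYPE is not triggered
READ_COMMANDS = {"zset": "zrange", "list": "lrange"}


def expected_redis_type(queue_class):
    """Return the Redis type a given SCHEDULER_QUEUE_CLASS stores requests in."""
    try:
        return QUEUE_REDIS_TYPES[queue_class]
    except KeyError:
        raise ValueError(f"unknown queue class: {queue_class}") from None


def read_command_for(queue_class):
    """Return the Redis read command to use for that queue class."""
    return READ_COMMANDS[expected_redis_type(queue_class)]
```

For example, `read_command_for("scrapy_redis.queue.SpiderQueue")` gives `"lrange"`, while the default priority queue gives `"zrange"`.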

Third, choosing how start URLs are read from Redis:

REDIS_START_URLS_AS_SET = True

Add this line to the project's settings file (settings.py). It applies to the start URLs the spider reads from its Redis key, not to the scheduler queue above.

True: the start URLs are read from a Redis set.

False (the default): the start URLs are read from a Redis list.
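Because the spider reads its start URLs with set or list semantics depending on REDIS_START_URLS_AS_SET, whatever seeds the key has to use the matching write command (SADD for a set, LPUSH for a list); otherwise the spider hits the same WRONGTYPE error. A minimal sketch, assuming a redis-py-style client; `push_start_urls` is a hypothetical helper.

```python
def push_start_urls(client, key, urls, as_set):
    """Hypothetical helper: seed `key` with start URLs using the Redis type
    the spider expects. `as_set` should mirror REDIS_START_URLS_AS_SET."""
    urls = list(urls)
    if not urls:
        # SADD/LPUSH require at least one value, so skip the call entirely
        return 0
    if as_set:
        # SADD ignores duplicates and returns the number of members added
        return client.sadd(key, *urls)
    # LPUSH returns the length of the list after the push
    return client.lpush(key, *urls)
```

For example, with the setting set to True: `push_start_urls(redis.Redis(), "myspider:start_urls", ["https://example.com"], as_set=True)`. The key `myspider:start_urls` follows scrapy-redis's default `<spider name>:start_urls` pattern and is only illustrative.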

Origin blog.csdn.net/ryuhfxz/article/details/85782467