[Crawler] Study notes, day 55: 6.5 scrapy-redis source code analysis, with reference to the official documentation: Queue

queue.py

This file implements several containers. They all interact with Redis frequently, and by default they all use the picklecompat serializer described earlier. The containers are implemented in much the same way: one is a queue, one is a stack, and one is a priority queue. The scheduler instantiates one of these three containers to implement request scheduling. For example, with SpiderQueue as the scheduling queue, requests are scheduled first in, first out (FIFO), whereas SpiderStack schedules them last in, first out (LIFO).
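To make the FIFO/LIFO difference concrete, here is a minimal sketch that needs no Redis server. `FakeRedisList` and `drain` are hypothetical helpers invented for this illustration (they are not part of scrapy-redis or redis-py); they only imitate the head/tail behaviour of the LPUSH, RPOP and LPOP list commands.

```python
from collections import deque


class FakeRedisList:
    """Hypothetical in-memory stand-in for one Redis list (illustration only)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        # LPUSH inserts at the head (left end) of the list.
        self._items.appendleft(value)

    def rpop(self):
        # RPOP removes from the tail (right end); None when the list is empty.
        return self._items.pop() if self._items else None

    def lpop(self):
        # LPOP removes from the head (left end); None when the list is empty.
        return self._items.popleft() if self._items else None


def drain(pop):
    """Call pop() until it returns None and collect the results in order."""
    out = []
    item = pop()
    while item is not None:
        out.append(item)
        item = pop()
    return out


fifo = FakeRedisList()
lifo = FakeRedisList()
for url in ["a", "b", "c"]:
    fifo.lpush(url)
    lifo.lpush(url)

fifo_order = drain(fifo.rpop)  # SpiderQueue style: lpush + rpop
lifo_order = drain(lifo.lpop)  # SpiderStack style: lpush + lpop
print(fifo_order)  # ['a', 'b', 'c']
print(lifo_order)  # ['c', 'b', 'a']
```

Both containers push at the same end of the list; the only thing that differs is the end they pop from.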

Looking at the implementation of SpiderQueue, its push function works much like those of the other containers. Before a request is pushed, Scrapy's request_to_dict helper converts it into a dict (a Request object is fairly complex, with methods as well as attributes, so it is hard to serialize directly). The picklecompat serializer then turns the dict into a string, which is stored in Redis under a specific key (the key is the same for every instance of the same spider). Calling pop reads the value stored under that key (a list) and takes the element that was pushed earliest, which is what makes the container FIFO.

These containers are what the scheduler uses to schedule requests. A scheduler is instantiated on every host and corresponds one-to-one with a spider, so in a distributed run there are several instances of a spider and several scheduler instances spread across different hosts. However, because every scheduler uses the same kind of container, all containers connect to the same Redis server, and the key they read and write is built from the spider name plus "queue", the crawler instances on different hosts share a single request scheduling pool. This is how scheduling is unified across the distributed crawlers.
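The following sketch illustrates the shared-key idea under simplifying assumptions: `FakeServer` and `MiniFifoQueue` are hypothetical names made up for this example, requests are represented as plain dicts (standing in for the output of request_to_dict), and the standard pickle module stands in for picklecompat. It is not the scrapy-redis implementation, only a small model of it.

```python
import pickle
from collections import deque


class FakeServer:
    """Hypothetical stand-in for a shared Redis server: key -> list of bytes."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        items = self.lists.get(key)
        return items.pop() if items else None


class MiniFifoQueue:
    """Simplified model of SpiderQueue; requests are plain dicts here."""

    def __init__(self, server, spider_name, key="%(spider)s:queue"):
        self.server = server
        # Same spider name -> same key, regardless of which host runs this.
        self.key = key % {"spider": spider_name}

    def push(self, request_dict):
        # Serialize the dict to bytes before storing it under the shared key.
        self.server.lpush(self.key, pickle.dumps(request_dict))

    def pop(self):
        data = self.server.rpop(self.key)
        return pickle.loads(data) if data else None


server = FakeServer()                          # one shared "Redis"
host_a = MiniFifoQueue(server, "myspider")     # queue instance on host A
host_b = MiniFifoQueue(server, "myspider")     # queue instance on host B

host_a.push({"url": "http://example.com/1", "method": "GET"})
host_a.push({"url": "http://example.com/2", "method": "GET"})

first = host_b.pop()    # host B receives the request that host A pushed first
second = host_a.pop()
third = host_b.pop()    # the shared pool is now empty
print(host_a.key, first["url"], second["url"], third)
```

Because both instances resolve the key template to the same string, a request pushed on one host can be popped on another, and each request is handed out only once.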

from scrapy.utils.reqser import request_to_dict, request_from_dict

from . import picklecompat


class Base(object):
    """Per-spider queue/stack base class"""

    def __init__(self, server, spider, key, serializer=None):
        """Initialize per-spider redis queue.
        Parameters:
            server -- redis connection
            spider -- spider instance
            key -- key for this queue (e.g. "%(spider)s:queue")
        """
        if serializer is None:
            # Backward compatibility.
            # TODO: deprecate pickle.
            serializer = picklecompat
        if not hasattr(serializer, 'loads'):
            raise TypeError("serializer does not implement 'loads' function: %r"
                            % serializer)
        if not hasattr(serializer, 'dumps'):
            raise TypeError("serializer does not implement 'dumps' function: %r"
                            % serializer)

        self.server = server
        self.spider = spider
        self.key = key % {'spider': spider.name}
        self.serializer = serializer

    def _encode_request(self, request):
        """Encode a request object"""
        obj = request_to_dict(request, self.spider)
        return self.serializer.dumps(obj)

    def _decode_request(self, encoded_request):
        """Decode a request previously encoded"""
        obj = self.serializer.loads(encoded_request)
        return request_from_dict(obj, self.spider)

    def __len__(self):
        """Return the length of the queue"""
        raise NotImplementedError

    def push(self, request):
        """Push a request"""
        raise NotImplementedError

    def pop(self, timeout=0):
        """Pop a request"""
        raise NotImplementedError

    def clear(self):
        """Clear queue/stack"""
        self.server.delete(self.key)


class SpiderQueue(Base):
    """Per-spider FIFO queue"""

    def __len__(self):
        """Return the length of the queue"""
        return self.server.llen(self.key)

    def push(self, request):
        """Push a request"""
        self.server.lpush(self.key, self._encode_request(request))

    def pop(self, timeout=0):
        """Pop a request"""
        if timeout > 0:
            data = self.server.brpop(self.key, timeout)
            if isinstance(data, tuple):
                data = data[1]
        else:
            data = self.server.rpop(self.key)
        if data:
            return self._decode_request(data)


class SpiderPriorityQueue(Base):
    """Per-spider priority queue abstraction using redis' sorted set"""

    def __len__(self):
        """Return the length of the queue"""
        return self.server.zcard(self.key)

    def push(self, request):
        """Push a request"""
        data = self._encode_request(request)
        score = -request.priority
        # We don't use zadd method as the order of arguments change depending on
        # whether the class is Redis or StrictRedis, and the option of using
        # kwargs only accepts strings, not bytes.
        self.server.execute_command('ZADD', self.key, score, data)

    def pop(self, timeout=0):
        """
        Pop a request
        timeout is not supported in this queue class
        """
        # use atomic range/remove using multi/exec
        pipe = self.server.pipeline()
        pipe.multi()
        pipe.zrange(self.key, 0, 0).zremrangebyrank(self.key, 0, 0)
        results, count = pipe.execute()
        if results:
            return self._decode_request(results[0])


class SpiderStack(Base):
    """Per-spider stack"""

    def __len__(self):
        """Return the length of the stack"""
        return self.server.llen(self.key)

    def push(self, request):
        """Push a request"""
        self.server.lpush(self.key, self._encode_request(request))

    def pop(self, timeout=0):
        """Pop a request"""
        if timeout > 0:
            data = self.server.blpop(self.key, timeout)
            if isinstance(data, tuple):
                data = data[1]
        else:
            data = self.server.lpop(self.key)

        if data:
            return self._decode_request(data)


__all__ = ['SpiderQueue', 'SpiderPriorityQueue', 'SpiderStack']
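
The priority queue in the listing stores each request with the score `-request.priority` and pops the element at rank 0, that is, the one with the lowest score. The sketch below models that ordering without Redis. `FakeSortedSet` and the module-level `push` function are hypothetical helpers written for this illustration, and plain strings stand in for encoded requests.

```python
class FakeSortedSet:
    """Hypothetical in-memory stand-in for one Redis sorted set."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, score, member):
        self._scores[member] = score

    def pop_lowest(self):
        # Models ZRANGE key 0 0 followed by ZREMRANGEBYRANK key 0 0:
        # take the member with the lowest score (ties broken by member).
        if not self._scores:
            return None
        member = min(self._scores, key=lambda m: (self._scores[m], m))
        del self._scores[member]
        return member


def push(zset, encoded_request, priority):
    # Negate the priority so that a higher priority gets a lower score
    # and therefore sits at rank 0 of the sorted set.
    zset.zadd(-priority, encoded_request)


zset = FakeSortedSet()
push(zset, "low", 0)
push(zset, "high", 10)
push(zset, "mid", 5)

order = [zset.pop_lowest() for _ in range(3)]
print(order)  # ['high', 'mid', 'low']
```

In the real class the read and the removal are sent together in one MULTI/EXEC pipeline, so two crawler instances cannot pop the same request; the in-memory model above does not need that step because it runs in a single process.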
Origin blog.csdn.net/qq_35456045/article/details/104111433