scrapy-redis module source code analysis

/scrapy_redis 
----|__init__.py

----|connection.py
--------function:get_redis_from_settings
Collects all redis-related settings into a dict, params, used to instantiate the redis client
--------function:get_redis
Instantiates the redis client; the from_url method takes precedence when a URL is given
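The precedence logic above can be sketched as follows. This is a simplified model, not the real connection.py: DummyRedis stands in for the actual redis client class, and the keyword names (url, redis_cls) are assumed from the scrapy-redis source.

```python
class DummyRedis:
    """In-memory stand-in for the real redis client class."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.via_url = False

    @classmethod
    def from_url(cls, url, **kwargs):
        obj = cls(**kwargs)
        obj.via_url = True
        obj.url = url
        return obj


def get_redis(**kwargs):
    """Instantiate a redis client; the from_url path takes precedence."""
    redis_cls = kwargs.pop('redis_cls', DummyRedis)
    url = kwargs.pop('url', None)
    if url:
        # A configured URL wins over individual host/port parameters.
        return redis_cls.from_url(url, **kwargs)
    return redis_cls(**kwargs)


# With a URL, from_url is used; otherwise host/port kwargs apply directly.
client = get_redis(url='redis://localhost:6379/0')
fallback = get_redis(host='localhost', port=6379)
```
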

----|defaults.py
default connection parameters and default redis key names

----|dupefilter.py deduplication
-------class:RFPDupeFilter(BaseDupeFilter)
------------method:__init__(self,server,key,debug)
------------method:clear(self)
clears the request fingerprint records from redis
------------method:close(self,reason)
called when the crawler closes; invokes clear()
------------method:from_crawler(cls,crawler)
class method; delegates to from_settings to build the instance
------------method:from_settings(cls,settings)
class method initialization; a reserved hook for customization
------------method:log(self,request,spider)
logs duplicate requests, with verbosity depending on the debug flag
------------method:request_fingerprint(self,request)
returns a sha1 hash computed from the request's information (method, URL, body)
------------method:request_seen(self,request)
calls request_fingerprint and checks the shared redis set to determine whether the request has been seen before
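The interplay of request_fingerprint and request_seen can be sketched like this. A plain Python set stands in for the redis SET that the real RFPDupeFilter talks to through server.sadd(); the fingerprint fields hashed here (method, URL, body) mirror scrapy's fingerprinting, but the class below is a simplified model, not the real code.

```python
import hashlib


class SimpleDupeFilter:
    """Sketch of RFPDupeFilter's logic with a local set standing in
    for the shared redis set."""

    def __init__(self):
        self.fingerprints = set()  # a redis SET in the real implementation

    def request_fingerprint(self, method, url, body=b''):
        # The real code delegates to scrapy's request_fingerprint(),
        # which also runs the request information through sha1.
        fp = hashlib.sha1()
        fp.update(method.encode())
        fp.update(url.encode())
        fp.update(body)
        return fp.hexdigest()

    def request_seen(self, method, url, body=b''):
        # The real code uses server.sadd(), which returns 0 when the
        # member already exists, i.e. the request was seen before.
        fp = self.request_fingerprint(method, url, body)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False

    def clear(self):
        # The real clear() deletes the redis key holding the set.
        self.fingerprints.clear()


df = SimpleDupeFilter()
first = df.request_seen('GET', 'http://example.com/')   # not seen yet
second = df.request_seen('GET', 'http://example.com/')  # duplicate
```
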

----|picklecompat.py
pickle serialization and deserialization
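The module is essentially a thin loads/dumps pair around pickle, so the serializer becomes a pluggable module for the queue; roughly equivalent to:

```python
import pickle


def loads(s):
    """Deserialize bytes pulled from redis back into a Python object."""
    return pickle.loads(s)


def dumps(obj):
    """Serialize an object for storage in redis.

    protocol=-1 selects the highest pickle protocol available.
    """
    return pickle.dumps(obj, protocol=-1)


data = {'url': 'http://example.com', 'priority': 1}
round_tripped = loads(dumps(data))
```
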

----|pipelines.py
persistence
-------class:RedisPipeline
------------method:process_item
calls _process_item (deferred to a thread)
------------method:_process_item
serializes the item and pushes it onto a redis list with rpush
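The process_item / _process_item flow can be sketched as below. FakeRedis is an in-memory stand-in for the redis client, and the serialization (json here instead of the pipeline's configurable serializer) and the '%(spider)s:items' key pattern are assumptions for illustration; the real process_item also defers _process_item to a thread pool.

```python
import json


class FakeRedis:
    """In-memory stand-in for the redis client used by the pipeline."""
    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        # Append to the tail of the list, like redis RPUSH.
        self.lists.setdefault(key, []).append(value)


class SimpleRedisPipeline:
    """Sketch of RedisPipeline: serialize the item and RPUSH it onto
    a per-spider redis list."""

    def __init__(self, server, key='%(spider)s:items'):
        self.server = server
        self.key = key

    def process_item(self, item, spider_name):
        # The real process_item wraps this call in deferToThread().
        return self._process_item(item, spider_name)

    def _process_item(self, item, spider_name):
        key = self.key % {'spider': spider_name}
        self.server.rpush(key, json.dumps(item))
        return item


server = FakeRedis()
pipeline = SimpleRedisPipeline(server)
pipeline.process_item({'title': 'hello'}, 'demo')
```
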

----|queue.py
queues built on top of redis; three kinds: FifoQueue, LifoQueue and PriorityQueue (the latter backed by a sorted set)
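The FIFO/LIFO difference comes from which redis list commands the queues pair: both push with LPUSH, but FifoQueue pops with RPOP while LifoQueue pops with LPOP. A minimal in-memory model of those commands makes the ordering concrete (FakeRedisList is a stand-in, not the real redis client):

```python
class FakeRedisList:
    """Minimal in-memory model of the redis list commands the queues use."""
    def __init__(self):
        self.items = []

    def lpush(self, value):
        self.items.insert(0, value)   # push onto the head

    def rpop(self):
        return self.items.pop() if self.items else None    # pop from the tail

    def lpop(self):
        return self.items.pop(0) if self.items else None   # pop from the head


# FifoQueue: LPUSH to enqueue, RPOP to dequeue -> first in, first out.
fifo = FakeRedisList()
for url in ['a', 'b', 'c']:
    fifo.lpush(url)
fifo_order = [fifo.rpop() for _ in range(3)]

# LifoQueue: LPUSH to enqueue, LPOP to dequeue -> last in, first out.
lifo = FakeRedisList()
for url in ['a', 'b', 'c']:
    lifo.lpush(url)
lifo_order = [lifo.lpop() for _ in range(3)]
```
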

----|scheduler.py scheduler
-------class:Scheduler
------------method:enqueue_request
if the request is not marked dont_filter and has already been seen, it is dropped; otherwise the Request object is put into the queue
------------method:next_request
pops the next value from the queue

----|spiders.py
crawler classes; RedisSpider and RedisCrawlSpider read their start URLs from a redis list

----|utils.py
--------bytes_to_str
converts bytes into strings, defaulting to utf-8
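Values read back from redis arrive as bytes, so this small helper decodes them. The sketch below matches the behavior described above; str input is assumed to pass through unchanged.

```python
def bytes_to_str(s, encoding='utf-8'):
    """Decode bytes pulled out of redis into str; pass str through unchanged."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


decoded = bytes_to_str(b'http://example.com')
unchanged = bytes_to_str('already text')
```
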

