/scrapy_redis
----|__init__.py
----|connection.py
--------function:get_redis_from_settings
collects all Redis-related settings into the params dict used to instantiate the Redis client
--------function:get_redis
instantiates the Redis client; the from_url method takes precedence when a URL is given
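The selection logic between the two paths can be sketched as follows. `FakeRedis` and the exact parameter names are assumptions so the sketch runs without a Redis server; the real code passes the same params on to `redis.Redis` / `redis.from_url`:

```python
class FakeRedis:
    """Stand-in for redis.Redis so the sketch runs offline."""
    def __init__(self, **params):
        self.params = params
        self.via_url = None          # records whether from_url was used

    @classmethod
    def from_url(cls, url, **params):
        client = cls(**params)
        client.via_url = url
        return client


def get_redis(redis_cls=FakeRedis, url=None, **params):
    """Instantiate a client; the from_url path takes precedence."""
    if url is not None:
        return redis_cls.from_url(url, **params)
    return redis_cls(**params)
```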
----|defaults.py
default connection parameters
----|dupefilter.py
request deduplication
-------class:RFPDupeFilter(BaseDupeFilter)
------------method:__init__(self,server,key,debug)
------------method:clear(self)
clears the stored request fingerprints
------------method:close(self,reason)
called when the crawler ends; calls clear to remove the fingerprint records
------------method:from_crawler(cls,crawler)
class method; calls the from_settings method to initialize
------------method:from_settings(cls,settings)
class-method initializer, a reserved hook
------------method:log(self,request,spider)
logs duplicate requests according to the debug flag passed in, or the default
------------method:request_fingerprint(self,request)
returns a SHA1 hash computed from the request information
------------method:request_seen(self,request)
calls request_fingerprint to determine whether the request has already been seen
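The fingerprint-and-check flow described above can be sketched like this. A Python set stands in for the Redis SET that the real class updates with `server.sadd`, and the hashing scheme is a simplified assumption, not Scrapy's exact fingerprint function:

```python
import hashlib


class SketchDupeFilter:
    """Toy version of RFPDupeFilter's deduplication logic."""

    def __init__(self):
        self.fingerprints = set()    # stand-in for the dupefilter Redis key

    def request_fingerprint(self, method, url, body=b''):
        # Hash the request components, loosely mirroring Scrapy's scheme.
        h = hashlib.sha1()
        for part in (method.encode(), url.encode(), body):
            h.update(part)
        return h.hexdigest()

    def request_seen(self, method, url, body=b''):
        fp = self.request_fingerprint(method, url, body)
        if fp in self.fingerprints:
            return True              # duplicate request
        self.fingerprints.add(fp)
        return False
```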
----|picklecompat.py
pickle serialization and deserialization
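The role of this module is just a matched loads/dumps pair so queues can store arbitrary request dicts as bytes in Redis. A minimal sketch (the pickle protocol choice here is an assumption, not the library's exact setting):

```python
import pickle


class PickleSerializer:
    """loads/dumps pair in the spirit of picklecompat."""

    @staticmethod
    def dumps(obj):
        # Serialize to bytes suitable for storing in a Redis value.
        return pickle.dumps(obj, protocol=2)

    @staticmethod
    def loads(data):
        # Restore the original Python object from the stored bytes.
        return pickle.loads(data)
```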
----|pipelines.py
item persistence
-------class:RedisPipeline
------------method:process_item
calls _process_item
------------method:_process_item
pushes the serialized item into a Redis list with the rpush method
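The delegation from process_item to _process_item and the rpush step can be sketched as below. A Python list stands in for the real `server.rpush(key, data)` call, and `json` stands in for the pipeline's configured serializer:

```python
import json


class SketchRedisPipeline:
    """Toy RedisPipeline: serialize each item, then append to a list."""

    def __init__(self):
        self.items_key = []          # stand-in for the '<spider>:items' list

    def process_item(self, item, spider_name):
        return self._process_item(item, spider_name)

    def _process_item(self, item, spider_name):
        data = json.dumps(item)      # real code uses its configured serializer
        self.items_key.append(data)  # rpush: append to the tail of the list
        return item
```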
----|queue.py
queues backed by Redis; there are three kinds: FifoQueue, LifoQueue, PriorityQueue
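The three ordering disciplines can be sketched with in-memory stand-ins: a deque for the Redis list and a heap for the sorted set. These are simplified assumptions about the semantics, not the library's code (for instance, the real priority queue orders by negated priority):

```python
import heapq
from collections import deque


class FifoSketch:
    """LPUSH + RPOP semantics: first in, first out."""
    def __init__(self):
        self.q = deque()

    def push(self, item):
        self.q.appendleft(item)      # lpush

    def pop(self):
        return self.q.pop()          # rpop


class LifoSketch:
    """LPUSH + LPOP semantics: last in, first out."""
    def __init__(self):
        self.q = deque()

    def push(self, item):
        self.q.appendleft(item)      # lpush

    def pop(self):
        return self.q.popleft()      # lpop


class PrioritySketch:
    """A heap stands in for the sorted set; lower score pops first here."""
    def __init__(self):
        self.q = []

    def push(self, item, score):
        heapq.heappush(self.q, (score, item))

    def pop(self):
        return heapq.heappop(self.q)[1]
```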
----|scheduler.py
scheduler
-------class:Scheduler
------------method:enqueue_request
unless the request is marked dont_filter, drops it if it has already been seen; otherwise puts the Request object in the queue
------------method:next_request
pops the next request from the queue
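How the scheduler ties the dupefilter and queue together can be sketched as follows. The dict-based requests, `SeenFilter`, and the plain-list queue are simplifying assumptions so the sketch runs without Scrapy or Redis:

```python
class SeenFilter:
    """Minimal dupefilter: remembers URLs it has answered about."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url):
        if url in self.seen:
            return True
        self.seen.add(url)
        return False


class SketchScheduler:
    """Toy Scheduler: drop seen requests unless dont_filter is set."""

    def __init__(self):
        self.df = SeenFilter()
        self.queue = []              # stand-in for the Redis-backed queue

    def enqueue_request(self, request):
        if not request.get('dont_filter') and self.df.request_seen(request['url']):
            return False             # duplicate: dropped, not enqueued
        self.queue.append(request)   # real code pushes into the Redis queue
        return True

    def next_request(self):
        return self.queue.pop(0) if self.queue else None
```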
----|spiders.py
spiders
----|utils.py
--------function:bytes_to_str
converts bytes into a string, utf-8 by default
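A minimal sketch of such a helper (the pass-through for values that are already str is an assumption of this sketch):

```python
def bytes_to_str(s, encoding='utf-8'):
    """Decode bytes to str (utf-8 by default); pass str through unchanged."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s
```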
scrapy-redis module source code analysis
Origin http://43.154.161.224:23101/article/api/json?id=325208771&siteId=291194637