Using Scrapy middleware and scrapy-redis to implement a distributed crawler

1. Spider middleware and downloader middleware

1. Downloader middleware
1. Write a class in middlewares.py
2. Implement the methods below in that class

process_request(self, request, spider):
    - Return None: processing continues to the next middleware
    - Return a Request object: it goes back to the engine, which puts it in the scheduler to wait for the next scheduling round
    - Return a Response object: the engine sends it to the spider for parsing
    - What can be done here?
        - Modify request headers
        - Modify cookies
        - Add a proxy
        - Integrate Selenium
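The use cases above can be sketched as a single downloader middleware. This is a minimal illustration, not a complete implementation: the `USER_AGENTS` and `PROXIES` pools, the middleware class name, and the `session` cookie are all made-up placeholders.

```python
import random

# Hypothetical pools; in a real project these would come from settings or a service.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
PROXIES = ["http://127.0.0.1:8888"]  # placeholder proxy address

class RandomHeadersProxyMiddleware:
    """Downloader middleware sketch: rotate the User-Agent, set a cookie,
    and attach a proxy before the request reaches the downloader."""

    def process_request(self, request, spider):
        # Modify request headers
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Modify cookies (example cookie name is hypothetical)
        request.cookies["session"] = "example"
        # Add a proxy: Scrapy's HttpProxyMiddleware reads request.meta["proxy"]
        request.meta["proxy"] = random.choice(PROXIES)
        # Returning None lets processing continue to the next middleware
        return None
```

To enable it, register the class in `DOWNLOADER_MIDDLEWARES` in settings.py with a priority number, e.g. `{"myproject.middlewares.RandomHeadersProxyMiddleware": 543}` (the project path is an assumption). A Selenium-based middleware would instead build the page with a browser and return an `HtmlResponse`, which short-circuits the download and sends the response to the spider.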
              

Origin blog.csdn.net/BLee_0123/article/details/131630326