Python crawling with Scrapy: how to run multiple Scrapy spiders at the same time

Background:

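When I first started learning the Scrapy crawling framework, I thought: running a single crawl job on a server is fine, but I can't create a new project for every crawl job. For example, say I set up a Zhihu crawling project and wrote several spiders inside it. The key point is that I want them all to run at the same time. How do I do that?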

The beginner's approach:

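1. Create a run.py file next to the spiders directory, with the content shown below these steps (extra arguments such as --nolog can be appended to the end of the argument list).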

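2. The beginner (me, back then) thought: this works, my god, so all I need is a few more lines like that, right? It did not work (and I felt like an idiot). Then the beginner thought: add a while loop, put all the spider names in a list, and loop over it to pick up each spider's name. The result was even worse.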

3. The command below is only suitable for quick debugging, or for running a single spider in a project.

 
    from scrapy.cmdline import execute

    execute(['scrapy', 'crawl', 'httpbin'])
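As a small sketch of the run.py idea above, the argument list can be built by a helper so that optional flags such as `--nolog` land at the end. The names `build_crawl_argv` and `main` are hypothetical and only `execute` comes from Scrapy. As far as I know, `execute` finishes by calling `sys.exit`, which is one likely reason that calling it several times in a row, or in a loop, never gets past the first spider.

```python
def build_crawl_argv(spider_name, *extra_args):
    """Build the argument list handed to scrapy.cmdline.execute.

    build_crawl_argv is a hypothetical helper, not part of Scrapy.
    """
    return ["scrapy", "crawl", spider_name, *extra_args]


def main():
    # Imported here so the helper above works even without Scrapy installed.
    from scrapy.cmdline import execute

    # Runs the single spider named 'httpbin' with logging turned off.
    execute(build_crawl_argv("httpbin", "--nolog"))
```

Calling `main()` at the bottom of run.py starts the one spider. This is still a single-spider tool, just like the original snippet.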

After some more study, it turns out this is how it is done:

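1. Create a directory with any name at the same level as spiders, for example commands (put an empty __init__.py in it so that it can be imported as a package).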

2. Create a crawlall.py file inside it (the file name becomes the name of the custom command).

crawlall.py

 
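    from scrapy.commands import ScrapyCommand
    from scrapy.utils.project import get_project_settings


    class Command(ScrapyCommand):

        # The command only makes sense inside a Scrapy project.
        requires_project = True

        def syntax(self):
            return '[options]'

        def short_desc(self):
            return 'Runs all of the spiders'

        def run(self, args, opts):
            # spider_loader replaces the deprecated .spiders attribute.
            spider_list = self.crawler_process.spider_loader.list()
            for name in spider_list:
                # Register every spider; command-line options become spider arguments.
                self.crawler_process.crawl(name, **opts.__dict__)
            # Start once, after all spiders have been registered.
            self.crawler_process.start()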
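To see what the `run` method above boils down to, here is a minimal sketch that pulls the same logic into a plain function. The names `crawl_all` and `main` are hypothetical. `CrawlerProcess` and `get_project_settings` are the Scrapy pieces I am assuming here, and the function expects an object that behaves like `CrawlerProcess`.

```python
def crawl_all(process):
    """Register every spider known to `process`, then start crawling once.

    `process` is expected to behave like scrapy.crawler.CrawlerProcess.
    Returns the list of spider names that were registered.
    """
    names = list(process.spider_loader.list())
    for name in names:
        # crawl() registers the spider; the work happens once start() runs.
        process.crawl(name)
    # start() is called a single time, after all spiders are registered.
    process.start()
    return names


def main():
    # Imported lazily so crawl_all can be tried out without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    crawl_all(CrawlerProcess(get_project_settings()))
```

The point of the design is that every spider is registered first and the process is started only once, so the spiders run side by side instead of one after another.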

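3. That is not everything yet: one more line has to be added to the settings.py configuration file, in the form COMMANDS_MODULE = 'project_name.directory_name'. For example: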

COMMANDS_MODULE = 'zhihuuser.commands'

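4. So here comes the question: with several spiders written under spiders, and after everything said above, how do I finally run them all? So easy! Just put the command below into a scheduled task (a cron job, for instance) and you are done.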

scrapy crawlall
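If the scheduled task runs a Python script rather than the bare command, a wrapper along these lines might help. It is only a sketch: `run_crawlall` and its `runner` parameter are hypothetical names, and the project path is whatever directory holds your scrapy.cfg.

```python
import subprocess


def run_crawlall(project_dir, extra_args=(), runner=subprocess.run):
    """Run `scrapy crawlall` from inside the project directory.

    project_dir should be the folder that contains scrapy.cfg. The
    runner parameter exists only so the call can be swapped out in tests.
    """
    command = ["scrapy", "crawlall", *extra_args]
    # cwd matters: Scrapy finds the project through scrapy.cfg, and a
    # scheduler usually starts jobs from some other directory.
    return runner(command, cwd=project_dir, check=True)
```

With `check=True`, a failing crawl raises `subprocess.CalledProcessError`, so the scheduler sees a non-zero exit instead of a silent failure.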
