Python web scraping with Scrapy: how to run multiple crawl tasks at the same time

Background:

  When I first started learning the Scrapy crawler framework, I just ran one crawl task on the crawler server and left it at that. But I can't create a brand-new project for every crawl task. For example, I set up a project for crawling Zhihu, and inside that one project I wrote several spiders. The important part is that I want them all to run at the same time. How do I do that?

The newbie solution:

  1. Create a new file, run.py, in the same directory as spiders, with the content below (extra parameters such as --nolog can be appended to the end of the list).

  2. Naively, I then thought: fine, I just need to write a few more lines like that, one per spider. That turned out to be wishful thinking. Next I thought I could put the spider names in a list and add a while loop, fetching a spider name on each pass, and that was even worse (the sketch after the snippet below shows why).

  3. In short, the command below only ever runs a single spider, so it is only good for quickly debugging a single crawl task inside a project.

from scrapy.cmdline import execute

# Equivalent to running "scrapy crawl httpbin" from the command line
execute(['scrapy', 'crawl', 'httpbin'])
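
  Incidentally, here is roughly what the failed while-loop idea from step 2 amounts to, and why it cannot work. This is only a sketch, not code from the original post; the spider names are made up:

from scrapy.cmdline import execute

for name in ['spider_a', 'spider_b']:  # hypothetical spider names
    # execute() parses the arguments, runs the crawl, and finishes by
    # calling sys.exit(), so the process exits after the first spider
    # and the loop never reaches the second one. Wrapping this in a
    # while loop does not help either.
    execute(['scrapy', 'crawl', name])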

  

After doing some reading, it turns out the proper way is this:

  1. Create a directory at the same level as spiders, named whatever you like, for example: commands

  2. Inside it, create a file crawlall.py (the file name becomes the name of the custom command), with the following content:

crawlall.py
from scrapy.commands import ScrapyCommand
from scrapy.utils.project import get_project_settings


class Command(ScrapyCommand):

    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # Names of every spider registered in the project
        spider_list = self.crawler_process.spiders.list()
        for name in spider_list:
            # Schedule each spider; command-line options are forwarded as keyword arguments
            self.crawler_process.crawl(name, **opts.__dict__)
        # Start the reactor: all scheduled spiders run concurrently in one process
        self.crawler_process.start()

  3. That is not quite everything: you also need to add a configuration item to settings.py.

  The format is COMMANDS_MODULE = '<project name>.<directory name>'; for this project it is:

COMMANDS_MODULE = 'zhihuuser.commands'
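
  Putting steps 1 to 3 together, the project layout ends up looking roughly like this (the spider file names are hypothetical; the __init__.py files are conventionally added so the directories are importable as Python packages):

zhihuuser/
├── scrapy.cfg
└── zhihuuser/
    ├── settings.py          # contains COMMANDS_MODULE = 'zhihuuser.commands'
    ├── commands/
    │   ├── __init__.py
    │   └── crawlall.py      # the custom command shown above
    └── spiders/
        ├── __init__.py
        ├── spider_a.py
        └── spider_b.py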

  4. The remaining question: I wrote several crawl-task spiders, as described above, so how do I actually kick them all off in the end? Easy! Just put the command below into a scheduled task, and that's it.

scrapy crawlall
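
  On Linux, for example, the scheduled task could be a crontab entry along these lines (the project path and the schedule are placeholders to adapt to your own environment, and scrapy must be on the PATH that cron uses):

# Run every spider in the project each day at 02:00
0 2 * * * cd /path/to/zhihuuser && scrapy crawlall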


Origin www.cnblogs.com/yunlongaimeng/p/11526466.html