Background:
When I first started learning the Scrapy crawler framework, I assumed I would just run each crawling task on a crawler server and be done with it. But I can't create a brand-new project for every single crawling task. For example, I set up one project for crawling Zhihu, and inside that project I wrote several spiders. The important part: I want them all to run at the same time. How?
A newbie's first attempts:
1. Create a new file run.py in the same directory as the spiders, with the contents shown below (extra command-line arguments can be appended to the end of the list, e.g. --nolog).
2. Then I (the newbie) thought: if that works, I only need to write a few more lines. So I added a while loop over a list of spider names, fetching one name per iteration, and the result was even worse. scrapy.cmdline.execute() hands control to the crawler and exits the process once it finishes, so only the first spider ever gets to run.
3. In other words, the script below is only suitable for quickly debugging a single spider inside a single project.
run.py

from scrapy.cmdline import execute

execute(['scrapy', 'crawl', 'httpbin'])
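As mentioned in step 1, extra command-line arguments can simply be appended to the argument list. For example, to suppress logging (assuming a spider named httpbin exists, as in the snippet above):

from scrapy.cmdline import execute

# Same as "scrapy crawl httpbin --nolog" on the command line
execute(['scrapy', 'crawl', 'httpbin', '--nolog'])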
After more study, I learned how it is actually done:
1. Create a directory at the same level as the spiders directory, with any name you like, e.g. commands.
2. Inside it, create a file crawlall.py (the file name becomes the name of the custom command).
crawlall.py
from scrapy.commands import ScrapyCommand
from scrapy.utils.project import get_project_settings


class Command(ScrapyCommand):
    # This command can only be run from inside a Scrapy project
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # Collect the names of every spider registered in the project
        # and schedule them all on the same crawler process
        spider_list = self.crawler_process.spiders.list()
        for name in spider_list:
            self.crawler_process.crawl(name, **opts.__dict__)
        self.crawler_process.start()
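One caveat: in more recent Scrapy releases the spider registry may no longer be exposed as crawler_process.spiders. If you hit an AttributeError there, a variant of run() using the spider loader (a sketch; verify the attribute against your installed Scrapy version) looks like this:

    def run(self, args, opts):
        # Newer Scrapy versions expose the spider registry via
        # spider_loader instead of the old .spiders attribute
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            self.crawler_process.crawl(name, **opts.__dict__)
        self.crawler_process.start()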
3. That's not all; settings.py also needs one extra configuration line, of the form:
COMMANDS_MODULE = '<project name>.<commands directory name>'
For example, in this project:
COMMANDS_MODULE = 'zhihuuser.commands'
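For reference, a plausible project layout for this setup (the zhihuuser name is taken from the example above; note that the commands directory needs an __init__.py so it can be imported as a module):

zhihuuser/
    scrapy.cfg
    zhihuuser/
        __init__.py
        settings.py
        commands/
            __init__.py
            crawlall.py
        spiders/
            __init__.py
            ...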
4. So, back to the original question: now that I have written many spiders, how do I actually run them all? Easy. Just put the following command into a scheduled task (cron, Windows Task Scheduler, etc.) and you're done:
scrapy crawlall
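For instance, a cron entry along these lines (the path is only a placeholder for wherever the Scrapy project actually lives) would run all the spiders once a day at 3 a.m.:

0 3 * * * cd /path/to/zhihuuser && scrapy crawlall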