Python Crawler (3) Services

Local Machine Service
Start the Service
>scrapyd
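
Once started, scrapyd listens on port 6800. A quick sanity check from Python, as a sketch: it assumes the requests library is installed and a scrapyd version recent enough to expose daemonstatus.json.

import requests

# daemonstatus.json reports the node name and how many jobs are
# pending, running and finished on this scrapyd node.
print(requests.get("http://localhost:6800/daemonstatus.json").json())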

Call the API to schedule a spider job
>curl http://localhost:6800/schedule.json -d project=default -d spider=author
{"status": "ok", "jobid": "3b9c84c28dae11e79ba4a45e60e77f99", "node_name": "ip-10-10-21-215.ec2.internal"}

More API endpoints
http://scrapyd.readthedocs.io/en/stable/api.html#api
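
Two endpoints from that page are especially useful day to day: listjobs.json to poll job state and cancel.json to stop a run. A sketch under the same assumptions as above (requests installed, scrapyd on localhost):

import requests

# listjobs.json lists the pending, running and finished jobs
# of a project.
jobs = requests.get(
    "http://localhost:6800/listjobs.json",
    params={"project": "default"},
).json()
print(jobs["running"])

# cancel.json stops a job; the job value is the jobid returned
# by schedule.json (the one from above is reused here).
requests.post(
    "http://localhost:6800/cancel.json",
    data={"project": "default", "job": "3b9c84c28dae11e79ba4a45e60e77f99"},
)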

Call to Pass a Setting and a Spider Argument
>curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DOWNLOAD_DELAY=2 -d arg1=val1
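
On the spider side, setting=DOWNLOAD_DELAY=2 is applied as a Scrapy setting, while every other extra parameter (arg1 here) is passed through as a spider argument and becomes an attribute on the spider instance. A minimal sketch; the spider name and URL are illustrative, only the argument handling matters:

import scrapy

class SomeSpider(scrapy.Spider):
    name = "somespider"
    start_urls = ["http://quotes.toscrape.com/"]

    def __init__(self, arg1=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Receives "val1" when scheduled with -d arg1=val1
        self.arg1 = arg1

    def parse(self, response):
        self.logger.info("arg1 = %s", self.arg1)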

List Projects
>curl http://localhost:6800/listprojects.json
{"status": "ok", "projects": ["default", "tutorial"], "node_name": "ip-10-10-21-215.ec2.internal”}

List Spiders
>curl http://localhost:6800/listspiders.json?project=default
{"status": "ok", "spiders": ["author", "quotes"], "node_name": "ip-10-10-21-215.ec2.internal"}

Status Web UI
http://localhost:6800/

http://scrapyd.readthedocs.io/en/stable/overview.html

Clustered Solution?
https://github.com/rmax/scrapy-redis
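
scrapy-redis swaps Scrapy's per-process scheduler and duplicate filter for Redis-backed ones, so several nodes can share one request queue and one seen-URL set. A minimal settings.py sketch following the project's README; the Redis URL is an assumption for a local instance:

# settings.py of an existing Scrapy project

# Redis-backed scheduler and duplicate filter shared by all workers.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the request queue between runs instead of clearing it on close.
SCHEDULER_PERSIST = True

# Location of the shared Redis instance (assumed local here).
REDIS_URL = "redis://localhost:6379"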


References:
http://scrapyd.readthedocs.io/en/stable/overview.html#how-scrapyd-works

Reposted from sillycat.iteye.com/blog/2391685