Python Crawler (3) Services
Local Machine Service
Start the Service
>scrapyd
Call the service to schedule a spider run
>curl http://localhost:6800/schedule.json -d project=default -d spider=author
{"status": "ok", "jobid": "3b9c84c28dae11e79ba4a45e60e77f99", "node_name": "ip-10-10-21-215.ec2.internal"}
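The same schedule.json call can be made from Python. Below is a minimal sketch using only the standard library; the helper name build_schedule_request is my own, and it assumes scrapyd is listening on localhost:6800 as in the curl call above.

```python
# Sketch: schedule a spider through Scrapyd's schedule.json endpoint.
# build_schedule_request is a hypothetical helper, not part of Scrapyd.
from urllib import parse, request

def build_schedule_request(base_url, project, spider, **params):
    """Return the POST url and form-encoded body for schedule.json."""
    payload = {"project": project, "spider": spider, **params}
    return base_url.rstrip("/") + "/schedule.json", parse.urlencode(payload).encode()

url, body = build_schedule_request("http://localhost:6800", "default", "author")

# Uncomment to fire the request against a running scrapyd instance:
# resp = request.urlopen(request.Request(url, data=body))
# print(resp.read())
```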
Full API documentation
http://scrapyd.readthedocs.io/en/stable/api.html#api
Call to Pass a Parameter
>curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DOWNLOAD_DELAY=2 -d arg1=val1
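Per the Scrapyd docs, a `setting=NAME=value` field overrides a Scrapy setting for that run (here DOWNLOAD_DELAY), while any other extra field (arg1) is forwarded to the spider as an argument. The helper below is an illustrative sketch of that convention, not Scrapyd's actual internals:

```python
# Sketch: how schedule.json form fields split into Scrapy setting
# overrides vs. spider arguments. split_schedule_params is hypothetical.
def split_schedule_params(form_fields):
    """Separate (key, value) form fields: 'setting=NAME=value' entries
    become setting overrides; other non-reserved keys are spider args."""
    settings, spider_args = {}, {}
    for key, value in form_fields:
        if key == "setting":
            name, _, val = value.partition("=")
            settings[name] = val
        elif key not in ("project", "spider"):
            spider_args[key] = value
    return settings, spider_args

# Mirrors the curl call above:
settings, spider_args = split_schedule_params([
    ("project", "myproject"),
    ("spider", "somespider"),
    ("setting", "DOWNLOAD_DELAY=2"),
    ("arg1", "val1"),
])
```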
List Projects
>curl http://localhost:6800/listprojects.json
{"status": "ok", "projects": ["default", "tutorial"], "node_name": "ip-10-10-21-215.ec2.internal"}
List Spiders
>curl http://localhost:6800/listspiders.json?project=default
{"status": "ok", "spiders": ["author", "quotes"], "node_name": "ip-10-10-21-215.ec2.internal"}
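Every Scrapyd reply shown above carries a "status" field plus the payload. A small sketch of parsing such a reply defensively (extract_list is my own helper name):

```python
# Sketch: pull a named list out of a Scrapyd JSON reply, failing loudly
# when status is not "ok". extract_list is a hypothetical helper.
import json

def extract_list(raw_response, key):
    reply = json.loads(raw_response)
    if reply.get("status") != "ok":
        raise RuntimeError(f"scrapyd error: {reply}")
    return reply[key]

# Using the listspiders.json reply shown above:
spiders = extract_list(
    '{"status": "ok", "spiders": ["author", "quotes"], '
    '"node_name": "ip-10-10-21-215.ec2.internal"}',
    "spiders",
)
```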
Status UI
http://localhost:6800/
http://scrapyd.readthedocs.io/en/stable/overview.html
Clustered Solution?
https://github.com/rmax/scrapy-redis
References:
http://scrapyd.readthedocs.io/en/stable/overview.html#how-scrapyd-works
Reposted from sillycat.iteye.com/blog/2391685