Increase concurrency:
By default, Scrapy runs a modest number of concurrent requests (the CONCURRENT_REQUESTS setting defaults to 16), and this can be raised when the target site and your hardware allow it. In the configuration file (settings.py), set CONCURRENT_REQUESTS = 100 to allow up to 100 concurrent requests.
Reduce the log level:
Scrapy produces a large amount of log output while running. To reduce CPU usage, raise the log level from the default DEBUG to INFO or ERROR. In the configuration file, write: LOG_LEVEL = 'INFO'
Disable cookies:
If cookies are not actually needed, disable cookie handling while crawling to reduce CPU usage and improve crawl efficiency. In the configuration file, write: COOKIES_ENABLED = False
Disable retries:
Retrying failed HTTP requests slows down the crawl, so retries can be disabled. In the configuration file, write: RETRY_ENABLED = False
Reduce the download timeout:
When crawling very slow links, lowering the download timeout lets stuck requests be abandoned quickly, improving overall efficiency. In the configuration file, write: DOWNLOAD_TIMEOUT = 10 to set the timeout to 10 seconds.
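Taken together, the tips above amount to a handful of additions to a project's settings.py. The sketch below uses the values suggested in this post; they are starting points to tune per target site, not universal defaults.

```python
# settings.py -- consolidated efficiency tweaks from the tips above.
# Values are the ones suggested in this post; adjust per target site.

# Raise concurrency from Scrapy's default of 16 concurrent requests.
CONCURRENT_REQUESTS = 100

# Log at INFO instead of the default DEBUG to cut log volume.
LOG_LEVEL = 'INFO'

# Skip cookie handling when the site does not require it.
COOKIES_ENABLED = False

# Do not retry failed HTTP requests.
RETRY_ENABLED = False

# Abandon any request that takes longer than 10 seconds.
DOWNLOAD_TIMEOUT = 10
```

If only one spider in a project should use these values, they can instead be set in that spider's custom_settings class attribute, which overrides the project-wide settings.py for that spider alone.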
Scrapy crawling efficiency configuration
Origin www.cnblogs.com/open-yang/p/11330108.html