Improving Scrapy crawling efficiency

Increase concurrency:

By default Scrapy limits parallel requests through the CONCURRENT_REQUESTS setting (16 concurrent requests). This can be raised as needed, for example to 100, in the project's settings file, as shown below.
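
A minimal settings.py sketch of this change (CONCURRENT_REQUESTS_PER_DOMAIN is an extra knob not mentioned above; it caps requests to a single site and may also need raising when crawling one domain):

    # settings.py -- raise the global request concurrency (default is 16)
    CONCURRENT_REQUESTS = 100
    # Per-domain cap (default is 8); raise it too if most requests hit one site
    CONCURRENT_REQUESTS_PER_DOMAIN = 100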

Reduce the log level:

While a crawl is running, Scrapy produces a large amount of log output. To reduce CPU usage, lower the log level from the default DEBUG to INFO or ERROR.
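
The corresponding settings.py entry is a single line (ERROR keeps only failures, INFO is a reasonable middle ground):

    # settings.py -- reduce log volume to cut CPU spent on logging
    LOG_LEVEL = 'INFO'   # or 'ERROR' to log only failures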

Disable cookies:

If cookies are not actually needed, disable them to make crawling more efficient: COOKIES_ENABLED = False.
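
In settings.py this is one switch (cookie handling is enabled by default):

    # settings.py -- skip cookie processing when no session state is needed
    COOKIES_ENABLED = False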

Disable retries:

Re-requesting failed HTTP requests slows the crawl down, so retries can be disabled: RETRY_ENABLED = False.
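
A sketch of the settings.py entry (by default Scrapy retries failed requests twice; with retries off, pages hit by transient errors are simply lost):

    # settings.py -- give up on failed requests instead of retrying them
    RETRY_ENABLED = False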

Reduce the download timeout:

Crawling a very slow link drags the whole crawl down. Lowering the download timeout causes such links to be abandoned, which improves crawling efficiency: DOWNLOAD_TIMEOUT = 10 (a 10-second timeout).
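
The matching settings.py entry (the default timeout is 180 seconds; the 10-second value follows the example above and may be too aggressive for slow sites):

    # settings.py -- abandon any download that takes longer than 10 seconds
    DOWNLOAD_TIMEOUT = 10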

Source: www.cnblogs.com/wen-kang/p/10972806.html