Tips for improving Scrapy crawling efficiency

Increase concurrency:
  By default Scrapy allows 16 concurrent requests, which can be raised as appropriate. In the project's settings file, set CONCURRENT_REQUESTS = 100 to increase the concurrency limit to 100.

Reduce the log level:
  Scrapy prints a large amount of log output while running. To reduce CPU usage, raise the log level from the default 'DEBUG' to 'ERROR'. In the settings file: LOG_LEVEL = 'ERROR'

Disable cookies:
  If cookies are not actually needed, disable cookie handling during crawling to reduce CPU usage and improve crawl efficiency. In the settings file: COOKIES_ENABLED = False

Disable retries:
  Retrying failed HTTP requests slows down crawling, so retries can be turned off. In the settings file: RETRY_ENABLED = False

Reduce the download timeout:
  If some links crawl very slowly, lowering the download timeout lets stalled requests be abandoned quickly, improving overall efficiency. In the settings file: DOWNLOAD_TIMEOUT = 10 sets the timeout to 10 seconds.
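Taken together, the settings above can be collected in the project's settings.py. A minimal sketch using the example values from this post (the numbers are starting points to tune per project, not universal recommendations):

```python
# settings.py -- performance-oriented overrides (example values)

CONCURRENT_REQUESTS = 100   # default is 16; raise for more parallel requests
LOG_LEVEL = 'ERROR'         # default is 'DEBUG'; less logging means less CPU spent
COOKIES_ENABLED = False     # skip cookie handling if the target site doesn't need it
RETRY_ENABLED = False       # don't retry failed HTTP requests
DOWNLOAD_TIMEOUT = 10       # abandon a request after 10 seconds
```

The same keys can also be applied to a single spider through its `custom_settings` class attribute, which overrides the project-wide settings for that spider only.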


Origin www.cnblogs.com/open-yang/p/11330108.html