Configuring the Scrapy framework to improve crawling efficiency

- The following options are all set in the project's configuration file (settings.py):
# 1 Increase concurrency:
By default, Scrapy allows 16 concurrent requests, which can be raised as appropriate. In the settings configuration file, write CONCURRENT_REQUESTS = 100 to set the concurrency to 100.
# 2 Raise the log level:
Running Scrapy produces a lot of log output. To reduce CPU usage, raise the log level to INFO or ERROR so only important messages are printed. Write in the configuration file: LOG_LEVEL = 'INFO'
# 3 Disable cookies:
If cookies are not actually needed, disable them when scraping to reduce CPU usage and improve crawling efficiency. Write in the configuration file: COOKIES_ENABLED = False
# 4 Disable retries:
Re-requesting failed HTTP responses (retrying) slows crawling down, so retries can be disabled. Write in the configuration file: RETRY_ENABLED = False
# 5 Reduce the download timeout:
When crawling very slow links, reducing the download timeout lets stuck requests be discarded quickly, improving efficiency. Write in the configuration file: DOWNLOAD_TIMEOUT = 10 to set the timeout to 10 seconds.
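Putting the five tweaks together, a minimal settings.py sketch might look like the following. The values 100, 'INFO', and 10 are just the examples used above, not recommendations; tune them for your own target site and bandwidth.

```python
# settings.py -- a minimal sketch combining the five tweaks above.

# 1. Raise concurrency (Scrapy's default is 16 concurrent requests).
CONCURRENT_REQUESTS = 100

# 2. Raise the log level so less log output is generated.
LOG_LEVEL = 'INFO'          # or 'ERROR' for even less output

# 3. Disable cookies if the target site does not need them.
COOKIES_ENABLED = False

# 4. Do not retry failed requests.
RETRY_ENABLED = False

# 5. Discard requests that take longer than 10 seconds.
DOWNLOAD_TIMEOUT = 10
```

If you want these settings for a single spider rather than the whole project, Scrapy also lets you override them per spider via the custom_settings class attribute. A minimal sketch (the spider name and URL are placeholders; quotes.toscrape.com is Scrapy's tutorial sandbox site):

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"  # hypothetical spider name
    start_urls = ["https://quotes.toscrape.com/"]

    # Overrides the project-wide settings for this spider only.
    custom_settings = {
        "CONCURRENT_REQUESTS": 100,
        "LOG_LEVEL": "INFO",
        "COOKIES_ENABLED": False,
        "RETRY_ENABLED": False,
        "DOWNLOAD_TIMEOUT": 10,
    }

    def parse(self, response):
        # Yield one item per quote on the page.
        for quote in response.css("div.quote span.text::text"):
            yield {"text": quote.get()}
```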

 

Origin: www.cnblogs.com/baohanblog/p/12686182.html