Recap:
One: lazy-loaded images (how to handle images that are loaded lazily)
--- use selenium to scroll the page to where the images are, so that they get loaded
--- analyze the attribute the page uses for lazy loading and read the image URL from it directly
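The second approach can be sketched roughly as follows. This is only an illustration using the standard library: the attribute names in LAZY_ATTRS (data-src, data-original, src2) are assumptions, since each site chooses its own lazy-load attribute and it has to be found by inspecting the page source. The class and function names are hypothetical as well.

```python
from html.parser import HTMLParser

# Assumed lazy-load attribute names; check the real page source for the actual one.
LAZY_ATTRS = ("data-src", "data-original", "src2")


class LazyImageParser(HTMLParser):
    """Collects one URL per <img> tag, preferring a lazy-load attribute over src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Try the lazy-load attributes first, then fall back to the plain src.
        for name in LAZY_ATTRS + ("src",):
            value = attr_map.get(name)
            if value:
                self.urls.append(value)
                break


def extract_image_urls(html):
    parser = LazyImageParser()
    parser.feed(html)
    return parser.urls


sample = (
    '<div>'
    '<img src="placeholder.gif" data-src="https://example.com/a.jpg">'
    '<img src="https://example.com/b.jpg">'
    '</div>'
)
print(extract_image_urls(sample))
```

Inside a Scrapy spider the same idea would normally be expressed as a selector on the response, for example an XPath that reads the lazy-load attribute instead of src.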
Two:
How to improve Scrapy's crawling efficiency
Increase concurrency:
By default Scrapy allows 16 concurrent requests (asynchronous requests, not threads), and this limit can be raised as appropriate. In the settings file, set CONCURRENT_REQUESTS = 100 to raise the limit to 100.
Reduce the log level:
Scrapy prints a large amount of log output while it runs. To reduce CPU usage, set the log level to INFO or ERROR so that less is printed. In the settings file: LOG_LEVEL = 'INFO'
Disable cookies:
If cookies are not actually needed, disable them during the crawl to reduce CPU usage and improve crawl efficiency. In the settings file: COOKIES_ENABLED = False
Disable retries:
Re-sending failed HTTP requests (retrying) slows the crawl down, so retries can be disabled. In the settings file: RETRY_ENABLED = False
Reduce the download timeout:
When some links are very slow to respond, a shorter download timeout lets Scrapy give up on those stalled links quickly, which improves overall efficiency. In the settings file: DOWNLOAD_TIMEOUT = 10 (a timeout of 10 s)
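Put together, the five settings above would look roughly like this in the project's settings.py. The values are the ones given in these notes and are starting points for tuning, not recommendations for every site.

```python
# settings.py (fragment): the efficiency-related options from the notes above

# Allow up to 100 requests in flight at the same time.
CONCURRENT_REQUESTS = 100

# Print less log output; 'ERROR' would be quieter still.
LOG_LEVEL = 'INFO'

# Skip cookie handling when the target site does not require cookies.
COOKIES_ENABLED = False

# Do not re-send failed requests.
RETRY_ENABLED = False

# Give up on a download after 10 seconds.
DOWNLOAD_TIMEOUT = 10
```

Disabling retries and shortening the timeout trades completeness for speed: pages that fail or respond slowly are simply dropped, so these two are worth enabling only when missing a few pages is acceptable.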
Three: full-site crawling with CrawlSpider