How website operators detect crawlers, and how to respond:
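1. Checking the User-Agent header
Response: set realistic User-Agent and Referer headers
2. Detecting abnormal user behavior, such as one IP accessing the site many times within a short period
Response: use proxy IPs and sleep between requests
3. Dynamic pages (content rendered by JavaScript)
Response: render the pages with Selenium and PhantomJS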
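The first two responses can be sketched outside of any framework. This is a minimal illustration using only the Python standard library, not a definitive implementation: the `USER_AGENTS` values, the function names `build_request` and `polite_sleep`, and the proxy address in the usage note are all assumptions made up for the example. The request is built but never sent.

```python
import random
import time
import urllib.request

# Hypothetical pool of browser-like User-Agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, referer, proxy=None):
    """Build (but do not send) a request with browser-like headers."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),  # rotate the User-Agent
        "Referer": referer,                        # look like a followed link
    }
    req = urllib.request.Request(url, headers=headers)
    if proxy is not None:
        # Route the request through an HTTP proxy given as "host:port".
        req.set_proxy(proxy, "http")
    return req


def polite_sleep(min_s=1.0, max_s=3.0):
    """Sleep for a random interval so requests are not evenly spaced."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

A call such as `build_request("http://example.com/page", "http://example.com/", proxy="127.0.0.1:8080")` followed by `polite_sleep()` between requests covers responses 1 and 2. The random delay is a design choice: a fixed interval is itself a recognizable pattern.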
To avoid being banned by the target site while crawling, we implement the following in Scrapy:
1. Disable cookies
2. Set the download delay
3. Use a proxy IP pool
4. Use a User-Agent Pool
5. Distributed crawling
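Items 1 to 4 might look roughly like the sketch below. It relies on Scrapy's `COOKIES_ENABLED`, `DOWNLOAD_DELAY`, `RANDOMIZE_DOWNLOAD_DELAY`, and `DOWNLOADER_MIDDLEWARES` settings, on the `proxy` key in `request.meta`, and on the long-standing `process_request(request, spider)` middleware hook. Everything else is an assumption for illustration: the pool contents, the class names, the `myproject.middlewares` module path, and the `CUSTOM_SETTINGS` name. Distributed crawling (item 5) is not covered here.

```python
import random

# Hypothetical pools; real values would come from your own configuration.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXY_POOL = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
]


class RandomUserAgentMiddleware:
    """Downloader middleware: set a random User-Agent on each request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENT_POOL)
        return None  # None tells Scrapy to keep processing the request


class RandomProxyMiddleware:
    """Downloader middleware: send each request through a random proxy."""

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads this meta key.
        request.meta["proxy"] = random.choice(PROXY_POOL)
        return None


# Could go in settings.py or a spider's custom_settings attribute.
CUSTOM_SETTINGS = {
    "COOKIES_ENABLED": False,          # 1. disable cookies
    "DOWNLOAD_DELAY": 2,               # 2. base delay in seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,  #    vary the delay around the base
    "DOWNLOADER_MIDDLEWARES": {
        # Hypothetical module path; run before the built-in middlewares.
        "myproject.middlewares.RandomProxyMiddleware": 410,      # 3. IP pool
        "myproject.middlewares.RandomUserAgentMiddleware": 400,  # 4. UA pool
    },
}
```

The priorities 400 and 410 are chosen so that both classes run before Scrapy's built-in User-Agent and proxy middlewares, which then leave the values set here in place.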