How to use the Crawlera proxy with a Scrapy crawler

Second, deploy it in the Scrapy project

    1. Install scrapy-crawlera

    Install it with pip, easy_install, or whichever method you prefer:

pip install scrapy-crawlera

    2. Modify settings.py

        If you set a proxy IP earlier, comment it out and add the Crawlera proxy middleware:

DOWNLOADER_MIDDLEWARES = {
    # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    # 'partent.middlewares.ProxyMiddleware': 100,
    'scrapy_crawlera.CrawleraMiddleware': 600,
}
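Besides the project-wide settings, scrapy-crawlera also reads these options as attributes on the spider itself, which lets you enable the proxy for selected spiders only. A minimal sketch (the spider name and parse logic are placeholders, not from the original post):

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    # Hypothetical spider: Crawlera can be toggled per spider via
    # class attributes, overriding the project-level settings.
    name = 'example'
    crawlera_enabled = True   # enable Crawlera for this spider only
    crawlera_user = '<API key>'
    crawlera_pass = ''        # empty when authenticating with an API key

    def parse(self, response):
        yield {'url': response.url}
```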

    For Crawlera to take effect, add the API credentials you created (when authenticating with an API key, leave the password as an empty string):

CRAWLERA_ENABLED = True
CRAWLERA_USER = '<API key>'
CRAWLERA_PASS = ''
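Behind the scenes, these credentials are sent to the Crawlera endpoint as HTTP basic auth in a `Proxy-Authorization` header. A minimal sketch of how such a header value is built (the helper name is ours for illustration, not part of scrapy-crawlera):

```python
import base64

def build_proxy_auth(api_key: str, password: str = '') -> str:
    """Build an HTTP basic-auth value from Crawlera-style credentials.

    With API-key authentication the password part is an empty string,
    which is why CRAWLERA_PASS is left blank above.
    """
    token = base64.b64encode(f'{api_key}:{password}'.encode()).decode()
    return f'Basic {token}'

print(build_proxy_auth('abc'))  # -> Basic YWJjOg==
```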

    To crawl more efficiently, you can disable the AutoThrottle extension, raise the maximum number of concurrent requests, and set a download timeout:

CONCURRENT_REQUESTS = 32
CONCURRENT_REQUESTS_PER_DOMAIN = 32
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_TIMEOUT = 600

    If DOWNLOAD_DELAY is set in your code and you want it honored, also add the following to settings.py:

CRAWLERA_PRESERVE_DELAY = True
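Putting the steps together, the Crawlera-related portion of settings.py ends up looking roughly like this (the API key is a placeholder):

```python
# settings.py -- Crawlera-related settings collected from the steps above

DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 600,
}

CRAWLERA_ENABLED = True
CRAWLERA_USER = '<API key>'   # placeholder: your Crawlera API key
CRAWLERA_PASS = ''            # empty when authenticating with an API key

# Optional tuning for higher throughput
CONCURRENT_REQUESTS = 32
CONCURRENT_REQUESTS_PER_DOMAIN = 32
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_TIMEOUT = 600

# Only needed if your spider sets DOWNLOAD_DELAY
CRAWLERA_PRESERVE_DELAY = True
```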

Origin blog.csdn.net/chaishen10000/article/details/103253939