Scrapy Crawlers in Practice with Python: Bypassing a Site's Anti-Crawling Measures

Setting a random User-Agent

Modify middlewares.py so that every outgoing request gets a freshly chosen User-Agent:

from fake_useragent import UserAgent

class RandomUserAgentMiddleware(object):
    def __init__(self):
        # Build the fake_useragent pool once, instead of on every request
        self.ua = UserAgent()

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen browser string
        request.headers['User-Agent'] = self.ua.random

Modify settings.py to register the middleware (replace scrapy_test with your project's package name):

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
   'scrapy_test.middlewares.RandomUserAgentMiddleware': 543,
}
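To check that the middleware is actually taking effect, one option is a throwaway spider pointed at https://httpbin.org/user-agent, which echoes back the User-Agent it receives; the spider name and URL below are only for illustration and are not part of the original post.

import scrapy

class UACheckSpider(scrapy.Spider):
    # Throwaway spider: httpbin.org/user-agent echoes the UA it received,
    # so each run should log a different browser string.
    name = 'ua_check'
    start_urls = ['https://httpbin.org/user-agent']

    def parse(self, response):
        self.logger.info(response.text)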

Setting an IP proxy

Test site: http://icanhazip.com — it returns the IP address of the current request, so you can easily confirm which IP the target server actually sees.
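The post does not include proxy code; a minimal sketch of a random-proxy downloader middleware could look like the following. RandomProxyMiddleware and the addresses in PROXY_POOL are illustrative placeholders — substitute proxies from your own pool. Register it in DOWNLOADER_MIDDLEWARES the same way as the User-Agent middleware above, then crawl http://icanhazip.com to confirm the exit IP changes.

import random

# Placeholder proxy addresses -- replace with proxies from your own pool
PROXY_POOL = [
    'http://127.0.0.1:8888',
    'http://127.0.0.1:8889',
]

class RandomProxyMiddleware(object):
    def process_request(self, request, spider):
        # Scrapy's downloader routes the request through whatever address
        # is stored in request.meta['proxy']
        request.meta['proxy'] = random.choice(PROXY_POOL)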

Adding a Referer

default_headers = {
    'referer': 'https://www.baidu.com/',
}
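The snippet above is only a fragment; one place such headers can live is the DEFAULT_REQUEST_HEADERS setting in settings.py, which Scrapy merges into every request. The Accept values below are just Scrapy's defaults, shown for context.

# settings.py
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'referer': 'https://www.baidu.com/',
}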

Reposted from blog.csdn.net/sinat_37676560/article/details/104204020