Setting up an IP proxy pool in Scrapy (a custom IP proxy pool)

First of all, you should have a reasonably clear picture of a Scrapy project's directory structure; at the very least you should have built a demo project once.

I. Manually updating the IP pool

1. Add an IP pool to the settings.py configuration file:

IPPOOL=[
    {"ipaddr":"61.129.70.131:8080"},
    {"ipaddr":"61.152.81.193:9100"},
    {"ipaddr":"120.204.85.29:3128"},
    {"ipaddr":"219.228.126.86:8123"},
    {"ipaddr":"61.152.81.193:9100"},
    {"ipaddr":"218.82.33.225:53853"},
    {"ipaddr":"223.167.190.17:42789"}
]
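
Since settings.py is an ordinary Python module, the pool does not have to be typed in by hand. Below is a minimal sketch (not from the original post) that builds IPPOOL from a hypothetical proxies.txt file, one ip:port per line, kept next to settings.py:

import os

# Hypothetical helper: load proxies.txt (one "ip:port" per line) into IPPOOL.
_PROXY_FILE = os.path.join(os.path.dirname(__file__), "proxies.txt")
if os.path.exists(_PROXY_FILE):
    with open(_PROXY_FILE) as f:
        IPPOOL = [{"ipaddr": line.strip()} for line in f if line.strip()]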

2. Modify the middleware file middlewares.py:

import random

from myproxies.settings import IPPOOL


class MyproxiesSpiderMiddleware(object):

    def __init__(self, ip=''):
        self.ip = ip

    def process_request(self, request, spider):
        # Pick a random proxy from the pool and attach it to this request;
        # the downloader will then route the request through that proxy.
        thisip = random.choice(IPPOOL)
        print("this is ip:" + thisip["ipaddr"])
        request.meta["proxy"] = "http://" + thisip["ipaddr"]
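
As an alternative sketch (not in the original post), the middleware can read IPPOOL through Scrapy's standard from_crawler hook instead of importing the project's settings module by name; the class name RandomProxyMiddleware below is hypothetical:

import random

class RandomProxyMiddleware(object):

    def __init__(self, ippool):
        self.ippool = ippool

    @classmethod
    def from_crawler(cls, crawler):
        # Pull the IPPOOL list out of the crawler's settings object.
        return cls(crawler.settings.get("IPPOOL", []))

    def process_request(self, request, spider):
        if self.ippool:
            thisip = random.choice(self.ippool)
            request.meta["proxy"] = "http://" + thisip["ipaddr"]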

3. Set DOWNLOADER_MIDDLEWARES in settings.py (the old scrapy.contrib.* path has been removed in newer Scrapy versions; use scrapy.downloadermiddlewares instead):

DOWNLOADER_MIDDLEWARES = {
#   'myproxies.middlewares.MyCustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
    'myproxies.middlewares.MyproxiesSpiderMiddleware': 125,
}
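
To confirm the proxy is actually being applied, a throwaway spider (a sketch; the name ProxyCheckSpider is hypothetical) can request http://httpbin.org/ip, which echoes the requesting IP back as JSON. Run it with scrapy crawl proxy_check and compare the logged "origin" value with the proxy printed by the middleware:

import scrapy

class ProxyCheckSpider(scrapy.Spider):
    name = "proxy_check"
    start_urls = ["http://httpbin.org/ip"]

    def parse(self, response):
        # httpbin returns {"origin": "<ip as seen by the server>"}
        self.logger.info("httpbin says the request came from: %s", response.text)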

Reposted from blog.csdn.net/AinUser/article/details/84934579