Chapter 2.5: headless

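PhantomJS is obsolete and headless Chrome has taken its place. Before choosing versions, check Selenium's chromedriver-to-Chrome version compatibility table.
chromedriver Mirror
1 Install chromedriver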

wget http://npm.taobao.org/mirrors/chromedriver/2.40/chromedriver_linux64.zip
unzip chromedriver_linux64.zip 
mv chromedriver  /usr/local/bin/chromedriver
chmod u+x,o+x   /usr/local/bin/chromedriver
# Check that it works:
chromedriver --version
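
The version check above can also be done from Python, which is handy before a crawl starts. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the output of `chromedriver --version` begins with `ChromeDriver <major>.<minor>`, as it does for the 2.x releases.

```python
import re
import shutil
import subprocess


def parse_chromedriver_version(output):
    """Extract the (major, minor) pair from `chromedriver --version` output."""
    match = re.search(r"ChromeDriver\s+(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized chromedriver output: %r" % output)
    return int(match.group(1)), int(match.group(2))


def installed_chromedriver_version(binary="chromedriver"):
    """Return the installed (major, minor) version, or None if not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    output = subprocess.check_output([path, "--version"])
    return parse_chromedriver_version(output.decode("utf-8", "replace"))
```

Calling `installed_chromedriver_version()` returns `None` when the binary is missing from PATH, so the caller can fail early with a clear message instead of a Selenium stack trace.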

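2 Install Chrome
Installing chromedriver alone is not enough; Chrome itself must also be installed, otherwise you get: Message: unknown error: cannot find Chrome binary
Chrome download path
Run the following commands to install it: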

 wget https://dl.lancdn.com/landian/software/chrome/m/67.0.3396.79_x86_64.rpm
 yum install 67.0.3396.79_x86_64.rpm

Use yum here rather than rpm -ivh 67.0.3396.79_x86_64.rpm, because yum resolves the dependencies automatically.
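
To catch the "cannot find Chrome binary" error before Selenium does, the binary can be looked up on PATH first. The sketch below is only an illustration: the function name is hypothetical, and the candidate names are an assumption about what the package installs, so adjust them for your system.

```python
import shutil

# Assumed executable names; the actual name depends on the installed package.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
)


def find_chrome_binary(candidates=CHROME_CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

If the result is not `None`, it can be assigned to `options.binary_location` when Chrome lives somewhere chromedriver does not look by default.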
3 Trying out headless
Some articles prefix the arguments with --, as in --headless, but omitting the prefix also works.

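from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Method of the class that owns self.ipService (the proxy IP service).
# Note: timeout is accepted but not used in this method.
def getWebDriverHeadLess(self, options=None, timeout=360, types='http'):
    # Create fresh options on every call; a ChromeOptions() default argument
    # would be shared between calls and keep accumulating arguments.
    if options is None:
        options = webdriver.ChromeOptions()
    # tell selenium to use the dev channel version of chrome
    # NOTE: only do this if you have a good reason to
    # options.binary_location = '/usr/bin/google-chrome-unstable'  # path to google Chrome bin
    options.add_argument('headless')
    options.add_argument('no-sandbox')
    options.add_argument('window-size=1200,600')  # Chrome expects width,height
    desired_capabilities = options.to_capabilities()
    if types == 'http':
        # fetch an IP from the proxy service
        proxyip = self.ipService.select_rand(types=types)
        if proxyip:
            proxy_url = str(proxyip['ip']) + ':' + str(proxyip['port'])
            proxy = Proxy({
                'proxyType': ProxyType.MANUAL,
                'httpProxy': proxy_url,
            })
            proxy.add_to_capabilities(desired_capabilities)
    elif types == 'https':
        # fetch an IP from the proxy service
        proxyip = self.ipService.select_rand(types=types)
        if proxyip:
            # with proxy
            proxy_url = str(proxyip['ip']) + ':' + str(proxyip['port'])
            proxy = Proxy({
                'proxyType': ProxyType.MANUAL,
                'sslProxy': proxy_url  # the proxy server's CA certificate must be trusted
            })
            proxy.add_to_capabilities(desired_capabilities)
    # chrome_options / desired_capabilities are the Selenium 3 keyword arguments
    return webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)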
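
The http and https branches of the method differ only in the capability key they set (httpProxy versus sslProxy). One possible way to remove the duplication is a small pure function like the sketch below; `build_proxy_settings` is a hypothetical helper, not part of Selenium, and it assumes the proxy record has the same 'ip' and 'port' keys used above.

```python
def build_proxy_settings(proxyip, types="http"):
    """Build the proxy entry for selenium's Proxy(...), without 'proxyType'.

    proxyip is a record such as {'ip': '10.0.0.1', 'port': 8080}, or None.
    Returns None when there is no proxy or the type is not recognized.
    """
    key_by_type = {"http": "httpProxy", "https": "sslProxy"}
    key = key_by_type.get(types)
    if not proxyip or key is None:
        return None
    return {key: "%s:%s" % (proxyip["ip"], proxyip["port"])}
```

In the method, the caller would add `'proxyType': ProxyType.MANUAL` to the returned dict, wrap it in `Proxy(...)`, and call `add_to_capabilities(desired_capabilities)` once. Keeping the helper free of Selenium imports makes it easy to unit test without a browser.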

4 Restart scrapyd
When the environment changes, scrapyd must be restarted, because it still holds some stale information.

kill -9 $(ps -ef | grep '[s]crapyd' | awk '{print $2}')
/etc/init.d/scrapyd start
scrapyd-deploy -p einfo
curl http://10.101.3.170:6800/schedule.json -d project=einfo -d spider=xxSpider
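
The curl call above posts form fields to scrapyd's schedule.json endpoint. The same request can be issued from Python with the standard library, for example from a deployment script. This is a sketch; the function names are hypothetical, and the host, project, and spider values are simply the ones used in the curl command.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying data makes urllib send a form-encoded POST, like curl -d.
    return Request(base_url.rstrip("/") + "/schedule.json", data=data)


def schedule_spider(base_url, project, spider, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_schedule_request(base_url, project, spider)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `schedule_spider("http://10.101.3.170:6800", "einfo", "xxSpider")` mirrors the curl command and returns scrapyd's response body.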

A note on setting up phantomjs: it will be of little use going forward, but the point here is mainly how to get the binary onto the PATH.

tar -xjvf phantomjs-2.1.1-linux-x86_64.tar.bz2
ln -s /phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/bin/phantomjs
phantomjs -v


Reposted from blog.csdn.net/warrah/article/details/80780405