Scraping data with Scrapy using PhantomJS and Selenium

Category: Other | Posted: 2018-07-30 18:47:03

1. Installing PhantomJS

Download: http://phantomjs.org/download.html

Extract (run this in /usr/local so that the paths in the following commands match):

tar -jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2

Rename:

mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs

Create a symbolic link:

ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/

Verify the installation. Running phantomjs with no arguments should open its interactive prompt:

[root@izuf622gt8apcfsz7i1mqdz /]# phantomjs
phantomjs>
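The same check can be scripted, which is handy before starting a crawl. The following is a minimal sketch using only the Python standard library; the helper name `phantomjs_version` is made up for illustration, and it assumes the binary answers the `--version` flag.

```python
import shutil
import subprocess


def phantomjs_version(binary="phantomjs"):
    """Return the version string printed by the binary, or None if it is not on PATH.

    The function name and default argument are illustrative, not part of any library.
    """
    path = shutil.which(binary)  # resolves the symlink created in /usr/bin
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(phantomjs_version() or "phantomjs not found on PATH")
```

If the function returns None, the symbolic link step most likely did not take effect.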

2. Installing Selenium

Install with pip: pip install selenium
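One thing worth checking after installation is the Selenium version: `webdriver.PhantomJS` was removed in Selenium 4, so the usage shown in this post needs a 3.x release (for example `pip install "selenium<4"`). The sketch below reads the installed version through the standard library; the function names are hypothetical helpers, not Selenium APIs.

```python
from importlib import metadata


def selenium_major_version():
    """Return the installed Selenium major version as an int, or None if not installed."""
    try:
        version = metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


def supports_phantomjs(major):
    """webdriver.PhantomJS is only available in Selenium releases before 4.0."""
    return major is not None and major < 4


if __name__ == "__main__":
    major = selenium_major_version()
    print("Selenium major version:", major)
    print("webdriver.PhantomJS available:", supports_phantomjs(major))
```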

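Usage: the method below belongs in a Scrapy downloader middleware class, which must be enabled through the DOWNLOADER_MIDDLEWARES setting in settings.py: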

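from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumMiddleware(object):  # hypothetical name; the original shows only the method
    def process_request(self, request, spider):
        # Render the page in headless PhantomJS (requires Selenium 3.x).
        driver = webdriver.PhantomJS()
        # driver = webdriver.Chrome()
        try:
            driver.get(request.url)
            # Fill in the stock-code field.
            input_first = driver.find_element_by_id('stockID_')
            input_first.clear()
            input_first.send_keys('000150')
            # Submit the query; click() returns None, so there is nothing to print.
            driver.find_element_by_id('button').click()
            # driver.switch_to.frame('i_nr')
            # print("Visited:", driver.page_source)
            # Capture the HTML after the click and return it to Scrapy.
            body = driver.page_source
            return HtmlResponse(driver.current_url, body=body,
                                encoding='utf-8', request=request)
        finally:
            driver.quit()  # always release the PhantomJS process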
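The middleware above reads `driver.page_source` immediately after the click, so the HTML may be captured before the page has finished updating. A possible remedy is to poll until the page source changes. The sketch below does this with the standard library only, so it can be tried without a browser; `wait_until` and `page_source_after_click` are hypothetical helpers, and comparing page sources is a heuristic rather than a guarantee that the wanted data has loaded. Selenium's own `WebDriverWait` (in `selenium.webdriver.support.ui`) provides the same polling pattern.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Returns the truthy value; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


def page_source_after_click(driver, button_id, timeout=10.0, poll_interval=0.5):
    """Click an element and return the page source once it differs from before.

    `driver` is anything with `page_source` and `find_element_by_id`,
    such as a Selenium 3.x WebDriver.
    """
    before = driver.page_source
    driver.find_element_by_id(button_id).click()
    wait_until(lambda: driver.page_source != before,
               timeout=timeout, poll_interval=poll_interval)
    return driver.page_source
```

In the middleware, `body = page_source_after_click(driver, 'button')` could replace the separate click and `page_source` read. A stricter variant would wait for a specific result element instead of any change.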

Reposted from www.cnblogs.com/myvic/p/9392079.html
