Scraping Dynamic Web Page Comments with Selenium - 代码天地

Programming languages 2018-05-02 22:03:30 Views: 6

Target site: http://www.santostang.com/2017/03/02/hello-world/

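First, locate the frame.

Press Ctrl+Shift+C to open the browser's element picker, then search the page source for "frame" to find where the comment frame sits. The HTML looks like this: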

    <iframe
        title="livere"
        scrolling="no"
        src="https://livere.me/comment/city?id=city&amp;refer=www.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;uid=MTAyMC8yODU4My81MTU0&amp;site=http%3A%2F%2Fwww.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;title=Hello%20world!%20-%20%E6%95%B0%E6%8D%AE%E7%A7%91%E5%AD%A6%40%E5%94%90%E6%9D%BESantos"
        style="min-width: 100%; width: 100px; height: 6177px; overflow: hidden; border: 0px none; z-index: 124212;"
        id="lv-comment-567"
        frameborder="0"></iframe>
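The src attribute is worth a closer look: the frame's content is served by livere.me rather than by the blog itself, so the comments are not part of the outer page's DOM. As a quick check, the sketch below decodes a shortened copy of that src (the title parameter is left out for brevity) using only the standard library. The variable names are illustrative.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the iframe's src attribute, as it appears in the HTML source.
src = (
    "https://livere.me/comment/city?id=city"
    "&amp;refer=www.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F"
    "&amp;uid=MTAyMC8yODU4My81MTU0"
    "&amp;site=http%3A%2F%2Fwww.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F"
)

# Undo the HTML escaping (&amp; -> &) before splitting the URL.
parts = urlsplit(unescape(src))

# parse_qs returns a list per key; each key occurs once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.netloc)     # livere.me
print(params["refer"])  # www.santostang.com/2017/03/02/hello-world/
print(params["site"])   # http://www.santostang.com/2017/03/02/hello-world/
```

The host is livere.me, while the refer and site parameters point back at the blog post being scraped.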

In Selenium, we locate the iframe by its title attribute and switch into it:

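# Needs: from selenium.webdriver.common.by import By (the find_element_by_* helpers were removed in Selenium 4.3)
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, "iframe[title='livere']"))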
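Once the driver has switched into a frame, every later lookup is scoped to that frame until the driver switches back with `driver.switch_to.default_content()`. The following is a minimal sketch for the case where you also want to read elements of the outer page afterwards. `inside_frame` is a hypothetical helper name, not part of Selenium; it relies only on the driver's `switch_to.frame` and `switch_to.default_content` methods.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_element):
    """Scope lookups to frame_element, then return to the top-level page."""
    driver.switch_to.frame(frame_element)
    try:
        yield driver
    finally:
        # Always switch back, even if a lookup inside the frame raised.
        driver.switch_to.default_content()
```

With a real driver this could be used as `with inside_frame(driver, frame): ...`, where `frame` is the iframe element found by the selector above.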

Next, locate the div that holds each comment.

Press Ctrl+Shift+C again, click a comment, and find its div:

<div class="reply-content"><p>
                    哪里哪里在哪里？
                </p></div>

In Selenium, we find the comments by looking up the matching divs:

# By is selenium.webdriver.common.by.By
comments = driver.find_elements(By.CSS_SELECTOR, 'div.reply-content')

The comment text sits inside a <p></p> element, so loop over the comments and read it from there:

# By is selenium.webdriver.common.by.By
for eachcomment in comments:
    content = eachcomment.find_element(By.TAG_NAME, 'p')
    print(content.text)

Running the code prints the text of each comment, one per line.
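Putting the steps together, here is one possible end-to-end sketch rather than the original author's script. It assumes Selenium 4 with Chrome and a matching driver available, and it adds explicit waits because the frame and the comments are loaded dynamically and may not exist the moment the page opens. `clean_texts` and `scrape_comments` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
def clean_texts(raw_texts):
    """Strip surrounding whitespace and drop empty entries."""
    stripped = (text.strip() for text in raw_texts)
    return [text for text in stripped if text]


def scrape_comments(url, timeout=10):
    """Open url, switch into the LiveRe frame, and return the comment texts."""
    # Imported here so that clean_texts can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is the browser in use
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait for the comment iframe to appear, then switch into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, "iframe[title='livere']")))
        # Wait until at least one comment div is present inside the frame.
        divs = wait.until(EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, "div.reply-content")))
        # The list is built here, before the browser is closed.
        return clean_texts(
            div.find_element(By.TAG_NAME, "p").text for div in divs)
    finally:
        driver.quit()
```

Calling `scrape_comments("http://www.santostang.com/2017/03/02/hello-world/")` should return the comment texts as a list of strings, provided the page still serves its comments through the same iframe.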

Reposted from blog.csdn.net/TQCAI666/article/details/80172754
