Scraping Comments from a Dynamic Web Page with Selenium

Target site: http://www.santostang.com/2017/03/02/hello-world/

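First, locate the frame that holds the comments.

Press Ctrl+Shift+C to open the browser's element inspector, then search the HTML for "frame" to find where the iframe sits: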
This is the iframe's HTML:

    <iframe
        title="livere"
        scrolling="no"
        src="https://livere.me/comment/city?id=city&amp;refer=www.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;uid=MTAyMC8yODU4My81MTU0&amp;site=http%3A%2F%2Fwww.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;title=Hello%20world!%20-%20%E6%95%B0%E6%8D%AE%E7%A7%91%E5%AD%A6%40%E5%94%90%E6%9D%BESantos"
        style="min-width: 100%; width: 100px; height: 6177px; overflow: hidden; border: 0px none; z-index: 124212;"
        id="lv-comment-567"
        frameborder="0"></iframe>
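The src attribute shows that the comments are served from livere.me, a different host from the article, which is presumably why they are embedded in an iframe instead of appearing in the page's own HTML. The short sketch below decodes that URL with Python's standard html and urllib.parse modules. RAW_SRC and decode_iframe_src are illustrative names, not part of the original post.

```python
import html
from urllib.parse import parse_qs, urlsplit

# The src attribute exactly as it appears in the page source; "&amp;" is the
# HTML-escaped form of "&", which the browser (and html.unescape) turns back into "&".
RAW_SRC = (
    "https://livere.me/comment/city?id=city&amp;"
    "refer=www.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;"
    "uid=MTAyMC8yODU4My81MTU0&amp;"
    "site=http%3A%2F%2Fwww.santostang.com%2F2017%2F03%2F02%2Fhello-world%2F&amp;"
    "title=Hello%20world!%20-%20%E6%95%B0%E6%8D%AE%E7%A7%91%E5%AD%A6%40%E5%94%90%E6%9D%BESantos"
)


def decode_iframe_src(raw_src):
    """Return (host, params) for an iframe src, with query values percent-decoded."""
    url = urlsplit(html.unescape(raw_src))  # undo the "&amp;" escaping first
    # parse_qs maps each key to a list of values; every key appears once here.
    params = {key: values[0] for key, values in parse_qs(url.query).items()}
    return url.netloc, params


if __name__ == "__main__":
    host, params = decode_iframe_src(RAW_SRC)
    print(host)  # livere.me
    for key, value in params.items():
        print(f"{key} = {value}")
```

The refer and site parameters point back to the article URL, and title decodes to the post's title, Hello world! - 数据科学@唐松Santos.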

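Selenium only finds elements in the browsing context it is currently in, so the driver has to switch into this iframe before it can see the comments. Locate the iframe by its title attribute and switch into it: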

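    from selenium.webdriver.common.by import By
    driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, "iframe[title='livere']"))

(This uses the Selenium 4 locator syntax; the find_element_by_* helpers found in older tutorials were removed in Selenium 4.3.)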
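Selenium passes the CSS selector to the browser, so there is nothing to run offline. As a rough stand-in, this sketch uses Python's standard html.parser to show which element the attribute selector iframe[title='livere'] matches: an iframe whose title attribute is exactly "livere". IframeByTitle and the page fragment (including the extra "ads" iframe) are hypothetical.

```python
from html.parser import HTMLParser


class IframeByTitle(HTMLParser):
    """Collect the attributes of every <iframe> whose title equals `wanted`.

    An offline approximation of iframe[title='...']; in Selenium the browser
    itself evaluates the selector.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []  # one attribute dict per matching iframe

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lowercases names and unescapes values
        # [title='livere'] requires the attribute value to equal 'livere' exactly.
        if tag == "iframe" and attributes.get("title") == self.wanted:
            self.matches.append(attributes)


# Hypothetical page fragment: the iframe with a different title is ignored.
PAGE = (
    '<iframe title="ads" src="https://example.com/ad"></iframe>\n'
    '<iframe title="livere" scrolling="no" id="lv-comment-567" '
    'src="https://livere.me/comment/city?id=city&amp;uid=MTAyMC8yODU4My81MTU0"></iframe>\n'
)

parser = IframeByTitle("livere")
parser.feed(PAGE)
print([m["id"] for m in parser.matches])  # ['lv-comment-567']
```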

Next, locate the div that holds each comment.

Press Ctrl+Shift+C again and click a comment; its div looks like this:

    <div class="reply-content"><p>
                        哪里哪里在哪里?
                    </p></div>

In Selenium, collect all of these divs with a CSS selector:

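    from selenium.webdriver.common.by import By
    comments = driver.find_elements(By.CSS_SELECTOR, "div.reply-content")

The comment text itself sits inside the <p></p> element, so loop over the comments and print each one:

    for each_comment in comments:
        # the text lives in the <p> inside each div.reply-content
        content = each_comment.find_element(By.TAG_NAME, "p")
        print(content.text)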
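To see what that loop extracts without launching a browser, here is an offline approximation built on Python's standard html.parser, fed the div shown above plus two made-up divs. It is not Selenium: .strip() only approximates how WebElement.text trims whitespace, and divs nested inside a reply are not handled. ReplyTextCollector and SAMPLE are hypothetical names.

```python
from html.parser import HTMLParser


class ReplyTextCollector(HTMLParser):
    """For each <div class="reply-content">, collect the stripped text of its <p>."""

    def __init__(self):
        super().__init__()
        self.in_reply = False  # inside <div class="reply-content">?
        self.in_p = False      # inside a <p> within that div?
        self.buffer = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # div.reply-content matches any div whose class list contains reply-content
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "reply-content" in classes:
            self.in_reply = True
        elif tag == "p" and self.in_reply:
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.comments.append("".join(self.buffer).strip())
            self.in_p = False
        elif tag == "div" and self.in_reply:  # simplification: assumes no nested divs
            self.in_reply = False

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)


SAMPLE = """
<div class="reply-content"><p>
                    哪里哪里在哪里?
                </p></div>
<div class="reply-content"><p>Second comment</p></div>
<div class="other"><p>not a comment</p></div>
"""

collector = ReplyTextCollector()
collector.feed(SAMPLE)
print(collector.comments)
```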

Run the script, and the text of each comment is printed to the console.


Reposted from blog.csdn.net/TQCAI666/article/details/80172754