selenium+beautifulsoup模拟翻页

#coding=utf-8

import unittest
from selenium import webdriver
from bs4 import BeautifulSoup

class douyuSelenium(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.PhantomJS()

    def testDouyu(self):
        self.driver.get('http://www.douyu.com/directory/all')
        while True:
            #print self.driver.page_source
            soup = BeautifulSoup(self.driver.page_source, "html.parser")

            titles = soup.select('h3.ellipsis')
            nums = soup.select('span.dy-num.fr')

            for title, num in zip(nums, titles):
                print u"观众人数:"+ title.get_text().strip(), u"\t房间标题:"+num.get_text().strip()

            if self.driver.page_source.find('shark-pager-disable-next') != -1:
                break

            self.driver.find_element_by_class_name('shark-pager-next').click()

    def tearDown(self):
        print 'finish load ...'
        self.driver.quit()

# Run the crawler TestCase above when this file is executed directly.
if __name__ == '__main__':
    unittest.main()

scrapy 模拟登录

# -*- coding: utf-8 -*-
import scrapy



class Renren2Spider(scrapy.Spider):
    """Log in to renren.com through its login form, then fetch a page
    that is only visible to an authenticated session and save it to disk.
    """
    name = "renren2"
    allowed_domains = ["renren.com"]
    # The login endpoint; its response is handled by parse().
    start_urls = (
        "http://www.renren.com/PLogin.do",
    )

    def parse(self, response):
        """Handle the login page: submit credentials through the form.

        FormRequest.from_response pre-fills the hidden form fields found
        on the page (e.g. a CSRF token), so only the credentials need to
        be supplied explicitly.
        """
        # SECURITY NOTE(review): credentials are hard-coded in source;
        # move them to Scrapy settings or environment variables.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"email": "[email protected]", "password": "axxxxxxxe"},
            callback=self.parse_page,
        )

    def parse_page(self, response):
        """After the login response, request a login-protected page."""
        url = "http://www.renren.com/422167102/profile"
        yield scrapy.Request(url, callback=self.parse_newpage)

    def parse_newpage(self, response):
        """Persist the raw response body to a local file."""
        # BUG FIX: response.body is bytes (Scrapy on Python 3); the
        # original opened the file in text mode ("w"), which raises
        # TypeError when given bytes. Open in binary mode instead.
        with open("xiao.html", "wb") as f:
            f.write(response.body)

转载自 blog.csdn.net/sf131097/article/details/82869608