Web Scraping: Crawling the Status of Selected LOL Streamers on the Douyu Website - 代码天地

Category: Other | 2018-09-29 09:10:51

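A small crawler that scrapes streamers' names and viewer counts from Douyu's League of Legends (LOL) category and ranks them by viewers.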

import re

import requests

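class Spider:
    # Douyu's League of Legends category page
    url = 'https://www.douyu.com/g_lol'
    # Raw strings keep the regex escapes intact. These patterns follow the
    # page markup at the time of writing (2018) and may need updating.
    root_pattern = r'<p>([\s\S]*?)</p>'
    name_pattern = r'<span class="dy-name ellipsis fl">([\s\S]*?)</span>'
    number_pattern = r'<span class="dy-num fr"  >([\s\S]*?)</span>'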

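    def __fetch_content(self):
        # Download the category page. The browser-like User-Agent is an
        # assumption: some sites reject the default requests client.
        headers = {'User-Agent': 'Mozilla/5.0'}
        r = requests.get(Spider.url, headers=headers, timeout=10)
        r.raise_for_status()  # fail loudly on HTTP errors
        htmls = r.text
        return htmls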

    def __analysis(self, htmls):
        # Find every <p> block, then pull the name and viewer count from each.
        # re.findall returns lists, which __refine flattens.
        root_htmls = re.findall(Spider.root_pattern, htmls)
        anchors = []
        for html in root_htmls:
            name = re.findall(Spider.name_pattern, html)
            number = re.findall(Spider.number_pattern, html)
            anchor = {'name': name, 'number': number}
            anchors.append(anchor)
        return anchors

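    def __refine(self, anchors):
        # Keep the first match of each field, trim whitespace, and skip
        # <p> blocks that hold no streamer data (empty match lists),
        # which would otherwise raise IndexError
        return [
            {'name': anchor['name'][0].strip(),
             'number': anchor['number'][0].strip()}
            for anchor in anchors
            if anchor['name'] and anchor['number']
        ]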

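    def __sort(self, anchors):
        # Highest viewer count first
        return sorted(anchors, key=self.__sort_seed, reverse=True)

    def __sort_seed(self, anchor):
        # Turn strings like '8562' or '1.2万' into numbers (万 = 10,000).
        # \d+(?:\.\d+)? keeps the decimal part; plain \d* would stop at '.'
        r = re.findall(r'\d+(?:\.\d+)?', anchor['number'])
        if not r:
            return 0.0
        number = float(r[0])
        if '万' in anchor['number']:
            number *= 10000
        return number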

    def __show(self, anchors):
        # Print "Viewer rank N : name~~~~~~count", numbering from 1
        for rank, anchor in enumerate(anchors, start=1):
            print(
                'Viewer rank ' + str(rank)
                + ' : ' + anchor['name']
                + '~~~~~~' + anchor['number']
            )

    def go(self):
        # Fetch -> extract -> clean -> sort -> print
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        anchors = list(self.__refine(anchors))
        anchors = self.__sort(anchors)
        self.__show(anchors)


if __name__ == '__main__':
    spider = Spider()
    spider.go()
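Because the live page's markup may have changed since 2018, it can help to check the regular expressions offline before blaming the network. This minimal sketch runs the same three patterns against a small HTML snippet; `sample_html` and the helper `extract_anchors` are hypothetical names made up for illustration, not part of the original script, and the snippet is invented to match the patterns rather than copied from Douyu.

```python
import re

# Same patterns as the Spider class (2018 markup)
root_pattern = r'<p>([\s\S]*?)</p>'
name_pattern = r'<span class="dy-name ellipsis fl">([\s\S]*?)</span>'
number_pattern = r'<span class="dy-num fr"  >([\s\S]*?)</span>'

# Hypothetical HTML shaped the way the patterns expect; not real page content
sample_html = '''
<p><span class="dy-name ellipsis fl">streamer_a</span><span class="dy-num fr"  >8562</span></p>
<p><span class="dy-name ellipsis fl">streamer_b</span><span class="dy-num fr"  >1.2万</span></p>
<p>an unrelated paragraph with no streamer data</p>
'''


def extract_anchors(html):
    """Return [{'name': ..., 'number': ...}] for each <p> block with both fields."""
    anchors = []
    for block in re.findall(root_pattern, html):
        names = re.findall(name_pattern, block)
        numbers = re.findall(number_pattern, block)
        if names and numbers:  # skip <p> blocks that are not streamer cards
            anchors.append({'name': names[0].strip(),
                            'number': numbers[0].strip()})
    return anchors


for anchor in extract_anchors(sample_html):
    print(anchor['name'], anchor['number'])
```

If this prints both sample streamers but the real page yields nothing, the site's HTML (for example, content rendered by JavaScript) no longer matches the patterns.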

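Output: running the script prints one line per streamer, giving the rank, the streamer's name and the viewer count, ordered from most to fewest viewers.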
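Viewer counts arrive as strings such as `8562` or `1.2万` (万 means ten thousand), so ranking them needs a numeric sort key. A pattern like `\d*` stops at the decimal point and would read `1.2万` as 1 × 10000, so this minimal sketch matches `\d+(?:\.\d+)?` instead. The `parse_viewers` helper and the streamer entries are hypothetical, invented only to exercise the sort.

```python
import re

# Matches an integer or decimal such as '8562' or '1.2'
_NUMBER = re.compile(r'\d+(?:\.\d+)?')


def parse_viewers(text: str) -> float:
    """Convert a viewer string ('8562', '1.2万') to a number.

    '万' is the Chinese unit for ten thousand. Returns 0.0 if no digits are found.
    """
    match = _NUMBER.search(text)
    if match is None:
        return 0.0
    value = float(match.group())
    if '万' in text:
        value *= 10000
    return value


# Hypothetical sample data, not real scraped results
anchors = [
    {'name': 'streamer_a', 'number': '8562'},
    {'name': 'streamer_b', 'number': '1.2万'},
    {'name': 'streamer_c', 'number': '35.6万'},
]
ranked = sorted(anchors, key=lambda a: parse_viewers(a['number']), reverse=True)
for rank, anchor in enumerate(ranked, start=1):
    print(rank, anchor['name'], anchor['number'])
# prints: 1 streamer_c 35.6万, then 2 streamer_b 1.2万, then 3 streamer_a 8562
```

Returning 0.0 for unparseable strings pushes such entries to the bottom of the ranking instead of crashing the sort.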

If you're interested, go ahead and check out the streamer rankings!

Reposted from www.cnblogs.com/yuxuanlian/p/9721807.html
