Scraping Ximalaya (喜马拉雅) with Python - 代码天地

Category: Other · 2020-03-22 16:18:46

import re

import requests


class SpiderHimalaya(object):
    def __init__(self):
        self.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50"}
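        # Per-track request URL template. It is blank in this copy of the
        # script; it needs a {} placeholder for the track ID.
        self.audio_url = ''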
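    def get_page_url(self):
        """Build the URL of each album page."""
        # Album page URL template. It is blank in this copy of the script;
        # it needs a {} placeholder for the page number.
        page_url = ""
        page_url_list = [page_url.format(i) for i in range(1, 13)]  # pages 1-12
        return page_url_list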
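    def get_response(self, url):
        """Send a GET request and return the response."""
        resp = requests.get(url, headers=self.headers, timeout=10)
        # Raise on HTTP errors instead of returning None, which would only
        # fail later when the caller reads .json() or .content
        resp.raise_for_status()
        return resp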
    def get_item_id(self):
        """Collect the request URL and title of each track."""
        page_url_list = self.get_page_url()
        # Only the first page is requested here
        resp = self.get_response(url=page_url_list[0])
        content_list = resp.json()['data']['tracks']
        item_list = []
        for con in content_list:
            # One single-entry dict per track: {request URL: title}
            item = {}
            key = self.audio_url.format(con['trackId'])
            item[key] = con["title"]
            item_list.append(item)
        return item_list
    def down_mp3(self, item):
        """Download one track and save it as an MP3 file."""
        (ite,) = item.items()  # unpack the single (url, title) pair
        url, name = ite  # tuple unpacking
        resp = self.get_response(url)
        # Strip spaces and characters that are not allowed in file names
        file_name = re.sub(r'[/ \\:*"<>|?？]+', '', name)
        print(file_name)
        mp3_url = resp.json()['data']['src']
        mp3_content = self.get_response(mp3_url).content
        # The 三国志/ directory must already exist
        with open(''.join(['三国志/', file_name, '.mp3']), 'wb') as f:
            f.write(mp3_content)
    def run(self):
        """主函数"""
        item_list=self.get_item_id()
        for item in item_list:
            self.down_mp3(item)

if __name__ == '__main__':
    SpiderHimalaya().run()
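
The two data-handling steps in the script, pairing each track's request URL with its title and cleaning the title into a file name, can be tried without any network access. The sketch below is only an illustration, not the author's code: `build_items`, `safe_file_name`, the sample payload, and the `example.invalid` URL template are hypothetical stand-ins. Only the `data` / `tracks` / `trackId` / `title` keys and the set of stripped characters come from the script, with the ASCII `?` added as an assumption.

```python
import re

# Hypothetical URL template, for illustration only; the template in the
# script is blank.
AUDIO_URL = "https://example.invalid/tracks/{}"

# Spaces plus characters commonly rejected in file names.
# The ASCII "?" is an assumption added for this sketch.
_UNSAFE = re.compile(r'[/ \\:*"<>|?？]+')


def safe_file_name(title):
    """Remove spaces and unsafe characters from a track title."""
    return _UNSAFE.sub("", title)


def build_items(payload, url_template=AUDIO_URL):
    """Return (url, title) pairs from a decoded track-list response."""
    return [
        (url_template.format(track["trackId"]), track["title"])
        for track in payload["data"]["tracks"]
    ]


# Made-up sample shaped like the JSON the script reads.
sample = {"data": {"tracks": [
    {"trackId": 101, "title": "Chapter 1: The Oath?"},
    {"trackId": 102, "title": "Chapter 2 / Part A"},
]}}

for url, title in build_items(sample):
    print(url, safe_file_name(title))
```

Returning plain `(url, title)` tuples rather than single-entry dicts means the caller can write `for url, title in items` directly, without the `(ite,) = item.items()` unpacking step.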

Reposted from blog.csdn.net/weixin_44224529/article/details/104836401