How to download bilibili videos and subtitles with you-get

Version

2020-08-11: init. The tools described here are as of this date; later versions may have added support.

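Overview

I recently came across this video course: 【吴恩达团队Tensorflow2.0实践系列课程第一课】TensorFlow2.0中基于TensorFlow2.0的人工智能、机器学习和深度学习简 and wanted a local copy in case it gets taken down, so I used you-get to download it. Cloning the you-get repo is all it takes: official repo. I had contributed code to it before, so I figured taking another look would be no big deal.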

Downloading

Use the following command:

you-get --debug --playlist https://www.bilibili.com/video/BV1zE411T7nb

The download itself worked fine, but I noticed the Chinese subtitles were missing.

So I dug further into how to download them.

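  1. Watching the video, I noticed the subtitles can be dragged around, so I set a breakpoint in Firefox.
    [Screenshot: Firefox breakpoint]
  2. With the breakpoint in place I intercepted the requests and found one that initializes the corresponding subtitles.

[Screenshot: the intercepted request]
Copy a request like the one above as cURL to test it: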

curl 'https://api.bilibili.com/x/player.so?id=cid%3A162260003&aid=95051759&bvid=BV1zE411T7nb&buvid=FB2BB46F-B1F3-4BDA-A589-333489Q4e0411A155830infoc' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0' -H 'Accept: */*' -H 'Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2' --compressed -H 'Origin: https://www.bilibili.com' -H 'Referer: https://www.bilibili.com/video/BV1zE411T7nb' -H 'Connection: keep-alive' -H $'Cookie: _uuid=3FFA2652-F830-7C37-F9A4-333489Q4e0411A155830infoc; buvid3=FB2BB46F-B1F3-4BDA-A589-33348940411A155830infoc; sid=cejpsw6m; CURRENT_FNVAL=16; LIVE_BUVID=AUTO9515820831073003; rpdid=|(k)~RY~mkk|0J\'ul)k|)Juuk; im_notify_type_11615329=0; DedeUserID=11615329; DedeUserID__ckMd5=7c197013cd07c4b6; SESSDATA=b2ce8c5b%2C1600861501%2Ca9549*31; bili_jct=b7000d5d160ed086c798d55808a55f75; PVID=2; CURRENT_QUALITY=80; bsource=search_google; flash_player_gray=false; html5_player_gray=false; bfe_id=6f285c892d9d3c1f8f020adad8bed553' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'TE: Trailers'

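The request needs cid, aid, bvid and buvid. Only the last one, buvid, was a mystery; after a lot of fiddling I found it is the uuid from the cookie. After some more testing I found the request also needs a Referer header, otherwise it fails with error -412 (presumably 412 itself, probably as a bilibili-defined error code; 412 means a precondition was not met):
[Screenshot: the -412 error response]
So the simplest request for this URL is: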

https://api.bilibili.com/x/player.so?id=cid%3A162260003&aid=95051759&bvid=BV1zE411T7nb&buvid=FB2BB46F-B1F3-4BDA-A589-333489Q4e0411A155830infoc

with the header -H 'Referer: https://www.bilibili.com/video/BV1zE411T7nb', where the last path segment is the BV id.
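The minimal request above can be sketched in Python with only the standard library. This is an illustration, not the code from the modified you-get: the function name build_player_request and the placeholder buvid value are made up here, and the endpoint may have changed since this was written.

```python
from urllib.parse import urlencode
from urllib.request import Request

API = "https://api.bilibili.com/x/player.so"

def build_player_request(cid, aid, bvid, buvid):
    """Build the player.so request, including the Referer header it requires."""
    query = urlencode({
        "id": f"cid:{cid}",   # urlencode turns ':' into %3A, as in the captured URL
        "aid": aid,
        "bvid": bvid,
        "buvid": buvid,       # assumption: taken from your own cookie
    })
    return Request(
        f"{API}?{query}",
        headers={"Referer": f"https://www.bilibili.com/video/{bvid}"},
    )

req = build_player_request(162260003, 95051759, "BV1zE411T7nb", "YOUR-BUVID")
print(req.full_url)
print(req.get_header("Referer"))
# To actually send it: urllib.request.urlopen(req).read().decode("utf-8")
```

Building the Request separately from sending it makes it easy to check the URL and headers before hitting the network.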
  3. Parse the XML output of the request above
<ip>110.184.137.149</ip>
<zoneid>4538384</zoneid>
<zoneip></zoneip>
<country>中国</country>
<login>true</login>
<time>1597112522</time>
<name>scugxl</name>
<user>11615329</user>
<user_hash>3f9ed8c9</user_hash>
<money>681.20</money>
<face>//i0.hdslb.com/bfs/face/member/noface.jpg</face>
<isadmin>false</isadmin>
<permission>10000,1001</permission>
<level></level>
<level_info>{"current_level":4,"current_min":4500,"current_exp":8127,"next_exp":10800}</level_info>
<answer_status>0</answer_status>
<vip>{"vipType":0,"vipDueDate":0,"dueRemark":"","accessStatus":0,"vipStatus":0,"vipStatusWarn":""}</vip>
<official_verify>{"type":-1,"desc":""}</official_verify>
<block_time>0</block_time>
<lastplaytime>18000</lastplaytime>
<lastcid>162260003</lastcid>
<aid>95051759</aid>
<bvid>BV1zE411T7nb</bvid>
<typeid>201</typeid>
<vtype>vupload</vtype>
<oriurl></oriurl>
<suggest_comment>false</suggest_comment>
<server>chat.bilibili.com</server>
<maxlimit>1000</maxlimit>
<chatid>162260003</chatid>
<pid>1</pid>
<duration>75:37</duration>
<arctype>Original</arctype>
<allow_bp>false</allow_bp>
<bottom>0</bottom>
<shot>false</shot>
<sinapi>1</sinapi>
<acceptguest>false</acceptguest>
<acceptaccel>false</acceptaccel>
<cache>false</cache>
<broadcast_tcp>broadcast.chat.bilibili.com:4080</broadcast_tcp>
<broadcast_ws>broadcast.chat.bilibili.com:4090</broadcast_ws>
<broadcast_wss>broadcast.chat.bilibili.com:4095</broadcast_wss>
<default_dm>0</default_dm>
<dm_host>0://comment.bilibili.com,1://comment.bilibili.com/rc</dm_host>
<role>0</role>
<has_next>1</has_next>
<online_count>6</online_count>
<dm_mask></dm_mask>
<mask_new></mask_new>
<subtitle>{"allow_submit":false,"lan":"","lan_doc":"","subtitles":[{"id":23916631605379079,"lan":"zh-CN","lan_doc":"中文(中国)","is_lock":false,"subtitle_url":"//i0.hdslb.com/bfs/subtitle/dfb81041cf92b5c2ebce2540cd14c9e49674f460.json"}]}</subtitle>
<player_icon></player_icon>
<view_points></view_points>
<is_pay_preview>false</is_pay_preview>
<preview_toast>为创作付费,购买观看完整视频|购买观看</preview_toast>
<interaction></interaction>
<pugv_watch_status>0</pugv_watch_status>
<pugv_pay_status>0</pugv_pay_status>
<pugv_season_status>0</pugv_season_status>
<pcdn></pcdn>
<pcdn_loader>{"flv":{"vendor":"xl","script_url":"\/\/s1.hdslb.com\/bfs\/static\/pcdnjs\/pcdn-xlflv-20.07.20.min.js","group":"eg","labels":{"pcdn_video_type":"flv","pcdn_stage":"release","pcdn_group":"eg","pcdn_version":"20.07.20","pcdn_vendor":"xl"}},"dash":{"vendor":"yf","script_url":"\/\/s1.hdslb.com\/bfs\/static\/pcdnjs\/pcdn-yfdash-20.07.03.min.js","group":"eg","labels":{"pcdn_video_type":"dash","pcdn_stage":"release","pcdn_group":"eg","pcdn_version":"20.07.03","pcdn_vendor":"yf"}}}</pcdn_loader>
<options>{"is_360":false}</options>
<guide_attention></guide_attention>
<new_broadcast>1</new_broadcast>
<realtime_dm>1</realtime_dm>
<enable_gray_dash_playback>500</enable_gray_dash_playback>

The output above contains one very important URL, the JSON URL inside subtitles:

{"allow_submit":false,"lan":"","lan_doc":"","subtitles":[{"id":23916631605379079,"lan":"zh-CN","lan_doc":"中文(中国)","is_lock":false,"subtitle_url":"//i0.hdslb.com/bfs/subtitle/dfb81041cf92b5c2ebce2540cd14c9e49674f460.json"}]}
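Pulling that URL out of the response could look like the following sketch. It assumes the response has the shape shown above (a flat run of tags with the subtitle JSON embedded unescaped), and extract_subtitles is a hypothetical helper name, not something from you-get.

```python
import json
import re

def extract_subtitles(player_xml):
    """Return (language, url) pairs for the subtitle tracks in a player.so response."""
    # The response has no single root element, so a regex is simpler
    # here than a strict XML parser.
    match = re.search(r"<subtitle>(.*?)</subtitle>", player_xml, re.S)
    if match is None:
        return []
    info = json.loads(match.group(1))
    tracks = []
    for item in info.get("subtitles", []):
        url = item["subtitle_url"]
        if url.startswith("//"):
            url = "https:" + url  # the API returns protocol-relative URLs
        tracks.append((item["lan"], url))
    return tracks

sample = (
    '<aid>95051759</aid>\n'
    '<subtitle>{"allow_submit":false,"lan":"","lan_doc":"","subtitles":'
    '[{"id":23916631605379079,"lan":"zh-CN","lan_doc":"中文(中国)","is_lock":false,'
    '"subtitle_url":"//i0.hdslb.com/bfs/subtitle/dfb81041cf92b5c2ebce2540cd14c9e49674f460.json"}]}'
    '</subtitle>'
)
print(extract_subtitles(sample))
```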

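After downloading this JSON, I found the Chinese subtitles I was after.
The format is as follows:
[Screenshot: format of the subtitle JSON]
  4. Convert the JSON to SRT subtitles
Reference: here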
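A conversion along these lines might look like the sketch below. Since the screenshot of the JSON format is not available here, the field names are an assumption: the code expects a top-level "body" list whose entries carry "from" and "to" (in seconds) and "content". Adjust them if your file differs. The helper names to_timestamp and json_to_srt are invented for this example.

```python
import json

def to_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def json_to_srt(subtitle_json):
    """Convert subtitle JSON text (assumed body/from/to/content layout) to SRT text."""
    data = json.loads(subtitle_json)
    blocks = []
    for index, cue in enumerate(data["body"], start=1):
        blocks.append(
            f"{index}\n"
            f"{to_timestamp(cue['from'])} --> {to_timestamp(cue['to'])}\n"
            f"{cue['content']}\n"
        )
    # SRT cues are separated by one blank line.
    return "\n".join(blocks)

sample = (
    '{"body": [{"from": 0.5, "to": 2.25, "content": "hello"}, '
    '{"from": 3661.0, "to": 3662.5, "content": "world"}]}'
)
print(json_to_srt(sample))
```

Working in whole milliseconds avoids floating-point drift when splitting the time into hours, minutes and seconds.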

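  5. Modify the you-get code
    The commit with the changes is here; you can use my repo directly. (You may need to change the buvid value, which should differ per user. Using you-get's cookie file should probably work too, but I have not tested that.)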

Conclusion

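  1. I downloaded all the lessons and shared them on Baidu Cloud. Link: https://pan.baidu.com/s/1S93he0skiUwAihUH309-2Q Password: sn5a
  2. When I have time I will merge the code into you-get; for now I can't be bothered to open a PR.
  3. One small question: is there a quick way in Firefox to find which URL returned a given string? Front-end folks probably know better.
  4. The modified code is at: https://github.com/gaoxingliang/you-get/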


Reposted from blog.csdn.net/scugxl/article/details/107928965