Scraping danmaku (bullet comments) from Tencent Video (works for both VIP and non-VIP videos)

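Disclaimer: the application covered in this article is for learning and exchange only and may not be used for any commercial purpose. The data comes from content publicly available on the internet; no private or permission-restricted information (personal information, etc.) was obtained. The author bears no responsibility for any legal disputes arising from its use! Using the techniques in this article, or the source code of any GitHub project associated with it, for any purpose is prohibited.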

This is just a learning log, for reference only.

Libraries used for scraping:
import json
import time
import requests

1. First, let's look at how to find the request URL

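Because the danmaku are loaded in real time, they are fetched asynchronously. Searching the browser's network requests for a keyword reveals the URLs that carry the danmaku data. Comparing these URLs shows that they differ only in the last two numbers, which change from one loading request to the next.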
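The URL comparison can be sketched in Python: the hypothetical helper `changed_path_parts` splits two request URLs into path segments and reports the positions where they differ. The two URLs used here are assumed to be consecutive segments of the same video (the first one is derived from the loop below, which starts at 0); only the last two segments change.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def changed_path_parts(url_a: str, url_b: str) -> List[Tuple[int, str, str]]:
    """Return (index, part_in_a, part_in_b) for every path segment that differs.

    Assumes both URLs have the same number of path segments; zip() silently
    ignores any extra segments in the longer one.
    """
    parts_a = urlparse(url_a).path.strip("/").split("/")
    parts_b = urlparse(url_b).path.strip("/").split("/")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(parts_a, parts_b))
        if a != b
    ]


if __name__ == "__main__":
    first = "https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/0/30000"
    second = "https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/30000/60000"
    for index, old, new in changed_path_parts(first, second):
        print(index, old, "->", new)
```

Only indexes 5 and 6 (the last two numbers) are reported, which is the pattern the loop below relies on.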

This gives us the following link (an episode of a variety show):

https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/30000/60000
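A minimal sketch of generating these segment URLs, assuming (from the URLs alone, not from any documentation) that the two trailing numbers are the start and end of a time window of 30000 units, most likely milliseconds, so each request covers about 30 seconds. `segment_urls`, `vid` and `last_start` are illustrative names, not part of any Tencent API.

```python
from typing import Iterator, Tuple

# URL pattern taken from the link above; {vid} is the video id segment
BASE = "https://dm.video.qq.com/barrage/segment/{vid}/t/v1/{start}/{end}"


def segment_urls(vid: str, last_start: int, step: int = 30000) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, url) for every segment from 0 up to and including last_start.

    The meaning of the two numbers (a time window, probably in ms) is an
    assumption inferred from the URLs.
    """
    for start in range(0, last_start + 1, step):
        end = start + step
        yield start, end, BASE.format(vid=vid, start=start, end=end)


if __name__ == "__main__":
    for start, end, url in segment_urls("p00308yww2y", last_start=90000):
        print(start, end, url)
```

With `last_start=5220000` this produces the same 175 URLs as the `range(0, 5220001, 30000)` loop used in this article, ending at the `.../5220000/5250000` URL listed for episode 10 in the full code's comments.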

We can request the danmaku with a loop:

total = 0  # running count of danmaku fetched
for start in range(0, 5220001, 30000):
    end = start + 30000
    url = "https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/{}/{}".format(start, end)
    print(start, end)

    # Request this time segment (headers and file are defined in the full code below)
    resp = requests.get(url, headers=headers)

The response is JSON, so we dig into it step by step to find the danmaku:

    # Parse the response body as JSON and take the list of comments
    json_datas = json.loads(resp.text)["barrage_list"]
    total += len(json_datas)
    for item in json_datas:
        content = item["content"].strip()
        print(content)
        file.write(content + "\n")

The text of each comment is in its content field, which we write to the file, one comment per line.

Finally, remember to close the response and the file:

    resp.close()
    time.sleep(1)  # pause between requests

print(total)
file.close()
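The parsing step can also be isolated into a function that works on the raw response text. This is a sketch: it relies only on the `barrage_list` and `content` fields used above, `extract_contents` is a hypothetical name, and the sample body is invented and heavily trimmed (real responses carry more fields).

```python
import json
from typing import List


def extract_contents(body: str) -> List[str]:
    """Return the stripped "content" text of every entry in "barrage_list".

    Entries with a missing or blank "content" are skipped, and a body
    without "barrage_list" yields an empty list.
    """
    data = json.loads(body)
    contents = []
    for item in data.get("barrage_list", []):
        text = (item.get("content") or "").strip()
        if text:
            contents.append(text)
    return contents


if __name__ == "__main__":
    # Hypothetical sample body, not a captured response
    sample = '{"barrage_list": [{"content": " haha "}, {"content": "   "}, {"id": "1"}]}'
    for line in extract_contents(sample):
        print(line)
```

Unlike the loop above, which raises `KeyError` when an entry lacks `content` and writes an empty line for blank comments, this version skips such entries.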

Full code (the start value and the interval sometimes differ, so adjust them as needed)

# -*- coding: utf-8 -*-
import json
import os
import time

import requests

# Last segment URL of each episode (the trailing number is the episode number):
# https://dm.video.qq.com/barrage/segment/o0030rmqnzx/t/v1/4530000/4560000 6
# https://dm.video.qq.com/barrage/segment/p0030payq9c/t/v1/5070000/5100000  7
# https://dm.video.qq.com/barrage/segment/v0030fjquzh/t/v1/5370000/5400000   8
# https://dm.video.qq.com/barrage/segment/g00307cv90x/t/v1/5520000/5550000  9
# https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/5220000/5250000 10
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.39"
}

os.makedirs("txt", exist_ok=True)  # open() fails if the txt/ folder does not exist
total = 0  # running count of danmaku fetched
with open("txt/第10集.txt", "w", encoding="utf-8") as file:  # closed automatically
    for start in range(0, 5220001, 30000):
        end = start + 30000
        url = "https://dm.video.qq.com/barrage/segment/p00308yww2y/t/v1/{}/{}".format(start, end)
        print(start, end)

        # Request this time segment
        resp = requests.get(url, headers=headers, timeout=10)

        # Parse the response body as JSON and take the list of comments
        json_datas = json.loads(resp.text)["barrage_list"]
        total += len(json_datas)
        for item in json_datas:
            content = item["content"].strip()
            print(content)
            file.write(content + "\n")

        resp.close()
        time.sleep(1)  # pause between requests

print(total)
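Since the video id, the last start value and the interval change from episode to episode, the script can be wrapped in a function that takes them as parameters. This is a hypothetical refactor, not the author's code: `crawl_danmaku` and its `fetch` parameter are illustrative names, and `fetch` is passed in so the loop can be exercised without network access.

```python
import time
from typing import Callable, Dict, List

# URL pattern from the article; {vid} is the video id segment
BASE = "https://dm.video.qq.com/barrage/segment/{vid}/t/v1/{start}/{end}"


def crawl_danmaku(
    vid: str,
    last_start: int,
    fetch: Callable[[str], Dict],
    step: int = 30000,
    delay: float = 1.0,
) -> List[str]:
    """Collect the danmaku text of one video.

    fetch(url) must return the decoded JSON of one segment. Entries with a
    missing or blank "content" are skipped.
    """
    comments: List[str] = []
    for start in range(0, last_start + 1, step):
        url = BASE.format(vid=vid, start=start, end=start + step)
        data = fetch(url)
        for item in data.get("barrage_list", []):
            text = (item.get("content") or "").strip()
            if text:
                comments.append(text)
        if delay:
            time.sleep(delay)  # pause between segment requests
    return comments
```

For a real run, `fetch` could be `lambda url: requests.get(url, headers=headers, timeout=10).json()`, with `vid` and `last_start` read off the episode's last segment URL (for episode 10, `p00308yww2y` and `5220000`).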


Reposted from blog.csdn.net/qq_25976859/article/details/130366864