Scraping Bilibili's Daily Hot Videos

I. Scrape the titles and video URLs, then download the videos

II. Approach

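  1. Open the target page: http://vc.bilibili.com/p/eden/rank#/?tab=%E5%85%A8%E9%83%A8

  2. Press F12 to open the browser's developer tools.

  3. Click the Network tab and find the request URL that returns the ranking data (see figure).

  4. Write the code: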

import json
import time

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}


def get_json(url, i):
    # Query parameters for the ranking request; next_offset selects the page
    data = {
        'page_size': '10',
        'next_offset': str(i),
        'tag': '今日热门',
        'platform': 'pc',
    }
    html = requests.get(url, params=data, headers=headers).text
    return html

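def dowm_mv(url, title):
    start = time.time()  # start time
    size = 0
    # stream=True is required so the body is fetched in chunks, not all at once
    response = requests.get(url, headers=headers, stream=True)
    chunk_size = 1024  # bytes read per iteration
    if response.status_code == 200:
        # Total size in bytes; falls back to 0 when the header is missing
        content_size = int(response.headers.get('content-length', 0))
        print('[File size]: %0.2f MB' % (content_size / chunk_size / 1024))  # bytes -> MB
        with open(title, 'wb') as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                size += len(data)  # bytes downloaded so far
        print('[Downloaded]: %d bytes in %0.2f s' % (size, time.time() - start))
    else:
        print('[Request failed]: HTTP %d' % response.status_code)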

if __name__ == '__main__':
    url = 'http://api.vc.bilibili.com/board/v1/ranking/top'
    for i in range(0, 9):
        num = i * 10 + 1  # offset passed as next_offset for each page of 10 items
        html_json = get_json(url, num)
        html_json = json.loads(html_json)
        print(html_json)
        infos = html_json['data']['items']
        for info in infos:
            title = info['item']['description']
            mv = info['item']['video_playurl']
            print(title, mv)
            try:
                dowm_mv(mv, title='%s.mp4' % title)
            except Exception as e:
                # Report the reason instead of hiding it
                print('Download failed:', e)
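
The script above uses each video's description directly as a file name, so a description containing characters such as / or ? will likely make open() fail and land in the "download failed" branch. Below is a minimal sketch of one way to guard against that, run against a made-up payload that mirrors only the fields the script reads (data, items, item, description, video_playurl). The helper names safe_filename and extract_videos, the sample data, and the 80-character limit are assumptions for illustration and are not part of the original post.

```python
import re

# Hypothetical sample mirroring only the fields the script reads;
# a real response may contain many more keys.
sample = {
    "data": {
        "items": [
            {"item": {"description": "cat / dog: best moments?",
                      "video_playurl": "http://example.com/a.mp4"}},
            {"item": {"description": "  ",
                      "video_playurl": "http://example.com/b.mp4"}},
        ]
    }
}


def safe_filename(title, fallback="untitled", max_len=80):
    """Replace characters that are commonly invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip()
    # Use the fallback when nothing usable is left, then cap the length
    return (cleaned or fallback)[:max_len] + ".mp4"


def extract_videos(payload):
    """Yield (file_name, url) pairs, skipping entries without a video URL."""
    items = (payload.get("data") or {}).get("items") or []
    for entry in items:
        item = entry.get("item") or {}
        url = item.get("video_playurl")
        if url:
            yield safe_filename(item.get("description") or ""), url


for name, url in extract_videos(sample):
    print(name, url)
```

In the main loop, the pairs yielded by extract_videos(html_json) could be passed to dowm_mv(url, name) in place of the raw description.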

  5. The scraping results are as follows.


Reposted from www.cnblogs.com/a595452248/p/11523205.html