Crawling Bilibili danmaku with Python: the video behind "mouse tail juice" and "young people have no martial ethics"

The text and pictures in this article are from the Internet and are for learning and communication purposes only. They do not have any commercial use. If you have any questions, please contact us for processing.

Preface

What does "耗子尾汁" (literally "mouse tail juice") mean? Many people may not know that this meme comes from Ma Baoguo. Anyone who spends time online has probably heard the name, and several of his lines have become memes as well, such as "young people have no martial ethics" and the "Five Lightning Whips".

In one of Ma Baoguo's videos on Bilibili, he appears with a swollen right eye and delivers one quotable line after another. Netizens even collected his signature sentence patterns as the "Baoguo style", and they became catchphrases, for example "Young people have no martial ethics, bullying me, a 69-year-old comrade" and "mouse tail juice" (a near-homophone of 好自为之, roughly "behave yourself").

So let's take a look at which danmaku (bullet comments) viewers post most often.

Project Objectives

Crawl the danmaku of a Bilibili video and display them as a word cloud.

The first video has nearly 20 million views and about 48,000 danmaku.

Watch it once a day for a dose of happiness, hehe.

Environment

Python 3.6

PyCharm

Crawler code

Import the libraries

import requests
import parsel
import csv
import time

First press F12 to open the browser's developer tools and find the request that returns the danmaku data.
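As a rough illustration of what to look for in the Network panel: the crawler in this article selects `d` elements from the response, so each danmaku is expected to be the text of a `<d>` tag. The sketch below parses a made-up payload with the standard library only. The sample XML (including the contents of its `p` attributes) and the `extract_danmaku` helper are assumptions for illustration, not captured output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a danmaku response (not real captured data).
SAMPLE_XML = """<i>
  <d p="12.5,1,25,16777215,1605800000,0,abcd1234,1">耗子尾汁</d>
  <d p="30.0,1,25,16777215,1605800100,0,efgh5678,2">年轻人不讲武德</d>
  <d p="31.0,1,25,16777215,1605800200,0,ijkl9012,3"></d>
</i>"""


def extract_danmaku(xml_text):
    """Return the text of every <d> element, skipping empty ones."""
    root = ET.fromstring(xml_text)
    return [d.text for d in root.iter("d") if d.text]


if __name__ == "__main__":
    for line in extract_danmaku(SAMPLE_XML):
        print(line)
```

If the response you see in the developer tools has a different shape, the selector in the crawler would need to change accordingly.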

Get the detail page's URL and other data from the list page

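for page in range(20, 31):  # November 20 to 30 (November has only 30 days)
    time.sleep(1)  # pause between requests so the API is not hit too quickly
    print('=================Downloading danmaku for November {}===================================='.format(page))
    url = 'https://api.bilibili.com/x/v2/dm/history?type=1&oid=140610898&date=2020-11-{}'.format(page)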
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        "cookie": "__uuid=1896D3F7-4A98-54EB-F7FA-3301CE9EF5F307776infoc; buvid3=B68B2187-4C3E-4466-A896-FBF9B292099B190963infoc; LIVE_BUVID=AUTO4115757254257055; stardustvideo=1; rpdid=|(umu|ulY)JJ0J'ul~l~klRJ); sid=8cq4r229; im_notify_type_65901796=0; laboratory=1-1; DedeUserID=523606542; DedeUserID__ckMd5=909861ec223d26d8; blackside_state=1; CURRENT_FNVAL=80; SESSDATA=a976c0b4%2C1618637313%2C4d792*a1; bili_jct=7f54729ec20660f750661122b80746d2; PVID=1; bp_video_offset_523606542=458111639975213216; CURRENT_QUALITY=16; bfe_id=1e33d9ad1cb29251013800c68af42315"
    }
    response = requests.get(url=url, headers=headers)
    response.encoding = response.apparent_encoding
    # Each danmaku is the text of a <d> element in the response
    selector = parsel.Selector(response.text)
    data = selector.css('d::text').getall()
    # Open the CSV once per day and append one danmaku per row
    with open('B站弹幕.csv', mode='a', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        for i in data:
            print(i)
            writer.writerow([i])
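The crawler builds each date by formatting a day number into the URL, which makes it easy to request a day that does not exist in the month. A minimal sketch of an alternative, assuming the same endpoint and `oid` as the crawler, is to step through real calendar dates with `datetime`. The `history_urls` helper and the `HISTORY_URL` constant are hypothetical names introduced here.

```python
from datetime import date, timedelta

# Endpoint and oid are copied from the crawler; the names below are hypothetical.
HISTORY_URL = "https://api.bilibili.com/x/v2/dm/history?type=1&oid={oid}&date={day}"


def history_urls(oid, start, end):
    """Yield one history URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield HISTORY_URL.format(oid=oid, day=day.isoformat())
        day += timedelta(days=1)


if __name__ == "__main__":
    for url in history_urls(140610898, date(2020, 11, 20), date(2020, 11, 30)):
        print(url)
```

Because `date` only represents valid days, a range that crosses a month boundary also works without special handling.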

Word cloud code

import jieba
import wordcloud
import imageio

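# Read the file written by the crawler (saved as utf-8-sig, so read it the same way)
with open('B站弹幕.csv', encoding='utf-8-sig') as f:
    txt = f.read()
# print(txt)
# Split the text into words with jieba
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)
# Word cloud settings
wc = wordcloud.WordCloud(
        width=800,         # image width
        height=500,         # image height
        background_color='white',   # image background color
        font_path='msyh.ttc',    # word cloud font (must contain Chinese glyphs)
        # mask=py,     # mask image that shapes the word cloud
        scale=15,
)
# Feed the text to the word cloud
wc.generate(string)
# Save the word cloud image to the current directory
wc.to_file('1.png')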
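Before rendering the image, it can help to check which words actually dominate. The following is a small sketch, not part of the original script, that counts tokens with `collections.Counter`. The `top_words` helper and the sample token list are assumptions; in practice the list would come from `jieba.lcut(txt)`.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Count tokens, ignoring whitespace and tokens shorter than min_len."""
    cleaned = (t.strip() for t in tokens)
    counts = Counter(t for t in cleaned if len(t) >= min_len)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical token list standing in for the output of jieba.lcut(txt).
    sample = ["耗子尾汁", "，", "耗子尾汁", "\n", "武德", "不", "讲", "武德", "耗子尾汁"]
    print(top_words(sample, n=2))
```

Dropping single-character tokens is a design choice that removes most punctuation and particles; lower `min_len` to 1 to keep them.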


Origin blog.csdn.net/m0_48405781/article/details/109755030