Preface
What does "mouse tail juice" mean? Many people may not know that this meme comes from Ma Baoguo. Anyone who spends time online has probably heard the name, and several of his phrases have become memes as well, such as "young people have no martial ethics" and the "five-strike lightning whip".
In one of Ma Baoguo's videos on Bilibili (station B), his right eye is red and swollen as he delivers one memorable line after another. Netizens have even summarized some of his classic sentence patterns as the "Baoguo style", and they have become catchphrases, for example "Young people have no martial ethics, bullying me, a 69-year-old comrade" and "mouse tail juice" (a pun on the Chinese phrase for "conduct yourself well").
So let's take a look at which bullet comments (danmaku) viewers send most often.
Project Objectives
Crawl the bullet comments of a Bilibili video and display them as a word cloud.
The top video has almost 20 million views and about 48,000 bullet comments.
Watch it once a day for a source of happiness, hehehe.
Environment
Python 3.6
PyCharm
Crawler code
Import the libraries
import requests
import parsel
import csv
import time
First press F12 to open the browser's developer tools and find the request that returns the bullet-comment data.
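The response located this way is XML in which each bullet comment is the text of a d element, which is why the crawler in this article selects 'd::text'. As a rough sketch of that extraction step, the example below parses a small sample document. The sample XML and its values are made up for illustration, the function name extract_comments is hypothetical, and the standard library's xml.etree.ElementTree is used instead of parsel only so that the example runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a bullet-comment response; all values are made up.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<i>
  <d p="12.5,1,25,16777215,1605888000,0,abcd1234,1001">年轻人不讲武德</d>
  <d p="30.0,1,25,16777215,1605888060,0,efgh5678,1002">耗子尾汁</d>
</i>"""


def extract_comments(xml_text):
    """Return the text of every non-empty <d> element in the document."""
    # Encode to bytes so the parser honours the declared UTF-8 encoding.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [d.text for d in root.iter("d") if d.text]


comments = extract_comments(SAMPLE_XML)
print(comments)
```

With parsel, the equivalent selection is the selector.css('d::text').getall() call used in the crawler code.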
Request the bullet-comment history for each date, extract the comments, and append them to a CSV file
for page in range(20, 31):  # November 20 to 30 (November has only 30 days)
    time.sleep(1)  # pause between requests so the server is not hammered
    print('=================Downloading bullet comments for November {}===================================='.format(page))
    url = 'https://api.bilibili.com/x/v2/dm/history?type=1&oid=140610898&date=2020-11-{}'.format(page)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        # Cookie copied from a logged-in browser session; replace it with your own
        "cookie": "__uuid=1896D3F7-4A98-54EB-F7FA-3301CE9EF5F307776infoc; buvid3=B68B2187-4C3E-4466-A896-FBF9B292099B190963infoc; LIVE_BUVID=AUTO4115757254257055; stardustvideo=1; rpdid=|(umu|ulY)JJ0J'ul~l~klRJ); sid=8cq4r229; im_notify_type_65901796=0; laboratory=1-1; DedeUserID=523606542; DedeUserID__ckMd5=909861ec223d26d8; blackside_state=1; CURRENT_FNVAL=80; SESSDATA=a976c0b4%2C1618637313%2C4d792*a1; bili_jct=7f54729ec20660f750661122b80746d2; PVID=1; bp_video_offset_523606542=458111639975213216; CURRENT_QUALITY=16; bfe_id=1e33d9ad1cb29251013800c68af42315"
    }
    response = requests.get(url=url, headers=headers)
    response.encoding = response.apparent_encoding
    # Each bullet comment is the text of a <d> element in the response
    selector = parsel.Selector(response.text)
    data = selector.css('d::text').getall()
    # Open the CSV once per day and append one comment per row
    with open('B站弹幕.csv', mode='a', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        for i in data:
            print(i)
            writer.writerow([i])
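The crawler builds each request by formatting a day number into the URL, which only works inside a single month and depends on the range matching that month's length. One possible alternative, sketched below, is to generate the dates with the standard datetime module so that only valid calendar dates are requested. The URL template is copied from the crawler code; the names HISTORY_URL and history_urls are hypothetical and introduced only for this illustration. No network request is made here.

```python
from datetime import date, timedelta

# URL template copied from the crawler code; oid identifies the video.
HISTORY_URL = "https://api.bilibili.com/x/v2/dm/history?type=1&oid=140610898&date={}"


def history_urls(start, end):
    """Yield one history URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        # date.isoformat() gives YYYY-MM-DD, the format used in the crawler's URL
        yield HISTORY_URL.format(day.isoformat())
        day += timedelta(days=1)


urls = list(history_urls(date(2020, 11, 20), date(2020, 11, 30)))
print(len(urls), urls[0], urls[-1])
```

Each generated URL could then be passed to requests.get together with the headers from the crawler code.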
Word cloud code
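import jieba
import wordcloud
import imageio  # only needed if a mask image is loaded for the commented-out mask option

# Read the file contents (replace the placeholder with the path to the CSV file)
# utf-8-sig matches the encoding used when the CSV was written and drops the BOM
with open('path to the CSV file', encoding='utf-8-sig') as f:
    txt = f.read()
# print(txt)
# Segment the text into words with jieba
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)
# Word cloud settings
wc = wordcloud.WordCloud(
    width=800,  # image width
    height=500,  # image height
    background_color='white',  # image background color
    font_path='msyh.ttc',  # font for the word cloud (Microsoft YaHei, which supports Chinese)
    # mask=py,  # mask image that shapes the word cloud
    scale=15,
)
# Feed the text to the word cloud
wc.generate(string)
# Path where the word cloud image is saved
wc.to_file('\\1.png')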
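Before rendering the image, it can help to check which words actually dominate the comments. The sketch below counts words with collections.Counter and skips single-character tokens such as punctuation. The token list is made up and only shaped like the list jieba.lcut returns, and the helper name top_words and its min_length parameter are hypothetical.

```python
from collections import Counter

# Hypothetical tokens, shaped like the list that jieba.lcut() returns.
tokens = ["年轻人", "不", "讲", "武德", "，", "耗子尾汁",
          "年轻人", "武德", "耗子尾汁", "耗子尾汁"]


def top_words(words, n=3, min_length=2):
    """Count words of at least min_length characters and return the n most common."""
    counts = Counter(w for w in words if len(w.strip()) >= min_length)
    return counts.most_common(n)


print(top_words(tokens))
```

The counts could also be turned into a dict and passed to WordCloud.generate_from_frequencies instead of generating the cloud from the joined string.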