"Garbage" how you say? B Analysis in Python station barrage
Table of Contents
0 Preface
1 Environment
2 Requirements Analysis
3 Code Implementation
4 Postscript
0 Preface
Is a wet paper towel dry garbage, and does it become wet garbage again once soaked? Is melon rind wet garbage or not? Haven't we all been tormented by garbage sorting lately, unable to tell the categories apart? Since 2019-07-01, Shanghai has taken the lead with a mandatory waste-sorting regime, and breaking the rules means a fine.
To avoid heavy losses, I decided to brush up my garbage-sorting skills on Bilibili. Why Bilibili? Because I hear it is one of the most popular ways for young people to learn these days.
I opened Bilibili, searched for "garbage", and was immediately hooked by this clickbait title: "The correct posture for losing face in Shanghai."
Of course, the "losing face" here isn't real shame; in Chinese the same word also means throwing out the trash.
Clicking in, I found it was actually a manzai-style comic dialogue, performed by two cute AI virtual-idol sisters, and my interest was instantly piqued: the act explains how garbage sorting works.
After watching it over and over, I simply couldn't stop; brainwash mode had been switched on. The video is fun, and after all, a video with danmaku (bullet comments) is even more fun!
And fun shared beats fun alone, so why not use Python to save the danmaku and turn it into a word cloud? Happily decided!
1 environment
Operating System: Windows
Python Version: 3.7.3
2 Requirements Analysis
First, open the browser's developer tools (F12) and find the cid of this video in the network requests.
Once we have the cid, fill it into the following link:
http://comment.bilibili.com/{cid}.xml
Opening that URL shows the full danmaku list for the video.
With the danmaku data in hand, we first need to parse it and save it locally, so that it can be processed further, for example into a word cloud.
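For reference, each danmaku in that XML is a `<d>` element whose `p` attribute packs several comma-separated fields; to my understanding these include the appearance time in seconds, the display mode, font size, color, and send timestamp, among others. A minimal sketch of pulling a sample element apart (the sample string below is hand-written for illustration, not live data):

```python
from bs4 import BeautifulSoup

# a hand-written sample <d> element in the shape Bilibili's danmaku XML uses
sample = '<i><d p="12.5,1,25,16777215,1562000000,0,abcd1234,12345678">干垃圾!</d></i>'

soup = BeautifulSoup(sample, 'lxml')
d = soup.find('d')

# split the comma-separated fields of the p attribute
fields = d['p'].split(',')
appear_time = float(fields[0])   # seconds into the video when the comment appears
color = int(fields[3])           # text color as a decimal RGB value

print(d.text, appear_time, color)
```

This is why the scraping code below only needs `find_all('d')` and `.text` to collect the comment texts.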
3 Code Implementation
Here we use the requests module to fetch the page, the beautifulsoup4 module (with lxml) to parse it, and the pandas module to save the data as CSV. These are all third-party modules; if they are not in your environment, install them with pip:
pip install requests
pip install beautifulsoup4
pip install lxml
pip install pandas
Once the modules are installed, import them:
import requests
from bs4 import BeautifulSoup
import pandas as pd
Request, parse, and save the danmaku data:
# request the danmaku data
url = 'http://comment.bilibili.com/99768393.xml'
html = requests.get(url).content
# parse the danmaku data
html_data = str(html, 'utf-8')
bs4 = BeautifulSoup(html_data, 'lxml')
results = bs4.find_all('d')
comments = [comment.text for comment in results]
comments_dict = {'comments': comments}
# save the danmaku data locally
br = pd.DataFrame(comments_dict)
br.to_csv('barrage.csv', encoding='utf-8')
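One detail worth noting (my addition, shown here with stand-in comments rather than live danmaku): `to_csv` writes the DataFrame index as an extra first column, which is why the later reading code takes column 1 rather than column 0. A small round-trip sketch:

```python
import pandas as pd

# stand-in comments in place of the real scraped danmaku
comments_dict = {'comments': ['干垃圾', '湿垃圾', '可回收物']}
br = pd.DataFrame(comments_dict)
br.to_csv('barrage.csv', encoding='utf-8')

# read it back with header=None: column 0 is the index that
# to_csv wrote, column 1 holds the comment text
check = pd.read_csv('barrage.csv', header=None)
print(check.head())
```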
Next, let's process the saved danmaku data further.
To make the word cloud, we need the wordcloud, matplotlib, and jieba modules. They are also third-party modules, so install them directly with pip:
pip install wordcloud
pip install matplotlib
pip install jieba
Once the modules are installed, import them. Since we also use the pandas module to read the file, import it here as well:
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
import pandas as pd
import jieba
We can pick an image of our own and generate a customized word cloud shaped by it, with some custom styling. The code is as follows:
# load the background (mask) image
mask_img = plt.imread('Bulb.jpg')
'''configure the word cloud style'''
wc = WordCloud(
    # font to use (required for Chinese characters)
    font_path='SIMYOU.TTF',
    # maximum number of words
    max_words=2000,
    # maximum font size
    max_font_size=80,
    # background image that shapes the cloud
    mask=mask_img,
    # transparent background for the output image
    background_color=None, mode="RGBA",
    # number of random states, i.e. how many color schemes
    random_state=30)
Next, read the text (the danmaku data), segment it into words, and join them with spaces:
# read the contents of the file
br = pd.read_csv('barrage.csv', header=None)
# segment each comment and join the words with spaces
text = ''
for line in br[1]:
    text += ' '.join(jieba.cut(line, cut_all=False))
Finally, let's take a look at the result:
Seeing everyone's irrepressible enthusiasm for the topic of waste sorting, an inexplicable sense of amusement rises in my heart.
4 Postscript
The manzai act by the two cute AI sisters really is quite good; I wonder what Guo Degang would make of this piece. Back to the topic of garbage: the "Shanghai Municipal Domestic Waste Management Regulations" are now officially in force. Friends outside Shanghai, don't celebrate too soon; the Ministry of Housing has said that 46 other key cities nationwide will soon get their turn... hahaha, interesting!