Data Analysis Talent Competition - Visual Analysis of User Emotions

Table of contents

Preface

1. Introduction to the competition questions

2. Word cloud chart

1. Read data

2. Visualization

3. Correlation heat map

4. Visualization of different themes, different emotions, and different emotional words

Summary


Preface

Data visualization aims to convey information clearly and effectively through graphical means. It makes data analysis more efficient, makes it easier to trace results back to their causes, and supports operational decisions.


1. Introduction to the competition questions

The competition task is based on Internet public opinion analysis: contestants analyze and visualize brand-related topics using user comments. This post uses the task to walk through commonly used data visualization charts and data analysis methods, and to carry out exploratory data analysis on the content of interest.

2. Word cloud chart

1. Read data

import re  # regular expressions
import collections  # word-frequency counting
import numpy as np
import jieba  # Chinese word segmentation
import wordcloud  # word cloud rendering
from PIL import Image  # image handling
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('/earphone_sentiment.csv')
df.head()
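Before plotting, it can help to confirm that the loaded table has the columns used in the rest of this post and to see how the sentiment labels are distributed. The following is a minimal sketch under assumptions: the three-row sample is made up to stand in for earphone_sentiment.csv, and the names `sample_df`, `required`, `missing`, and `label_counts` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for earphone_sentiment.csv;
# the column names are the ones used later in this post.
sample_csv = io.StringIO(
    "content,subject,sentiment_word,sentiment_value\n"
    "音质很好,音质,好,1\n"
    "价格有点贵,价格,贵,-1\n"
    "今天到货,其他,,0\n"
)
sample_df = pd.read_csv(sample_csv)

# Columns the later steps rely on
required = {"content", "subject", "sentiment_word", "sentiment_value"}
missing = required - set(sample_df.columns)

# Number of comments per sentiment label (-1, 0, 1)
label_counts = sample_df["sentiment_value"].value_counts().sort_index()

print("missing columns:", missing)
print(label_counts.to_dict())
```

With the real file, replace `sample_csv` with the CSV path; a non-empty `missing` set would mean the later filtering and pivot steps need different column names.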

2. Visualization

# Split the data by sentiment label
posdata = df[df['sentiment_value'] == 1]
neudata = df[df['sentiment_value'] == 0]
negdata = df[df['sentiment_value'] == -1]

# Clean and segment the text, using posdata as the example
words1 = ''.join(posdata['content'].dropna().astype(str))
# Characters to strip. '一' is kept from the original; it is the character
# for 'one', though a dash may have been intended.
pattern = re.compile(r'\t|\n|\.|-|一|:|;|\)|\(|\?|"')
string_data1 = re.sub(pattern, '', words1)  # remove everything that matches the pattern
seg_list_exact = jieba.cut(string_data1, cut_all=False)  # accurate mode

# Load the stop-word list (one word per line)
with open('/stoplist.txt', 'r', encoding='utf-8') as stop_file:
    remove_words = {line.rstrip('\n') for line in stop_file}
remove_words.add(' ')
object_list = [i for i in seg_list_exact if i not in remove_words]  # keep only words that are not stop words
word_counts = collections.Counter(object_list)  # word frequencies

# Word cloud
plt.rcParams['font.sans-serif'] = ['SimSun']  # use SimSun so the Chinese title renders
mask = np.array(Image.open('/39.jpeg'))  # image used as the word cloud mask
wc = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/simhei.ttf',  # font that supports Chinese characters
    mask=mask,  # shape of the cloud
    max_words=100,  # maximum number of words shown
    max_font_size=100  # largest font size
)
wc.generate_from_frequencies(word_counts)  # build the cloud from the frequency dict
image_colors = wordcloud.ImageColorGenerator(mask)  # color scheme taken from the mask image
wc.recolor(color_func=image_colors)  # apply the mask image's colors to the cloud
plt.figure(figsize=(10, 8))  # figure size
plt.title('正向词云', fontstyle='oblique', fontsize=30, fontweight='heavy', alpha=0.8)  # "Positive word cloud"
plt.imshow(wc)  # draw the word cloud
plt.axis('off')  # hide the axes
plt.show()
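The cleaning and counting steps above are written for posdata only; to repeat them for neudata and negdata, they could be wrapped in a small helper. The sketch below is one possible shape, not the author's code: `count_words` and `simple_tokenize` are hypothetical names, and the regex tokenizer is only a stand-in so the example runs without jieba.

```python
import collections
import re


def count_words(texts, tokenize, stopwords):
    """Count tokens across texts, skipping stop words and blank tokens."""
    counts = collections.Counter()
    for text in texts:
        for token in tokenize(text):
            token = token.strip()
            if token and token not in stopwords:
                counts[token] += 1
    return counts


def simple_tokenize(text):
    """Stand-in tokenizer (assumption): lowercase word characters only."""
    return re.findall(r"\w+", text.lower())


# Made-up English comments, used only to show the call
demo_counts = count_words(
    ["Great sound, great bass", "The sound is flat"],
    simple_tokenize,
    stopwords={"the", "is"},
)
print(demo_counts.most_common(2))
```

For the Chinese comments in this dataset, `jieba.lcut` could be passed as `tokenize`, for example `count_words(negdata['content'].dropna(), jieba.lcut, remove_words)`. The returned Counter can go straight into `generate_from_frequencies`. Keeping the stop words in a set makes each membership check constant time.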

 

3. Correlation heat map

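# Pivot table: number of rows per sentiment value (rows) and subject (columns)
df_pivot_tabel = df.pivot_table(index='sentiment_value', columns='subject', values='sentiment_word', aggfunc=len)
df_pivot_tabel

import seaborn as sns
plt.figure(figsize=(12, 10))
# Heat map of the pairwise correlations between the subject columns
sns.heatmap(df_pivot_tabel.corr(), vmax=0.9, linewidths=0.05, cmap="YlGnBu_r", annot=True)
plt.show()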
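To see what the heat map is actually measuring, the pivot and correlation steps can be tried on a tiny table. This is a sketch on made-up data: the `toy` frame, its English subject names, and the `fill_value=0` argument are assumptions added for illustration and do not come from the competition data.

```python
import pandas as pd

# Made-up rows with the same columns as the pivot call in this section
toy = pd.DataFrame({
    "subject": ["sound", "sound", "sound", "price", "price", "price", "price"],
    "sentiment_value": [1, 1, 0, -1, -1, 0, 1],
    "sentiment_word": ["good", "clear", "ok", "expensive", "pricey", "fair", "worth"],
})

# Rows are sentiment values, columns are subjects, cells are row counts.
# fill_value=0 (an addition here) replaces missing combinations with 0.
toy_counts = toy.pivot_table(
    index="sentiment_value",
    columns="subject",
    values="sentiment_word",
    aggfunc=len,
    fill_value=0,
)

# DataFrame.corr() compares columns pairwise, so each coefficient is
# computed from only three points: the counts at -1, 0 and 1.
toy_corr = toy_counts.corr()

print(toy_counts)
print(toy_corr)
```

Because every coefficient rests on three counts per subject, the values in the heat map are best read as a rough comparison of how sentiment is distributed across subjects, not as a statistically strong correlation.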

4. Visualization of different themes, different emotions, and different emotional words

When a large number of charts is needed, I still prefer third-party tools, which are more convenient and efficient.
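One way to hand the data to such a tool is to export a long-format summary with one row per subject, sentiment value, and sentiment word. The sketch below uses made-up rows and an in-memory buffer; the file name in the comment and the variable names are assumptions, not part of the original workflow.

```python
import collections
import csv
import io

# Hypothetical (subject, sentiment_value, sentiment_word) rows
rows = [
    ("sound", 1, "good"),
    ("sound", 1, "good"),
    ("price", -1, "expensive"),
]
counts = collections.Counter(rows)

# In-memory buffer for the demo; for a real export, swap in
# open("summary.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["subject", "sentiment_value", "sentiment_word", "count"])
for (subject, value, word), n in sorted(counts.items()):
    writer.writerow([subject, value, word, n])

summary_csv = buffer.getvalue()
print(summary_csv)
```

A table in this shape can be loaded into a BI tool and sliced by subject, sentiment, or word without further reshaping.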


Summary

1. This competition is a good opportunity to learn how to process text. Text is also data: data processing does not always involve structured data, and unstructured data such as text comes up as well, so it pays to learn more processing methods.

2. Python supports big data analysis, data mining, machine learning, and more. If you need a few charts to support your judgment while processing data, visualizing in Python is a good choice, since Python has many powerful third-party libraries such as matplotlib, seaborn, plotly, and pyecharts. However, if you need to visualize and present processed data in bulk, third-party tools such as Power BI and Tableau are more convenient.


Origin blog.csdn.net/weixin_46685991/article/details/126165457