Text Sentiment Analysis - Machine Learning Experiment 3


Experiment purpose:
Through these experiments, learn the overall workflow of text analysis and understand text classification, sentiment analysis, automatic summarization, and related tasks.
Using the given text, complete word segmentation, text vectorization, text classification, sentiment analysis, and other related experiments.
You may choose one of text classification, sentiment analysis, or automatic summarization and carry out a complete experiment on it.

1. Import the pandas library

import pandas as pd

2. Read in the dataset of "shopping reviews" and print out the first 10 rows.

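# Read in the raw dataset: one sheet of positive reviews, one of negative reviews
dfpos = pd.read_excel("购物评论.xlsx", sheet_name="正向", header=None)
dfpos['y'] = 1  # label positive reviews as 1
dfneg = pd.read_excel("购物评论.xlsx", sheet_name="负向", header=None)
dfneg['y'] = 0  # label negative reviews as 0
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df0 = pd.concat([dfpos, dfneg], ignore_index=True)
df0.head(10)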

3. Import the "jieba" library
Word segmentation principle of the jieba library: it uses a Chinese dictionary to estimate how likely adjacent Chinese characters are to belong together, and groups the characters with high association probability into words to produce the segmentation result.
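To make the principle concrete, here is a minimal sketch of dictionary-based, maximum-probability segmentation in plain Python. It is not jieba's actual implementation: the `FREQ` dictionary and its counts are made up for illustration, and `segment` is a hypothetical helper. It only shows the idea of choosing the split whose words are jointly most probable.

```python
import math

# Hypothetical toy dictionary: word -> frequency (made-up counts, not jieba's data)
FREQ = {
    "这": 50, "套": 20, "书": 40, "这套": 30,
    "买": 40, "的": 100, "值": 15, "买的": 5,
}
TOTAL = sum(FREQ.values())


def segment(text):
    """Return the split of `text` with the highest product of word probabilities."""
    n = len(text)
    # best[i] = (log-probability of the best split of text[i:], end index of its first word)
    best = [None] * (n + 1)
    best[n] = (0.0, n)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = text[i:j]
            # Unknown single characters get a pseudo-count of 1 so a path always exists
            freq = FREQ.get(word, 1 if j == i + 1 else 0)
            if freq:
                candidates.append((math.log(freq / TOTAL) + best[j][0], j))
        best[i] = max(candidates)
    # Walk forward through the stored end indices to recover the words
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


print(segment("这套书买的值"))  # ['这套', '书', '买', '的', '值']
```

With these toy counts, "这套" is kept as one word because it is more probable than "这" followed by "套", while "买的" is split because "买" and "的" are each far more frequent on their own.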

import jieba

Installing jieba: in an Anaconda environment, running pip install jieba is one option.
4. Perform word segmentation and preprocessing.
View the word segmentation results:

# Word segmentation and preprocessing
# No cleaning is done here, so that sentiment words are preserved
cuttxt = lambda x: " ".join(jieba.lcut(x))
df0["cleantxt"] = df0[0].apply(cuttxt)
df0.head()

5. Vectorize the text, ignoring terms that appear in fewer than 5 documents

from sklearn.feature_extraction.text import CountVectorizer
countvec = CountVectorizer(min_df=5)  # keep only terms that occur in at least 5 documents

wordmtx = countvec.fit_transform(df0.cleantxt)
wordmtx
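As a small illustration of what min_df does, the sketch below runs CountVectorizer on three made-up, already-segmented reviews (the `docs` list is hypothetical, not taken from the dataset). It also shows a side effect worth knowing: the default token_pattern only keeps tokens of two or more characters, so single-character words such as "好" are silently dropped unless the pattern is changed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical, already-segmented toy reviews (tokens separated by spaces)
docs = [
    "价格 便宜 质量 好",
    "价格 贵 质量 差",
    "物流 快 价格 便宜",
]

# min_df=2: keep only terms that occur in at least 2 documents
vec = CountVectorizer(min_df=2)
mtx = vec.fit_transform(docs)
vocab = set(vec.vocabulary_)  # "物流" is dropped (1 document); single characters are dropped too

# A token pattern that also accepts single-character tokens
vec_all = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
vec_all.fit(docs)
vocab_all = set(vec_all.vocabulary_)

print(sorted(vocab), mtx.shape)
print(sorted(vocab_all))
```

Whether to keep single-character tokens is a judgment call; in Chinese reviews words like "好" and "差" can carry sentiment, so it may be worth comparing both settings.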

6. Generate the training set and test set at a ratio of 7:3

# Split into training and test sets at a ratio of 7:3
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    wordmtx, df0.y, test_size=0.3)  # the sparse matrix can be passed in directly
x_train[0]
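The split above is random, so the numbers in later steps change from run to run. The following sketch, on made-up toy arrays, shows two optional arguments of train_test_split that may help: random_state makes the split repeatable, and stratify keeps the positive/negative ratio the same in both parts. The seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, balanced labels
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# random_state fixes the shuffle; stratify=y preserves the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Calling it again with the same seed gives the same split
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, X_test2))
```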

7. Model with an SVM

# Build the model with an SVM
from sklearn.svm import SVC

clf = SVC(kernel='rbf', verbose=True)
clf.fit(x_train, y_train)  # memory usage may be high
clf.score(x_train, y_train)  # accuracy on the training set
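Two things may be worth trying at this step. First, the score above is measured on the training data, so it tends to be optimistic; scoring held-out data gives a fairer picture. Second, if the RBF kernel is too slow or memory-hungry, a linear SVM is a common alternative for sparse word-count features. The sketch below is not the original experiment: it swaps in LinearSVC and uses synthetic count data (all names and numbers are made up) just to show both ideas.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a document-term matrix: 200 "documents", 50 "terms"
y = rng.integers(0, 2, size=200)
counts = rng.poisson(0.3, size=(200, 50))
counts[:, 0] += 3 * y  # term 0 is made informative for class 1
X = csr_matrix(counts)

# First 140 rows for training, remaining 60 held out for testing
lin = LinearSVC(random_state=0)
lin.fit(X[:140], y[:140])

train_acc = lin.score(X[:140], y[:140])  # accuracy on data the model has seen
test_acc = lin.score(X[140:], y[140:])   # accuracy on held-out data

print(train_acc, test_acc)
```

On the real dataset, the same comparison would be clf.score(x_train, y_train) against clf.score(x_test, y_test).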

8. Evaluate the model

# Evaluate the model on the test set
from sklearn.metrics import classification_report

print(classification_report(y_test, clf.predict(x_test)))
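For readers who are new to the columns in this report, the sketch below computes precision, recall, and F1 for the positive class by hand on a small made-up set of labels (`y_true` and `y_pred` are hypothetical, not results from this experiment).

```python
# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # positives predicted as positive
fp = sum(t == 0 and p == 1 for t, p in pairs)  # negatives predicted as positive
fn = sum(t == 1 and p == 0 for t, p in pairs)  # positives predicted as negative
tn = sum(t == 0 and p == 0 for t, p in pairs)  # negatives predicted as negative

precision = tp / (tp + fp)  # of the predicted positives, how many are right
recall = tp / (tp + fn)     # of the real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(pairs)

print(precision, recall, f1, accuracy)  # 0.75 0.75 0.75 0.75
```

classification_report prints these same quantities for each class, plus the number of samples per class in the support column.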


9. Check the prediction for the first review in the dataset

clf.predict(countvec.transform([df0.cleantxt[0]]))[0]

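10. Use the model on real text and check the predictions

# Model prediction
import jieba

def m_pred(string, countvec, model):
    # Segment the raw text and join the tokens with spaces, as was done for training
    words = " ".join(jieba.lcut(string))
    # transform() expects an iterable of documents, so wrap the string in a list
    words_vecs = countvec.transform([words])

    result = model.predict(words_vecs)

    if int(result[0]) == 1:
        print(string, ":正向")  # 正向 = positive
    else:
        print(string, ":负向")  # 负向 = negative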

comment = "外观美观,速度也不错。上面一排触摸键挺实用。应该对得起这个价格。当然再降点大家肯定也不反对。风扇噪音也不大。"
m_pred(comment, countvec, clf)


comment = "作为女儿6.1的礼物。虽然晚到了几天。等拿到的时候,女儿爱不释手,上洗手间也看,告知不好。竟以学习毛主席来反驳我。我反对了几句,还说我对主席不敬。晕。上周末,告诉我她把火鞋和风鞋拿到学校,好多同学羡慕她。呵呵,我也看了其中的人鸦,只可惜没有看完就在老公的催促下睡了。说了这么多,归纳为一句:这套书买的值。"
m_pred(comment, countvec, clf)  
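m_pred needs both the fitted vectorizer and the fitted model to be passed in together. One possible variation, sketched below, is to bundle the two into a scikit-learn pipeline and return the label instead of printing it. This is only an illustration: the training reviews are made up, `predict_label` is a hypothetical helper, and the input is assumed to be already segmented (in the real experiment it would come from " ".join(jieba.lcut(string))).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical, already-segmented toy reviews (tokens separated by spaces)
train_docs = [
    "质量 很好 非常 满意", "物流 很快 值得 购买", "外观 漂亮 非常 喜欢",
    "质量 太差 非常 失望", "物流 太慢 不 推荐", "做工 粗糙 要求 退货",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# The pipeline vectorizes and classifies in one object
pipe = make_pipeline(CountVectorizer(), SVC(kernel="rbf"))
pipe.fit(train_docs, train_labels)


def predict_label(segmented_text, model):
    """Return 1 (positive) or 0 (negative) for one space-separated review."""
    return int(model.predict([segmented_text])[0])


label = predict_label("质量 很好 值得 购买", pipe)
print(label)
```

Returning the label rather than printing it makes the function easier to reuse, for example to score a whole list of reviews.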

Summary: In this experiment, I learned the overall workflow of text analysis, gained an understanding of sentiment analysis, and carried out sentiment analysis on shopping reviews.
Combining the textbook with the experiment, I learned that sentiment analysis is the process of mining and analyzing the subjective sentiment expressed in text. This experiment also used the jieba library, whose word segmentation principle is to use a Chinese dictionary to estimate how likely adjacent Chinese characters are to belong together, and to group the characters with high association probability into words to produce the segmentation result.


Origin blog.csdn.net/qq_43374681/article/details/118443131