Natural Language Processing Practical Project 12: Sentiment Analysis with an Attention-Based CNN-BiGRU Model

Hello everyone, I am Wei Xue AI. Today I will introduce Natural Language Processing Practical Project 12: a CNN-BiGRU model based on the attention mechanism, applied to a real project. We will use a small CSV dataset of review samples and show how to load the data, train the model, and report the accuracy and loss values. The article provides complete runnable code and a detailed outline so that readers can understand and implement it.

Article directory structure:

  1. Project Background and Requirements
  2. Dataset Introduction and Data Processing
  3. Introduction to the CNN-BiGRU Model
  4. Attention Mechanism
  5. Code Implementation
  6. Results and Analysis
  7. Summary

1. Project Background and Requirements

Text classification is a common natural language processing (NLP) task, with applications such as sentiment analysis, spam detection, and topic classification. To achieve efficient text classification, we will use a hybrid model combining an attention-based convolutional neural network (CNN) with a bidirectional gated recurrent unit (BiGRU). The model uses the attention mechanism to effectively capture salient text features and improve performance.

2. Dataset Introduction and Data Processing

The dataset used in this project consists of online product reviews with corresponding sentiment labels. It is stored as a CSV file with two columns: "评论" (the review text) and "情感标签" (the sentiment label, 1 for positive and 0 for negative).

First, we load the data and preprocess it: removing stop words and punctuation, then converting the text into sequences of integers.
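As a minimal sketch of this cleaning step (the punctuation pattern and the tiny stop-word set below are illustrative placeholders, not a complete resource), a helper like the following could be applied to each review before tokenization:

import re

# Illustrative stop-word sample only; a real project would load a full stop-word list.
STOP_WORDS = {'的', '了', '很', '也'}

def clean_text(text):
    # Remove common Chinese and ASCII punctuation.
    text = re.sub(r'[,。!?、,.!?]', '', text)
    # Drop single-character stop words.
    return ''.join(ch for ch in text if ch not in STOP_WORDS)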

Sample of the dataset, data.csv (the reviews are in Chinese; the two columns are the review text and its sentiment label):

评论,情感标签
这个产品值得购买,很实用,1
产品不好用,不太好,0
这个产品款式很漂亮,质量也很好,1
使用了这个产品后觉得很满意,性价比很高,1
这个产品的功能非常实用,使用起来很方便,1
产品的品质不错,很耐用,推荐购买,1
希望产品能够有更多的颜色选择,提供更多的选择空间,0
产品的包装不够用心,有些瑕疵,0
对这个产品的质量表示怀疑,使用不了多久就出现问题了,0
产品的使用说明书不清晰,让人很困扰,0
这个产品的价格有点高,性价比不够高,0
使用了这个产品后觉得很失望,没有达到我的预期,0
这个产品的材质不够好,感觉很廉价,0
产品的外观设计不太满意,有点土气,0
这个产品使用起来很麻烦,不太方便,0
很后悔购买了这个产品,完全没有用处,0
这个产品真的很好,超出了我的期望,1
使用了这个产品后觉得非常方便,非常实用,1
买了这个产品,完全不好用,0
这个产品真的很好,真的值得买,1
这个产品,根本不能用,后悔,0
其实这个产品挺好的,下次再来,1
这个产品值得购买,很实用,1
产品不好用,不太好,0
这个产品款式很漂亮,质量也很好,1
使用了这个产品后觉得很满意,性价比很高,1
这个产品的功能非常实用,使用起来很方便,1
产品的品质不错,很耐用,推荐购买,1
希望产品能够有更多的颜色选择,提供更多的选择空间,0
产品的包装不够用心,有些瑕疵,0
对这个产品的质量表示怀疑,使用的时候出现问题了,0
产品的使用说明书不清晰,真不好,0
产品的价格有点高,性价比不高,0
使用了这个产品后觉得很失望,没有达到我想要的,0
产品的质量不错,物有所值,很满意,1
非常喜欢这个产品的外观,简洁大方,1
这个产品使用起来很顺手,非常好用,1
真的买了这个产品,完全不好用,0
这个产品真的很好,真的值得买,1
这个产品,根本不能用,后悔,0
其实这个产品挺好的,下次再来,1
这个产品的设计很时尚,非常符合我的口味,1
这个产品的材质不够好,感觉很划算,0
产品的外观设计其实不太满意,有点不好,0
产品的质量不错,物有所值,很满意,1
非常喜欢这个产品的外观,简洁大方,1
这个产品使用起来很顺手,非常好用,1
很喜欢这个产品,使用起来非常方便,1
不愧是知名品牌的产品,质量可靠,1
不仅外观漂亮,而且性能出色,非常满意,1
刚收到产品就迫不及待地使用了,效果非常好,1
很轻便的产品,携带方便,非常适合旅行使用,1
产品有点小问题,但客服很快帮我解决了,还算不错,1
使用起来稍微有些复杂,但一旦熟悉了就非常好用,1
我不喜欢这个产品的设计,外观也不太吸引人,0
使用这个产品时遇到了一些困难,不太好上手,0
产品的质量有些差,不够耐用,0
这个产品不够实用,性价比不高,0
使用起来感觉产品功能有所缺失,不全面,0
产品的性能一般般,没有什么特别出众之处,0
这个产品使用起来很麻烦,操作不够方便,0
不太满意这个购买,对产品的期待落空了,0
这个产品的质量不够稳定,有些时候会出问题,0
对这个产品的使用体验感到失望,效果不如预期,0
产品的外观看起来比较廉价,不够高档,0
这个产品不太耐用,容易出现一些小故障,0
产品的功能设计不合理,使用起来不够顺心,0
我对这个产品的质量表示怀疑,性能不够稳定,0
感觉这个产品略微有些贵,但质量还算可以,1
刚开始使用时有点小问题,但客服很快响应解决了,还不错,1
产品的外观设计很简约大气,我很喜欢,1
这个产品的功能很强大,用起来非常顺手,1
产品性能不太稳定,有时候会出现一些小问题,0
这个产品售后服务需要改进,0
产品质量不太理想,使用起来有些困扰,0
外观一般,功能还可以,总体来说还不错,1
好评如潮的产品果然不错,完全符合我的期待,1
产品的材质感很好,手感也很舒适,非常满意,1
我对这个购买有所失望,性能没有预期好,0
这个产品并不如我所期待的那样好,有一些问题,0
很喜欢这个产品,使用起来非常方便,1
不愧是知名品牌的产品,质量可靠,1
不仅外观漂亮,而且性能出色,非常满意,1
刚收到产品就迫不及待地使用了,效果非常好,1
很轻便的产品,携带方便,非常适合旅行使用,1
产品有点小问题,但客服很快帮我解决了,还算不错,1
使用起来稍微有些复杂,但一旦熟悉了就非常好用,1
我不喜欢这个产品的设计,外观也不太吸引人,0
使用这个产品时遇到了一些困难,不太好上手,0
产品的质量有些差,不够耐用,0
这个产品不够实用,性价比不高,0
使用起来感觉产品功能有所缺失,不全面,0

2.1 Load data

import pandas as pd

data = pd.read_csv('data.csv')
texts = data['评论'].tolist()
labels = data['情感标签'].tolist()

2.2 Data preprocessing

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Parameter settings
maxlen = 100
vocab_size = 10000

# Text preprocessing: fit the tokenizer and convert texts to integer sequences
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index

# Pad every sequence to a fixed length
data = pad_sequences(sequences, maxlen=maxlen)
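One caveat: Tokenizer splits text on whitespace, so unsegmented Chinese reviews would each be treated as a single token. A common workaround, assuming the jieba segmentation package is available, is to pre-segment the texts before calling fit_on_texts:

import jieba

# Insert spaces between words so the whitespace-based Tokenizer can split them.
texts = [' '.join(jieba.cut(t)) for t in texts]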

3. Introduction to the CNN-BiGRU Model

The model used in this project combines a convolutional neural network (CNN) with a bidirectional gated recurrent unit (BiGRU). The CNN is responsible for extracting local features, while the BiGRU captures long-range dependencies. This structure enables the model to better understand the information in the text.

The model structure is as follows:

  1. Embedding layer (Embedding)
  2. Convolutional layer (Conv1D)
  3. Bidirectional GRU layer (Bidirectional GRU)
  4. Attention layer
  5. Fully connected layer (Dense)

4. Attention Mechanism

In this project, we use an attention mechanism to help the model better understand the text. The attention mechanism assigns each word a weight indicating its importance in the text, allowing the model to focus on the most informative words and thereby improving performance.

We will use the Bahdanau attention mechanism, which is calculated as follows:

$$\alpha_{t} = \frac{\exp(e_{t})}{\sum_{j=1}^{T}\exp(e_{j})}$$

$$e_{t} = a(\boldsymbol{s}_{t-1}, \boldsymbol{h}_{t})$$

Here, $\alpha_{t}$ denotes the attention weight, $e_{t}$ the energy score, $\boldsymbol{s}_{t-1}$ the hidden state of the decoder at the previous time step, and $\boldsymbol{h}_{t}$ the hidden state of the encoder.
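To make the softmax concrete, here is a small numeric illustration with made-up energy scores; it shows how the weights $\alpha_{t}$ are normalized to sum to 1:

import numpy as np

# Hypothetical energy scores e_t for a 4-step sequence.
e = np.array([2.0, 0.5, 1.0, -1.0])

# alpha_t = exp(e_t) / sum_j exp(e_j)
alpha = np.exp(e) / np.sum(np.exp(e))
print(alpha)        # approx [0.609, 0.136, 0.224, 0.030]
print(alpha.sum())  # 1.0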

5. Code Implementation

Next, we will implement the attention-based CNN-BiGRU model and use the previously loaded data for training.

5.1 Define the attention layer

import tensorflow as tf
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self, **kwargs):
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # input_shape: (batch, time_steps, features)
        feature_dim = int(input_shape[-1])
        self.W = self.add_weight(shape=(feature_dim, feature_dim), initializer='glorot_uniform', trainable=True, name='attention_W')
        self.b = self.add_weight(shape=(feature_dim,), initializer='zeros', trainable=True, name='attention_b')
        self.u = self.add_weight(shape=(feature_dim, 1), initializer='glorot_uniform', trainable=True, name='attention_u')
        super(Attention, self).build(input_shape)

    def call(self, x):
        # Energy scores: e = tanh(xW + b), shape (batch, time_steps, features)
        e = tf.nn.tanh(tf.tensordot(x, self.W, axes=1) + self.b)
        # Normalize over the time axis to get attention weights, shape (batch, time_steps, 1)
        a = tf.nn.softmax(tf.tensordot(e, self.u, axes=1), axis=1)
        # Weighted sum over time steps, shape (batch, features)
        return tf.reduce_sum(x * a, axis=1)
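As a quick, optional sanity check, the layer should collapse the time dimension. The shapes below match the model built in the next section: Conv1D with kernel size 3 shortens a 100-step sequence to 98 steps, and the BiGRU produces 2 × 64 = 128 features:

# (batch, time_steps, features) -> (batch, features)
dummy = tf.random.normal((2, 98, 128))
print(Attention()(dummy).shape)  # expected: (2, 128)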

5.2 Build the model

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Conv1D, Bidirectional, GRU, Dense

# Parameter settings
embedding_dim = 100
filters = 64
kernel_size = 3
gru_units = 64

# Build the model
inputs = Input(shape=(maxlen,))
x = Embedding(vocab_size, embedding_dim)(inputs)
x = Conv1D(filters, kernel_size, activation='relu')(x)
x = Bidirectional(GRU(gru_units, return_sequences=True))(x)
x = Attention()(x)
x = Dense(32, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=outputs)
model.summary()

5.3 Training the model

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam

# Split into training and test sets (labels converted to a NumPy array for Keras)
labels = np.asarray(labels)
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, batch_size=64, epochs=30, validation_data=(X_test, y_test))
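After training, a quick held-out evaluation can be added; note that with a dataset this small, the score mostly reflects memorization rather than generalization:

# Evaluate on the test split
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}')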

5.4 Output accuracy and loss values

import matplotlib.pyplot as plt

# Plot the accuracy curves
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot the loss curves
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

Training output:

Epoch 30/30

  8/131 [>.............................] - ETA: 0s - loss: 1.7702e-05 - acc: 1.0000
 16/131 [==>...........................] - ETA: 0s - loss: 1.5579e-05 - acc: 1.0000
 24/131 [====>.........................] - ETA: 0s - loss: 1.7362e-05 - acc: 1.0000
 32/131 [======>.......................] - ETA: 0s - loss: 1.9606e-05 - acc: 1.0000
 48/131 [=========>....................] - ETA: 0s - loss: 2.0194e-05 - acc: 1.0000
 56/131 [===========>..................] - ETA: 0s - loss: 2.0542e-05 - acc: 1.0000
 72/131 [===============>..............] - ETA: 0s - loss: 2.0055e-05 - acc: 1.0000
 88/131 [===================>..........] - ETA: 0s - loss: 1.9102e-05 - acc: 1.0000
104/131 [======================>.......] - ETA: 0s - loss: 1.8955e-05 - acc: 1.0000
112/131 [========================>.....] - ETA: 0s - loss: 1.9084e-05 - acc: 1.0000
128/131 [============================>.] - ETA: 0s - loss: 1.8875e-05 - acc: 1.0000
131/131 [==============================] - 1s 7ms/sample - loss: 1.8943e-05 - acc: 1.0000 - val_loss: 2.7403e-04 - val_acc: 1.0000

(Figure: training and validation accuracy curves.)
(Figure: training and validation loss curves.)

6. Results and Analysis

The accuracy and loss curves recorded during training show that the model performs very well on both the training and test sets, indicating that the attention-based CNN-BiGRU model can be applied effectively to text classification tasks. That said, the dataset is very small and contains duplicated reviews, so the near-perfect scores mainly show that the model can fit this sample; a larger, deduplicated dataset would be needed to judge generalization.
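To use the trained model on new text, a minimal inference sketch follows (the example review is made up; preprocessing must mirror training, i.e. the same tokenizer, the same maxlen, and the same segmentation if one was used):

# Score a new review with the trained model
new_texts = ['这个产品质量很好,非常满意']
new_seq = tokenizer.texts_to_sequences(new_texts)
new_data = pad_sequences(new_seq, maxlen=maxlen)
prob = model.predict(new_data)[0][0]
print('positive' if prob > 0.5 else 'negative', prob)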

7. Summary

This article described in detail the application of an attention-based CNN-BiGRU model in a project, covering data processing, model construction, training, and result analysis. Through a hands-on project, we demonstrated the effectiveness of this model for text classification. The attention mechanism enables the model to focus on the most important words, thereby improving performance. I hope this article helps readers apply this model in their own projects.

Origin: blog.csdn.net/weixin_42878111/article/details/131616833