Chinese Sentiment Recognition 3
Sequence Classification: IMDB Movie Review Classification
Sequence classification is the task of predicting a category for an input sequence ordered in space or time. The main difficulties are that the sequences can vary in length, that the input symbols come from a very large vocabulary, and that the model may need to learn long-range dependencies between the symbols in the input sequence. This chapter explains how to solve a sequence classification problem with an LSTM.
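The variable-length problem is usually handled by padding or truncating every sequence to one fixed length. The following is a minimal pure-Python sketch of that idea; the helper name `pad_to_length` and the sample token ids are made up for illustration. It pads and truncates at the front, which matches the default behavior of the Keras `pad_sequences` call used later in this post.

```python
def pad_to_length(seq, maxlen, pad_value=0):
    """Left-pad or left-truncate a list of token ids to exactly maxlen items."""
    seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
    return [pad_value] * (maxlen - len(seq)) + seq


# Two hypothetical reviews of different lengths, already encoded as integers
reviews = [[11, 4, 27], [8, 15, 16, 23, 42, 7]]
padded = [pad_to_length(r, maxlen=5) for r in reviews]
print(padded)  # [[0, 0, 11, 4, 27], [15, 16, 23, 42, 7]]
```

After this step every review has the same length, so a batch of reviews can be stored as a single 2-D integer array.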
Problem Description
This example uses the IMDB dataset and an LSTM to classify movie reviews as positive or negative.
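The Keras IMDB dataset stores each review as a list of integers rather than as text. The sketch below illustrates that encoding under the default settings of `imdb.load_data` (start marker 1, out-of-vocabulary marker 2, word ranks shifted by 3). The function `encode_review` and the tiny `toy_word_index` are hypothetical stand-ins; the real word ranks come from `imdb.get_word_index()`.

```python
START, OOV, INDEX_FROM = 1, 2, 3  # default markers used by imdb.load_data


def encode_review(text, word_index, top_words):
    """Encode raw text the way the IMDB dataset is encoded (illustrative only)."""
    ids = [START]
    for word in text.lower().split():
        rank = word_index.get(word)
        idx = rank + INDEX_FROM if rank is not None else OOV
        # Words outside the top_words most frequent ones become OOV
        ids.append(idx if idx < top_words else OOV)
    return ids


# Hypothetical frequency ranks, for illustration only
toy_word_index = {"the": 1, "movie": 17, "was": 13, "great": 84}
encoded = encode_review("The movie was great fun", toy_word_index, top_words=50)
print(encoded)  # [1, 4, 20, 16, 2, 2]
```

Here "great" falls outside the 50-word toy vocabulary and "fun" is unknown, so both map to the out-of-vocabulary marker.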
Simple LSTM
Word embedding layer + LSTM layer + output layer
Key issues
- My understanding of the embedding layer is still incomplete.
Reference: http://frankchen.xyz/2017/12/18/How-to-Use-Word-Embedding-Layers-for-Deep-Learning-with-Keras/
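As a rough aid for the embedding layer, it can be thought of as a trainable lookup table with one row per word index. The NumPy sketch below shows only the lookup step, with toy sizes and random values as assumptions; in the real model the table has 5000 rows of 32 values and is learned during training.

```python
import numpy as np

vocab_size, embed_dim = 10, 4  # toy sizes; the model in this post uses 5000 and 32
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))  # one row (vector) per word index

token_ids = np.array([[0, 0, 3, 7, 3]])  # a batch with one padded sequence of length 5
vectors = table[token_ids]               # integer indexing performs the lookup
print(vectors.shape)  # (1, 5, 4)
```

The same word index always maps to the same row, so positions 2 and 4 above receive identical vectors. This is why the embedding output shape in the model summary is (batch, sequence length, embedding size).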
Code
'''
Sequence classification: IMDB movie review classification with an LSTM
'''
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from keras.datasets import imdb
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding
from keras.layers import LSTM
from keras.layers import Dense
seed = 7
top_words = 5000
max_words = 500
out_dimension = 32
batch_size = 128
epochs = 2
def build_model():
    model = Sequential()
    # Map each word index to a dense vector of size out_dimension
    model.add(Embedding(top_words, out_dimension, input_length=max_words))
    model.add(LSTM(units=100))
    # A single sigmoid unit for binary classification
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Print a summary of the model
    model.summary()
    return model
np.random.seed(seed=seed)
# Load the data
(x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
x_train = sequence.pad_sequences(x_train, maxlen=max_words)
x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)
# Build and train the model
model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)
scores = model.evaluate(x_validation, y_validation, verbose=2)
print('Accuracy: %.2f%%' % (scores[1] * 100))
Result
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_3 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
lstm_6 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 101
=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
M:\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/2
- 468s - loss: 0.5198 - accuracy: 0.7326
Epoch 2/2
- 412s - loss: 0.2807 - accuracy: 0.8871
Accuracy: 85.58%
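The parameter counts reported in the model summary can be checked by hand. The short calculation below is a sketch using the standard formulas for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer; the variable names mirror those in the script.

```python
top_words, out_dimension, lstm_units = 5000, 32, 100

# Embedding: one vector of size out_dimension per word index
embedding_params = top_words * out_dimension
# LSTM: 4 gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * ((out_dimension + lstm_units) * lstm_units + lstm_units)
# Dense: one weight per LSTM unit plus one bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params)  # 160000 53200 101
print(total_params)                                 # 213301
```

These values agree with the 160000, 53200, and 101 parameters listed in the summary and with the total of 213,301.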