Official Example Walkthrough 4.17 (imdb_lstm.py) - Keras Study Notes 4

Training an LSTM (long short-term memory network) model on the IMDB sentiment classification task

Annotated code

'''Trains an LSTM model on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.

TF-IDF (term frequency-inverse document frequency) is a weighting scheme widely used in information retrieval and data mining.
TF stands for term frequency; IDF stands for inverse document frequency. LogReg is short for logistic regression.

# Notes
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.

- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
'''
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)  # every review becomes exactly maxlen = 80 tokens long
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)


print('Build model...')
model = Sequential()
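# max_features = 20000 is the vocabulary size (20000 distinct words); each word is
# represented by a 128-dimensional vector; each review is 80 tokens long (already
# handled by pad_sequences).
model.add(Embedding(max_features, 128))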
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2)) 
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
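
The pad_sequences step in the script forces every review to exactly maxlen tokens. The following is a minimal pure-Python sketch of that behaviour, assuming the Keras defaults padding='pre' and truncating='pre' (short reviews are left-padded with 0, long reviews keep only their last maxlen tokens). The helper name pad_or_truncate is hypothetical and is not part of Keras.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Illustrative stand-in for pad_sequences on a single sequence.

    Assumes maxlen >= 1 and the 'pre' defaults for padding and truncating.
    """
    # Keep only the last `maxlen` tokens when the sequence is too long.
    trimmed = seq[-maxlen:]
    # Left-pad with `value` when the sequence is too short.
    return [value] * (maxlen - len(trimmed)) + trimmed


short_review = [11, 12, 13]
long_review = list(range(1, 11))  # ten token ids: 1..10

print(pad_or_truncate(short_review, 5))  # [0, 0, 11, 12, 13]
print(pad_or_truncate(long_review, 5))   # [6, 7, 8, 9, 10]
```

With maxlen = 80 as in the script, a 300-word review would therefore contribute only its final 80 word indices to the model.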

Running the code
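
Before running the script, it can help to know how large the model is. The sketch below estimates the trainable parameter count of the three layers by hand, assuming the standard parameterization: an embedding table of vocab_size x embed_dim, an LSTM with four gates that each have an input kernel, a recurrent kernel, and a bias, and a dense layer with weights plus bias. The function names are hypothetical helpers, not Keras APIs.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an
    # input kernel, a recurrent kernel, and a bias vector.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus bias.
    return input_dim * units + units


max_features = 20000  # vocabulary size used in the script
embed_dim = 128       # Embedding output dimension
lstm_units = 128      # LSTM hidden size

counts = {
    "embedding": embedding_params(max_features, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense": dense_params(lstm_units, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(name, n)
print("total", total)  # total 2691713
```

Under these assumptions the embedding table accounts for 2,560,000 of the roughly 2.69 million parameters, so most of the model's capacity sits in the word vectors rather than in the LSTM itself.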

Keras documentation

English: https://keras.io/

Chinese: http://keras-cn.readthedocs.io/en/latest/

Example downloads

https://github.com/keras-team/keras

https://github.com/keras-team/keras/tree/master/examples

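Full project download

For readers without download points, add QQ 452205574 to access the shared folder.

It includes: code, datasets (images), trained models, library installation files, and more.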
