The best way to learn TensorFlow is still to read the official documentation: https://www.tensorflow.org/versions/r0.12/tutorials/seq2seq/
I. Using RNNs in TensorFlow:
1. Using an LSTM
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
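The loop above treats the LSTM as a function mapping (input, state) to (output, new state). As a minimal plain-NumPy sketch of what one such LSTM step computes (the weight names and sizes here are made up for illustration; this is not the TensorFlow implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. x: [batch, input_size]; h, c: [batch, lstm_size];
    W: [input_size + lstm_size, 4 * lstm_size]; b: [4 * lstm_size]."""
    z = np.concatenate([x, h], axis=1) @ W + b
    i, f, g, o = np.split(z, 4, axis=1)        # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)        # h_new is also the cell's output
    return h_new, c_new

batch_size, input_size, lstm_size = 2, 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(input_size + lstm_size, 4 * lstm_size))
b = np.zeros(4 * lstm_size)
h = np.zeros((batch_size, lstm_size))
c = np.zeros((batch_size, lstm_size))
x = rng.normal(size=(batch_size, input_size))
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (2, 4) (2, 4)
```

Calling `lstm_step` repeatedly while threading `h` and `c` through is exactly the role of `state` in the loop above.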
2. Truncated backpropagation (fixing the number of unrolled steps)
# Placeholder for the inputs in a given iteration. num_steps can be thought of as the number of words in one sentence.
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
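To feed such an unrolled graph, a long stream of word ids is cut into consecutive windows of num_steps tokens, and the final state of one window becomes the initial state of the next. A sketch of the windowing step (the helper name is hypothetical):

```python
def num_steps_windows(word_ids, num_steps):
    """Split one long sequence of word ids into consecutive windows of
    length num_steps; gradients flow only within a single window."""
    n = (len(word_ids) // num_steps) * num_steps   # drop the ragged tail
    return [word_ids[i:i + num_steps] for i in range(0, n, num_steps)]

seq = list(range(10))
print(num_steps_windows(seq, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Backpropagation then runs over each window independently, which is why the graph only needs num_steps unrolled copies of the cell.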
3. Input data
# Before the LSTM is called, words must be encoded as vectors (word2vec-style embeddings). embedding_matrix is a tensor of shape [vocabulary_size, embedding_size]; word_ids are the words' index numbers.
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
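The lookup is essentially row indexing into the embedding matrix. A plain-NumPy equivalent (the concrete sizes and values are made up):

```python
import numpy as np

vocabulary_size, embedding_size = 5, 3
embedding_matrix = np.arange(
    vocabulary_size * embedding_size, dtype=np.float32
).reshape(vocabulary_size, embedding_size)

word_ids = np.array([0, 2, 2, 4])
# Row indexing plays the role of tf.nn.embedding_lookup here.
word_embeddings = embedding_matrix[word_ids]
print(word_embeddings.shape)  # (4, 3)
```

Each word id simply selects one row, so repeated ids share the same vector.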
4. Building a multi-layer LSTM; number_of_layers is the depth of the stacked LSTM
lstm = rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=False)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers,
state_is_tuple=False)
initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = stacked_lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
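At each time step, a stacked cell feeds the output of one layer in as the input to the next, which is what MultiRNNCell does. A toy sketch with generic per-layer step functions (all names and the toy "cells" are made up):

```python
def stacked_step(x, states, layer_fns):
    """Apply a list of per-layer step functions in order; each layer's
    output becomes the next layer's input, as in a stacked RNN cell."""
    new_states = []
    for fn, state in zip(layer_fns, states):
        x, state = fn(x, state)        # x becomes the input to the next layer
        new_states.append(state)
    return x, new_states               # top-layer output, per-layer states

# Toy "cell": output adds the state to the input; state counts invocations.
def make_layer():
    def step(x, state):
        return x + state, state + 1
    return step

layers = [make_layer() for _ in range(3)]
out, states = stacked_step(5, [0, 0, 0], layers)
print(out, states)  # 5 [1, 1, 1]
```

Only the top layer's output is used for predictions, but every layer keeps its own state between time steps.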
II. Seq2Seq Models: useful for translation, dialogue, language generation, and similar scenarios
1. Files involved:
seq2seq.py: library of building blocks for constructing seq2seq models
seq2seq_model.py: the seq2seq neural network model
data_utils.py: preparing the training data
translate.py: training the seq2seq model
2. Structure of the seq2seq model:
The basic structure has two parts: an Encoder that consumes the input, and a Decoder that produces the output.
3. Using the seq2seq library:
outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
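The encoder/decoder split behind this call can be sketched with a plain vanilla-RNN version in NumPy (shared weights and sizes here are arbitrary assumptions; the real library uses LSTM/GRU cells and separate parameters):

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    return np.tanh(x @ Wx + h @ Wh)

def basic_seq2seq(encoder_inputs, decoder_inputs, Wx, Wh):
    """The encoder reads the source sequence; its final state initializes
    the decoder, which then emits one output per decoder input."""
    h = np.zeros((encoder_inputs[0].shape[0], Wh.shape[0]))
    for x in encoder_inputs:           # encode: only the final state is kept
        h = rnn_step(x, h, Wx, Wh)
    outputs = []
    for x in decoder_inputs:           # decode from the encoder's final state
        h = rnn_step(x, h, Wx, Wh)
        outputs.append(h)
    return outputs, h

rng = np.random.default_rng(0)
batch, dim = 2, 4
Wx = rng.normal(size=(dim, dim))
Wh = rng.normal(size=(dim, dim))
enc = [rng.normal(size=(batch, dim)) for _ in range(3)]
dec = [rng.normal(size=(batch, dim)) for _ in range(2)]
outputs, state = basic_seq2seq(enc, dec, Wx, Wh)
print(len(outputs), state.shape)  # 2 (2, 4)
```

The encoder's only job is to compress the whole input into the final state; everything the decoder knows about the source flows through that state.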
4. Model walkthrough:
To handle varying sentence lengths efficiently, sentences are grouped into a few length classes (buckets):
buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
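Each pair (m, n) means "source sentences up to m tokens, target sentences up to n tokens". A sentence pair goes into the smallest bucket it fits and is padded up to that bucket's sizes. A sketch of that selection (the helper name and pad id are assumptions):

```python
PAD_ID = 0  # assumed padding id

buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]

def bucket_and_pad(source_ids, target_ids, buckets, pad_id=PAD_ID):
    """Pick the smallest bucket that fits the pair, then pad both
    sequences up to the bucket's sizes."""
    for src_size, tgt_size in buckets:
        if len(source_ids) <= src_size and len(target_ids) <= tgt_size:
            src = source_ids + [pad_id] * (src_size - len(source_ids))
            tgt = target_ids + [pad_id] * (tgt_size - len(target_ids))
            return (src_size, tgt_size), src, tgt
    raise ValueError("sentence pair too long for every bucket")

bucket, src, tgt = bucket_and_pad([3, 1, 4], [1, 5, 9, 2], buckets)
print(bucket, len(src), len(tgt))  # (5, 10) 5 10
```

This way only four graph shapes are needed instead of one per sentence length, while wasting far less padding than a single maximal bucket would.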
5. Steps to train the model, using movie-script dialogue data as an example:
(1) Format the raw data, then split it into training and test sets
(2) Build a vocabulary, then convert sentences into word ids
(3) Define the hyperparameters and start training the model
(4) Use the trained model
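Step (2) above can be sketched as follows. The special-token names mirror the tutorial's convention; the vocabulary size and helper names are made up for illustration:

```python
from collections import Counter

def build_vocab(sentences, max_vocab=10,
                specials=("_PAD", "_GO", "_EOS", "_UNK")):
    """Build a word -> id map: special tokens first, then the most
    frequent words up to max_vocab entries in total."""
    counts = Counter(w for s in sentences for w in s.split())
    words = list(specials) + [
        w for w, _ in counts.most_common(max_vocab - len(specials))
    ]
    return {w: i for i, w in enumerate(words)}

def to_word_ids(sentence, vocab):
    """Map each word to its id; out-of-vocabulary words become _UNK."""
    unk = vocab["_UNK"]
    return [vocab.get(w, unk) for w in sentence.split()]

sents = ["how are you", "how old are you"]
vocab = build_vocab(sents)
print(to_word_ids("how are things", vocab))  # [4, 5, 3]
```

The id sequences produced here are exactly what gets bucketed, padded, and fed to the encoder and decoder during training.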