TensorFlow seq2seq Reading Notes

The best way to learn TensorFlow is still to read the official documentation: https://www.tensorflow.org/versions/r0.12/tutorials/seq2seq/

I. Using RNNs in TensorFlow

1. Using an LSTM

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
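Note that words_in_dataset, softmax_w, softmax_b, and loss_function are assumed to be defined elsewhere; the snippet only illustrates how the LSTM state is threaded through successive calls to the cell.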
2. Unrolling over multiple time steps (truncated backpropagation)

# Placeholder for the inputs in a given iteration.
# num_steps can be thought of as the number of words in a sentence.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
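The tutorial pairs this unrolled graph with a training loop that feeds the final state of one batch back in as the initial state of the next. A sketch along those lines (session and words_in_dataset are assumed to be defined, and loss as in the first snippet):

# A numpy array holding the state of the LSTM after the current batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})
    total_loss += current_loss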
3. Input data

# Before feeding words into the LSTM they must be embedded (e.g. word2vec-style
# embeddings). embedding_matrix is a tensor of shape [vocabulary_size, embedding_size];
# word_ids are the integer indices of the words.
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
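In practice embedding_matrix is usually a trainable variable learned jointly with the rest of the network. A minimal sketch, assuming vocabulary_size and embedding_size are hyperparameters defined elsewhere:

# Trainable embedding table, learned together with the rest of the model.
embedding_matrix = tf.get_variable(
    "embedding", [vocabulary_size, embedding_size], dtype=tf.float32)
# word_ids has shape [batch_size, num_steps]; the lookup returns a tensor
# of shape [batch_size, num_steps, embedding_size].
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)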
4. Building a multi-layer LSTM; number_of_layers is the depth of the stacked LSTM

lstm = rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=False)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers,
                                     state_is_tuple=False)

initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = stacked_lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
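One caveat: reusing the same cell object via [lstm] * number_of_layers works in this r0.12-era API, but later TensorFlow 1.x releases require a separate cell instance per layer, roughly:

# Each layer gets its own BasicLSTMCell instance (required in later TF 1.x).
stacked_lstm = rnn_cell.MultiRNNCell(
    [rnn_cell.BasicLSTMCell(lstm_size) for _ in range(number_of_layers)])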

II. Seq2Seq Models: useful for translation, dialogue, language generation, and similar tasks

1. Files involved:


seq2seq.py: a library of building blocks for constructing seq2seq models

seq2seq_model.py: the seq2seq neural network model

data_utils.py: prepares the training data

translate.py: the entry point that starts training the seq2seq model


2. Structure of the seq2seq model:

The basic structure has two parts: an encoder that consumes the input sequence, and a decoder that produces the output sequence.


3. Using the seq2seq library:

outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
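Here encoder_inputs and decoder_inputs are Python lists of 2-D tensors, one [batch_size, input_size] tensor per time step. A minimal sketch of wiring this up, assuming encoder_length, decoder_length, batch_size, input_size, and lstm_size are defined elsewhere (in r0.12, basic_rnn_seq2seq lives under tf.nn.seq2seq):

import tensorflow as tf

# One placeholder per time step, each of shape [batch_size, input_size].
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(encoder_length)]
decoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(decoder_length)]

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# outputs: one decoder output per decoder time step;
# states: the RNN state after the final decoder step.
outputs, states = tf.nn.seq2seq.basic_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell)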
4. Model walkthrough:

(1) Bucketing and padding

To handle sentences of varying lengths efficiently, sentences are grouped by length into a few buckets; each bucket is a pair of (input length, output length) limits:

buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
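For example, a pair with 8 input tokens and 12 output tokens falls into the (10, 15) bucket and is padded up to those lengths. A hedged sketch of that logic in plain Python (PAD_ID and the function name are illustrative, not the tutorial's exact data_utils code):

PAD_ID = 0  # id of the padding symbol (illustrative)

def pick_bucket_and_pad(source_ids, target_ids, buckets):
    # Find the smallest bucket that fits both sides, then pad each to size.
    for source_size, target_size in buckets:
        if len(source_ids) <= source_size and len(target_ids) <= target_size:
            source = source_ids + [PAD_ID] * (source_size - len(source_ids))
            target = target_ids + [PAD_ID] * (target_size - len(target_ids))
            return (source_size, target_size), source, target
    raise ValueError("sentence pair is too long for every bucket")

buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
bucket, src, tgt = pick_bucket_and_pad(list(range(8)), list(range(12)), buckets)
# bucket == (10, 15); src is padded to length 10, tgt to length 15.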

5. Steps to train the model, using movie-script dialogue data as an example:

(1) Format the data and split it into training and test sets

(2) Build a vocabulary, then convert sentences into word ids (see the sketch after this list)

(3) Define the hyperparameters and start training the model

(4) Use the trained model
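As a rough illustration of step (2), here is a minimal vocabulary builder and sentence-to-ids converter (the special symbols mirror the tutorial's data_utils.py, but the function bodies are simplified sketches):

# Illustrative special symbols, in the spirit of the tutorial's data_utils.py.
_PAD, _GO, _EOS, _UNK = "_PAD", "_GO", "_EOS", "_UNK"

def build_vocabulary(sentences, max_size):
    # Count word frequencies and keep the most frequent words.
    counts = {}
    for sentence in sentences:
        for word in sentence.split():
            counts[word] = counts.get(word, 0) + 1
    words = [_PAD, _GO, _EOS, _UNK] + sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(words[:max_size])}

def sentence_to_ids(sentence, vocabulary):
    # Unknown words fall back to the _UNK id.
    return [vocabulary.get(word, vocabulary[_UNK]) for word in sentence.split()]

vocab = build_vocabulary(["how are you", "i am fine"], max_size=100)
print(sentence_to_ids("how are you today", vocab))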





Reposted from blog.csdn.net/j754379117/article/details/77623508