TensorFlow seq2seq Reading Notes

The best way to learn TensorFlow is still to read the official documentation: https://www.tensorflow.org/versions/r0.12/tutorials/seq2seq/

I. Using RNNs in TensorFlow

1. Using an LSTM

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
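Note that words_in_dataset, softmax_w, softmax_b, and loss_function are assumed to be defined elsewhere; the snippet only illustrates how the LSTM state is threaded through successive calls to the cell.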
2. Unrolling over multiple time steps (truncated backpropagation)

# Placeholder for the inputs in a given iteration.
# num_steps can be thought of as the number of words in a sentence.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
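The tutorial pairs this unrolled graph with a training loop that feeds the final state of one batch back in as the initial state of the next. A sketch along those lines (session and words_in_dataset are assumed to be defined, and loss as in the first snippet):

# A numpy array holding the state of the LSTM after the current batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})
    total_loss += current_loss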
3. Input data

# Before feeding words into the LSTM they must be embedded (e.g. word2vec-style
# embeddings). embedding_matrix is a tensor of shape [vocabulary_size, embedding_size];
# word_ids are the integer indices of the words.
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
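In practice embedding_matrix is usually a trainable variable learned jointly with the rest of the network. A minimal sketch, assuming vocabulary_size and embedding_size are hyperparameters defined elsewhere:

# Trainable embedding table, learned together with the rest of the model.
embedding_matrix = tf.get_variable(
    "embedding", [vocabulary_size, embedding_size], dtype=tf.float32)
# word_ids has shape [batch_size, num_steps]; the lookup returns a tensor
# of shape [batch_size, num_steps, embedding_size].
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)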
4. Building a multi-layer LSTM; number_of_layers is the depth of the stacked LSTM

lstm = rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=False)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers,
                                     state_is_tuple=False)

initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = stacked_lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
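One caveat: reusing the same cell object via [lstm] * number_of_layers works in this r0.12-era API, but later TensorFlow 1.x releases require a separate cell instance per layer, roughly:

# Each layer gets its own BasicLSTMCell instance (required in later TF 1.x).
stacked_lstm = rnn_cell.MultiRNNCell(
    [rnn_cell.BasicLSTMCell(lstm_size) for _ in range(number_of_layers)])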

II. Seq2Seq Models: useful for translation, dialogue, language generation, and similar tasks

1. Files involved:


seq2seq.py: a library of building blocks for constructing seq2seq models

seq2seq_model.py: the seq2seq neural network model

data_utils.py: prepares the training data

translate.py: the entry point that starts training the seq2seq model


2. Structure of the seq2seq model:

The basic structure has two parts: an encoder that consumes the input sequence, and a decoder that produces the output sequence.


3. Using the seq2seq library:

outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
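Here encoder_inputs and decoder_inputs are Python lists of 2-D tensors, one [batch_size, input_size] tensor per time step. A minimal sketch of wiring this up, assuming encoder_length, decoder_length, batch_size, input_size, and lstm_size are defined elsewhere (in r0.12, basic_rnn_seq2seq lives under tf.nn.seq2seq):

import tensorflow as tf

# One placeholder per time step, each of shape [batch_size, input_size].
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(encoder_length)]
decoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(decoder_length)]

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# outputs: one decoder output per decoder time step;
# states: the RNN state after the final decoder step.
outputs, states = tf.nn.seq2seq.basic_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell)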
4. Model walkthrough:

(1) Bucketing and padding

To handle sentences of varying lengths efficiently, sentences are grouped by length into a few buckets; each bucket is a pair of (input length, output length) limits:

buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
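For example, a pair with 8 input tokens and 12 output tokens falls into the (10, 15) bucket and is padded up to those lengths. A hedged sketch of that logic in plain Python (PAD_ID and the function name are illustrative, not the tutorial's exact data_utils code):

PAD_ID = 0  # id of the padding symbol (illustrative)

def pick_bucket_and_pad(source_ids, target_ids, buckets):
    # Find the smallest bucket that fits both sides, then pad each to size.
    for source_size, target_size in buckets:
        if len(source_ids) <= source_size and len(target_ids) <= target_size:
            source = source_ids + [PAD_ID] * (source_size - len(source_ids))
            target = target_ids + [PAD_ID] * (target_size - len(target_ids))
            return (source_size, target_size), source, target
    raise ValueError("sentence pair is too long for every bucket")

buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
bucket, src, tgt = pick_bucket_and_pad(list(range(8)), list(range(12)), buckets)
# bucket == (10, 15); src is padded to length 10, tgt to length 15.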

5. Steps to train the model, using movie-script dialogue data as an example:

(1) Format the data and split it into training and test sets

(2) Build a vocabulary, then convert sentences into word ids (see the sketch after this list)

(3) Define the hyperparameters and start training the model

(4) Use the trained model
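As a rough illustration of step (2), here is a minimal vocabulary builder and sentence-to-ids converter (the special symbols mirror the tutorial's data_utils.py, but the function bodies are simplified sketches):

# Illustrative special symbols, in the spirit of the tutorial's data_utils.py.
_PAD, _GO, _EOS, _UNK = "_PAD", "_GO", "_EOS", "_UNK"

def build_vocabulary(sentences, max_size):
    # Count word frequencies and keep the most frequent words.
    counts = {}
    for sentence in sentences:
        for word in sentence.split():
            counts[word] = counts.get(word, 0) + 1
    words = [_PAD, _GO, _EOS, _UNK] + sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(words[:max_size])}

def sentence_to_ids(sentence, vocabulary):
    # Unknown words fall back to the _UNK id.
    return [vocabulary.get(word, vocabulary[_UNK]) for word in sentence.split()]

vocab = build_vocabulary(["how are you", "i am fine"], max_size=100)
print(sentence_to_ids("how are you today", vocab))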





Reposted from blog.csdn.net/j754379117/article/details/77623508