TensorFlow pitfalls: seq2seq

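Every time I debug, I end up asking myself the same soul-searching question: "My code looks exactly like everyone else's, so why is mine the one that doesn't work?"

Whenever that happens, a line from my classmate ykr plays on a loop in my head: "You think it's the same, but it isn't. Maybe the input format is wrong, maybe the API changed between versions, maybe..."

And yes, that is exactly what happened this time: my input differed from theirs, and I was convinced it was identical.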

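Problem description:

I was implementing keyword-based text generation with an encoder-decoder framework. During training, the decoder runs its RNN with the usual trio, TrainingHelper + BasicDecoder + dynamic_decode. The code is as follows: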

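import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.x

helper = tf.contrib.seq2seq.TrainingHelper(...)  # arguments omitted
decoder = tf.contrib.seq2seq.BasicDecoder(...)  # arguments omitted
final_outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(...)  # arguments omitted

# rnn_output is an attribute of the decoder output, not a method
logits = final_outputs.rnn_output

# Flatten labels to [batch*time] and logits to [batch*time, vocab_size]
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.reshape(output_out, [-1]),
                                  logits=tf.reshape(logits + 1e-10, [-1, self.vocab_size]))

label_weights = tf.sequence_mask(...)  # arguments omitted
label_weights = tf.reshape(label_weights, [-1])
cost = tf.reduce_mean(loss * label_weights)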

Error message:

InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [768,307102] and labels shape [816]

Here labels holds the ground-truth values and logits holds the predictions, i.e. final_outputs.rnn_output.

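Locating the problem:

The error is raised when the loss is computed. The flattened labels have shape [batch_size*max_seq_size], while the flattened logits have shape [batch_size*dynamic_seq_size, vocab_size], where dynamic_seq_size varies from batch to batch. The first dimensions do not match, so the loss cannot be computed. But why does the length of logits vary?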

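Finding the cause:

It turns out that the decoding loop stops once the decoder has reached sequence_length. Because the sentences in a batch do not all have the same length, the resulting dynamic length is the length of the longest sentence in the current batch. This explanation follows a blog post on the dimensions of logits.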
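The mismatch can be reproduced with plain arithmetic, without TensorFlow. The sketch below is only an illustration: the batch size of 48 and the padded length of 17 are assumptions, chosen because they are consistent with the shapes in the error message (48 × 17 = 816 and 48 × 16 = 768), and label_rows and decoded_rows are hypothetical helper names.

```python
def label_rows(batch_size, max_seq_size):
    """Rows in the flattened labels when every sentence is padded to max_seq_size."""
    return batch_size * max_seq_size


def decoded_rows(sequence_lengths):
    """Rows in the flattened logits: decoding stops at the longest length in the batch."""
    return len(sequence_lengths) * max(sequence_lengths)


# Assumed values: 48 sentences padded to 17 steps; the longest real sentence has 16 steps.
batch_size, max_seq_size = 48, 17
sequence_lengths = [16] + [10] * (batch_size - 1)

print(label_rows(batch_size, max_seq_size))  # 816
print(decoded_rows(sequence_lengths))        # 768
```

Whenever the longest sentence in a batch is shorter than the fixed padded length, the two counts differ and the loss op fails.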

The fix:

Now that the cause is known, there are two ways to fix it in this situation.

Option 1: make the dimensions of labels match those of logits.

# Number of time steps to keep: the smaller of the label length (max_len) and the logits length
current_ts = tf.to_int32(tf.minimum(tf.shape(output_out)[1], tf.shape(logits)[1]))
# Truncate output_out to that many time steps
output_out = tf.slice(output_out, begin=[0, 0], size=[-1, current_ts])

Here output_out stores the indices of the ground-truth words.
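What the truncation does can be seen on nested lists. This is a plain-Python illustration, not the TensorFlow code itself; truncate_labels is a hypothetical helper and the token ids are made up.

```python
def truncate_labels(labels, decoded_steps):
    """Keep at most decoded_steps columns of each row, mirroring the tf.slice call."""
    keep = min(len(labels[0]), decoded_steps)
    return [row[:keep] for row in labels]


PAD = 0
# Two sentences padded to a fixed length of 6 (made-up token ids).
labels = [[5, 6, 7, 2, PAD, PAD],
          [8, 9, 2, PAD, PAD, PAD]]
decoded_steps = 4  # longest real length in this batch, so logits has 4 time steps

trimmed = truncate_labels(labels, decoded_steps)
print(trimmed)  # [[5, 6, 7, 2], [8, 9, 2, 0]]
```

Only columns that are padding for every sentence in the batch are dropped, so no real label is lost.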

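Option 2: make the shape of logits always equal to [batch_size*max_seq_size, vocab_size].

Set sequence_length to max_seq_size. Note that writing sequence_length=max_seq_size directly does not work: the sequence_length argument expects a vector whose components give the length of each sentence in the batch, so every component has to be set to max_seq_size. Obviously this makes the decoder run a few extra steps, so training takes longer, but because tf.sequence_mask zeroes out the padded positions, those extra steps do not contribute to the loss.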
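In TensorFlow 1.x, one way to build such a vector would be something like tf.fill([batch_size], max_seq_size), assuming batch_size is known. Why the extra decoding steps are harmless can be sketched in plain Python. Here sequence_mask is a stand-in for tf.sequence_mask, masked_loss_sum is a hypothetical helper, and the per-step loss values are made up.

```python
def sequence_mask(lengths, maxlen):
    """Plain-Python stand-in for tf.sequence_mask: 1.0 for real steps, 0.0 for padding."""
    return [[1.0 if t < n else 0.0 for t in range(maxlen)] for n in lengths]


def masked_loss_sum(step_losses, lengths):
    """Sum the per-step losses over real (unmasked) positions only."""
    mask = sequence_mask(lengths, len(step_losses[0]))
    return sum(l * m
               for row, mask_row in zip(step_losses, mask)
               for l, m in zip(row, mask_row))


lengths = [3, 2]
# Made-up per-step losses when decoding stops at the batch maximum (3 steps) ...
shorter = [[0.5, 0.25, 1.0], [2.0, 0.5, 9.0]]
# ... and when decoding is forced to run 5 steps; the extra values are arbitrary.
longer = [[0.5, 0.25, 1.0, 7.0, 7.0], [2.0, 0.5, 9.0, 7.0, 7.0]]

print(masked_loss_sum(shorter, lengths))  # 4.25
print(masked_loss_sum(longer, lengths))   # 4.25
```

One caveat: a plain mean over all positions, which is what tf.reduce_mean computes, divides by the total number of positions including padding, so the averaged value still scales with the padded length. Dividing by the sum of the mask instead would make it independent of padding.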

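Reflections and summary:

Why doesn't the other person's code raise this error? Because the input format is different.

My input is built like this: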

class TFRecordReader:
    def __init__(self, corpus):

        self.batch_size = pt.batch_size
        self.item = (tf.data.TFRecordDataset(corpus)
                     # .repeat(20)
                     .shuffle(buffer_size=pt.batch_size*10)
                     .map(self._decode_record)
                     .batch(self.batch_size)
                     .make_one_shot_iterator()
                     .get_next())

    def _decode_record(self, example):
        """Decodes a record to a TensorFlow example."""
        feeds = tf.parse_single_example(example, features={
            'input': tf.FixedLenFeature([pt.max_input_len], tf.int64),
            'input_length': tf.FixedLenFeature([], tf.int64),
            'output': tf.FixedLenFeature([pt.max_output_len], tf.int64),
            'output_length': tf.FixedLenFeature([], tf.int64)
        })
        keywords = feeds['input']
        keywords_length = feeds['input_length']  # true length of the keywords
        output = feeds['output']  # already padded, e.g. <sos>ABC<eos>PP
        output_in = output[:-1]  # with padding: <sos>ABC<eos>P
        output_out = output[1:]  # with padding: ABC<eos>PP
        output_length = tf.subtract(feeds['output_length'], 1)  # true output length minus 1, e.g. 5 - 1 = 4
        return keywords, keywords_length, output_in, output_out, output_length

The other person's input is built like this:

class TFRecordReader:
    def __init__(self, corpus):

        self.batch_size = pt.batch_size
        self.item = (tf.data.TFRecordDataset(corpus)
                     # .repeat(20)
                     .shuffle(buffer_size=pt.batch_size*10)
                     .map(self._decode_record)
                     .padded_batch(batch_size=self.batch_size,
                                  padded_shapes=([None], [], [None], [None], [])) 
                     .make_one_shot_iterator()
                     .get_next())

    def _decode_record(self, example):
        """Decodes a record to a TensorFlow example."""
        feeds = tf.parse_single_example(example, features={
            'input': tf.FixedLenFeature([pt.max_input_len], tf.int64),
            'input_length': tf.FixedLenFeature([], tf.int64),
            'output': tf.VarLenFeature(tf.int64),
            'output_length': tf.FixedLenFeature([], tf.int64)
        })
        keywords = feeds['input']
        keywords_length = feeds['input_length']  # true length of the keywords
        output = tf.sparse_tensor_to_dense(feeds['output'])  # not padded
        output_in = output[:-1]  # no padding
        output_out = output[1:]  # no padding
        output_length = tf.shape(output_in)[0]  # true length of the output sentence
        return keywords, keywords_length, output_in, output_out, output_length

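See the difference? The other person stores variable-length data, and the None entries in padded_batch's padded_shapes mean that each sentence is padded to the length of the longest sentence in that batch, so the second dimension of output_out is exactly that length. The logits produced by the decoder have shape [batch_size*(length of the longest sentence in the batch), vocab_size], so the two line up and no error is raised.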
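The two padding strategies can be compared side by side in plain Python. This is only a sketch of the idea, not of tf.data itself: pad_to and padded_batch are hypothetical helpers, and the token ids and the global maximum of 6 are made up.

```python
def pad_to(seqs, length, pad=0):
    """Right-pad every sequence with `pad` up to `length`."""
    return [s + [pad] * (length - len(s)) for s in seqs]


def padded_batch(seqs, pad=0):
    """Pad to the longest sequence in this batch, like a None entry in padded_shapes."""
    return pad_to(seqs, max(len(s) for s in seqs), pad)


batch = [[5, 6, 7, 2], [8, 9, 2]]  # made-up token ids; the longest sentence has 4 steps
MAX_OUTPUT_LEN = 6                 # assumed fixed length stored in the TFRecord

fixed = pad_to(batch, MAX_OUTPUT_LEN)  # my input: always 6 columns
dynamic = padded_batch(batch)          # the other input: 4 columns for this batch

print(len(fixed[0]), len(dynamic[0]))  # 6 4
```

With the fixed padding the labels keep 6 columns while the decoder stops after 4 steps; with per-batch padding both sides have 4.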

gbl5555
