Computing language model perplexity with a TensorFlow RNN

1. Perplexity is a standard metric for judging how good a language model is. For a sentence of N words it is defined as PPL = P(sentence)^(-1/N), the inverse probability of the sentence normalized by its length. Equivalently, PPL = exp(-(1/N) * sum_i log P(w_i | w_1 ... w_{i-1})), i.e. the exponential of the average per-token cross-entropy, which is the form the code below computes.

2. How to compute it with TensorFlow's RNN API
See the API documentation for tf.contrib.legacy_seq2seq.sequence_loss_by_example. For a batch of N sequences this function returns a 1D tensor of length N, where each element is the log-perplexity (weighted average cross-entropy) of the corresponding sequence.

tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    logits,
    targets,
    weights,
    average_across_timesteps=True,
    softmax_loss_function=None,
    name=None
)
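
To make the list-based interface concrete, here is a minimal, hypothetical sketch (it assumes TensorFlow 1.x with tf.contrib available; all shapes and values are made up for illustration): three timesteps, a batch of two sequences, and a vocabulary of four symbols.

import tensorflow as tf

batch_size, num_steps, vocab_size = 2, 3, 4

# One [batch_size, vocab_size] logits tensor per timestep.
logits = [tf.random_normal([batch_size, vocab_size]) for _ in range(num_steps)]
# One [batch_size] int32 target tensor per timestep.
targets = [tf.constant([1, 3]), tf.constant([0, 2]), tf.constant([2, 1])]
# One [batch_size] float weight tensor per timestep (1.0 = real token).
weights = [tf.ones([batch_size]) for _ in range(num_steps)]

loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(logits, targets, weights)

with tf.Session() as sess:
    print(sess.run(loss))  # shape [batch_size]: one log-perplexity per sequence

The char-rnn-style code below uses the same function, but flattens the whole batch into a single "timestep" by wrapping the tensors in one-element lists:
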
# Per-token cross-entropy for the whole batch. Because logits and targets
# are flattened to [batch_size * seq_length], the returned tensor holds one
# loss value per token rather than one per sentence.
loss = legacy_seq2seq.sequence_loss_by_example(
        [self.logits],
        [tf.reshape(self.targets, [-1])],
        [tf.ones([args.batch_size * args.seq_length])])
# Average log-perplexity per token across the batch
self.cost = tf.reduce_sum(loss) / (args.batch_size * args.seq_length)
# Average perplexity of the batch
self.perplexity = tf.exp(self.cost)
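
As a sanity check, the link between this cost and the formula in point 1 can be verified with plain NumPy. This is only an illustrative sketch: the per-token probabilities below are made-up numbers, not the output of any real model.

import numpy as np

# Probability the model assigns to each correct token of one sentence.
token_probs = np.array([0.2, 0.5, 0.1, 0.4])
N = len(token_probs)

# Per-token cross-entropy, as sparse_softmax_cross_entropy_with_logits would yield.
per_token_loss = -np.log(token_probs)

# Average loss per token, analogous to self.cost above.
cost = per_token_loss.sum() / N

# Perplexity two ways: exp of the average loss, and the definition P(sentence)^(-1/N).
print(np.exp(cost))                        # ≈ 3.976
print(np.prod(token_probs) ** (-1.0 / N))  # ≈ 3.976, the same value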

For reference, here is TensorFlow's source code for sequence_loss_by_example:

def sequence_loss_by_example(logits,
                             targets,
                             weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None,
                             name=None):
  """Weighted cross-entropy loss for a sequence of logits (per example).
  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: Optional name for this operation, default: "sequence_loss_by_example".
  Returns:
    1D batch-sized float Tensor: The log-perplexity for each sequence.
  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the same "
                     "%d, %d, %d." % (len(logits), len(weights), len(targets)))
  with ops.name_scope(name, "sequence_loss_by_example",
                      logits + targets + weights):
    log_perp_list = []
    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logit)
      else:
        crossent = softmax_loss_function(target, logit)
      log_perp_list.append(crossent * weight)
    log_perps = math_ops.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = math_ops.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
  return log_perps

The computation proceeds in two steps:
1. Compute the cross-entropy loss at each timestep of the sequence, multiply it by that timestep's weight, and sum over all timesteps.
2. Divide the summed loss by the total label weight, which for all-ones weights is simply the sequence length (this is what average_across_timesteps=True, the default, does), as illustrated below.
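
The same two steps can be replayed outside TensorFlow. The sketch below uses made-up per-timestep losses for a batch of two sentences, the second of which is one token shorter, to show how the weights also act as a padding mask (a padded position gets weight 0 and does not count toward the sentence length).

import numpy as np

# crossent[t] is the per-example cross-entropy at timestep t (shape [batch]).
crossent = [np.array([1.2, 0.7]),
            np.array([0.9, 0.3]),
            np.array([1.5, 0.0])]
# Weight 1.0 for real tokens, 0.0 for padding (sentence 2 has only 2 tokens).
weights = [np.array([1.0, 1.0]),
           np.array([1.0, 1.0]),
           np.array([1.0, 0.0])]

# Step 1: weight the loss at each timestep and sum over timesteps (add_n above).
log_perps = sum(c * w for c, w in zip(crossent, weights))

# Step 2: divide by the total label weight, i.e. each sentence's true length
# (this is what average_across_timesteps=True does by default).
total_size = sum(weights) + 1e-12
log_perps /= total_size

print(log_perps)          # [1.2, 0.5]  average log loss per sentence
print(np.exp(log_perps))  # per-sentence perplexity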

References:
TensorFlow's legacy_seq2seq module
How to calculate perplexity of RNN in tensorflow
tf.contrib.legacy_seq2seq.sequence_loss_by_example
rnn_ptb_perlexity

Reposted from blog.csdn.net/qq_21460525/article/details/80244032