slim的batch_norm出现的问题

python代码有一个好处，就是容易编写。但它的坏处也是大大的，好难读啊！！！

以下代码来自FastMaskRCNN（https://github.com/CharlesShang/FastMaskRCNN），在实际运行过程中，把is_training由True改为False后，测试结果大不一样！折腾了几天时间。后来找到了一个解决方法。锁定目标在resnet_v1函数上。

代码内容总揽（ResNet v1模型生成器）


def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None):
  """Generator for v1 ResNet models.

  This function generates a family of ResNet v1 models. See the resnet_v1_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.

  Training for image classification on Imagenet is usually done with [224, 224]
  inputs, resulting in [7, 7] feature maps at the output of the last ResNet
  block for the ResNets defined in [1] that have nominal stride equal to 32.
  However, for dense prediction tasks we advise that one uses inputs with
  spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
  this case the feature maps at the ResNet output will have spatial shape
  [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
  and corners exactly aligned with the input image corners, which greatly
  facilitates alignment of the features to the image. Using as input [225, 225]
  images results in [8, 8] feature maps at the output of the last ResNet block.

  For dense prediction tasks, the ResNet needs to run in fully-convolutional
  (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
  have nominal stride equal to 32 and a good choice in FCN mode is to use
  output_stride=16 in order to increase the density of the computed features at
  small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.

  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks. If None
      we return the features before the logit layer.
    is_training: whether is training or not.
    global_pool: If True, we perform global average pooling before computing the
      logits. Set to True for image classification, False for dense prediction.
    output_stride: If None, then the output will be computed at the nominal
      network stride. If output_stride is not None, it specifies the requested
      ratio of input to output spatial resolution.
    include_root_block: If True, include the initial convolution followed by
      max-pooling, if False excludes it.
    spatial_squeeze: if True, logits is of shape [B, C], if false logits is
        of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is False, then height_out and width_out are reduced by a
      factor of output_stride compared to the respective height_in and width_in,
      else both height_out and width_out equal one. If num_classes is None, then
      net is the output of the last ResNet block, potentially after global
      average pooling. If num_classes is not None, net contains the pre-softmax
      activations.
    end_points: A dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: If the target output_stride is not valid.
  """
  with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.name + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection):
      with slim.arg_scope([slim.batch_norm], is_training=True):
        net = inputs
        if include_root_block:
          if output_stride is not None:
            if output_stride % 4 != 0:
              raise ValueError('The output_stride needs to be a multiple of 4.')
            output_stride /= 4
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
        if global_pool:
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
        if num_classes is not None:
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits')
        if spatial_squeeze:
          logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)
        if num_classes is not None:
          end_points['predictions'] = slim.softmax(logits, scope='predictions')
        return logits, end_points
resnet_v1.default_image_size = 224

想从这个函数着手解决，因此进行分析。但是分析到了一定阶段发现最终定位到了slim的源码中，非我们所能控制，是不是意味着slim源码出现了问题呢？

1、首先看看is_training的控制范围

通过查询资料，layer中的函数，只有batch_norm和dropout会需要到is_training这个参数。resnet_utils也没有用到is_training这个参数控制。

with slim.arg_scope([slim.batch_norm], is_training=True): 这里为layer函数提供了许多默认值，具体来说是给slim.batch_norm设置了默认值 is_training=True。（http://blog.csdn.net/weixin_35653315/article/details/78160886 参看arg_scope的使用解释）

2、再看看几个重要的函数

#same padding 2-D convolution
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')

net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits')
logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
end_points['predictions'] = slim.softmax(logits, scope='predictions')

这些函数都没有太大的关系。

3、但是为什么会出现这种情况呢？

看看dropout的意义：dropout是指在深度学习网络的训练过程中,对于神经网络单元,按照一定的概率将其暂时从网络中丢弃。但是dropout跟这个的关系应该不大。

扫描二维码关注公众号，回复： 3242611 查看本文章

再看看batch_norm。

#batch_norm函数下面
    is_training: Whether or not the layer is in training mode. In training mode
      it would accumulate the statistics of the moments into `moving_mean` and
      `moving_variance` using an exponential moving average with the given
      `decay`. When it is not in training mode then it would use the values of
      the `moving_mean` and the `moving_variance`.

以上是在tensorflow源码中找到的。

最后在http://blog.csdn.net/cyiano/article/details/75006883博文后面看到相关叙述，发现描述的状况差不多。感觉是出现一样的问题了。但是这个博客也没讲的太细，感觉需要深入理解下slim的代码了。

with slim.arg_scope([slim.batch_norm], is_training=True)里面的is_training为false，则出现的问题和上述博文中的一样。目前我的解决方案是只把with slim.arg_scope([slim.batch_norm], is_training=True)里面的is_training设置为true，结果是正常的。

但是我的代码内容是测试呀！！！

weird！！！

slim的batch_norm出现的问题

猜你喜欢