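Python code has one big advantage: it is easy to write. Its drawback is just as big: it is really hard to read!!!
The code below comes from FastMaskRCNN (https://github.com/CharlesShang/FastMaskRCNN). When I actually ran it, changing is_training from True to False made the test results completely different! I struggled with this for several days before finding a workaround, and I narrowed the problem down to the resnet_v1 function.
Code overview (ResNet v1 model generator)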
def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None):
  """Generator for v1 ResNet models.

  This function generates a family of ResNet v1 models. See the resnet_v1_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.

  Training for image classification on Imagenet is usually done with [224, 224]
  inputs, resulting in [7, 7] feature maps at the output of the last ResNet
  block for the ResNets defined in [1] that have nominal stride equal to 32.
  However, for dense prediction tasks we advise that one uses inputs with
  spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
  this case the feature maps at the ResNet output will have spatial shape
  [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
  and corners exactly aligned with the input image corners, which greatly
  facilitates alignment of the features to the image. Using as input [225, 225]
  images results in [8, 8] feature maps at the output of the last ResNet block.

  For dense prediction tasks, the ResNet needs to run in fully-convolutional
  (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
  have nominal stride equal to 32 and a good choice in FCN mode is to use
  output_stride=16 in order to increase the density of the computed features at
  small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.

  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks. If None
      we return the features before the logit layer.
    is_training: whether is training or not.
    global_pool: If True, we perform global average pooling before computing the
      logits. Set to True for image classification, False for dense prediction.
    output_stride: If None, then the output will be computed at the nominal
      network stride. If output_stride is not None, it specifies the requested
      ratio of input to output spatial resolution.
    include_root_block: If True, include the initial convolution followed by
      max-pooling, if False excludes it.
    spatial_squeeze: if True, logits is of shape [B, C], if false logits is
      of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is False, then height_out and width_out are reduced by a
      factor of output_stride compared to the respective height_in and width_in,
      else both height_out and width_out equal one. If num_classes is None, then
      net is the output of the last ResNet block, potentially after global
      average pooling. If num_classes is not None, net contains the pre-softmax
      activations.
    end_points: A dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: If the target output_stride is not valid.
  """
  with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.name + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection):
      with slim.arg_scope([slim.batch_norm], is_training=True):
        net = inputs
        if include_root_block:
          if output_stride is not None:
            if output_stride % 4 != 0:
              raise ValueError('The output_stride needs to be a multiple of 4.')
            output_stride /= 4
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
        if global_pool:
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
        if num_classes is not None:
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits')
        if spatial_squeeze:
          logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)
        if num_classes is not None:
          end_points['predictions'] = slim.softmax(logits, scope='predictions')
        return logits, end_points
resnet_v1.default_image_size = 224
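As a side note, the feature-map size formula in the docstring is easy to check by hand. The helper below is only a sketch of that arithmetic; the name dense_output_size is made up for illustration and is not part of slim or FastMaskRCNN. It assumes inputs of the form "multiple of output_stride plus 1", as the docstring recommends for dense prediction.

```python
def dense_output_size(size_in, output_stride):
    """Spatial size given by the docstring formula (size_in - 1) / output_stride + 1.

    Hypothetical helper for illustration only; assumes size_in - 1 is a
    multiple of output_stride.
    """
    if (size_in - 1) % output_stride != 0:
        raise ValueError('size_in - 1 must be a multiple of output_stride')
    return (size_in - 1) // output_stride + 1


print(dense_output_size(225, 32))  # 8, the [8, 8] case from the docstring
print(dense_output_size(321, 16))  # 21
```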
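I wanted to solve the problem starting from this function, so I analyzed it. At some point, though, the analysis led into the slim source code, which is outside our control. Does that mean there is a problem in the slim source?
1. First, look at what is_training controls
From what I could find, among the layer functions only batch_norm and dropout need the is_training parameter. resnet_utils does not use is_training either.
The line with slim.arg_scope([slim.batch_norm], is_training=True): supplies default argument values to layer functions. Specifically, it sets the default is_training=True for slim.batch_norm. (See http://blog.csdn.net/weixin_35653315/article/details/78160886 for an explanation of how arg_scope is used.)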
2. Next, look at several important functions
#same padding 2-D convolution
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits')
logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
end_points['predictions'] = slim.softmax(logits, scope='predictions')
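None of these calls has much to do with the problem.
3. So why does this happen?
Consider what dropout does: while a deep network is being trained, each unit is temporarily dropped from the network with a certain probability. But dropout is probably not related to this problem.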
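For reference, this is roughly how dropout depends on is_training. It is a minimal sketch on a plain Python list, assuming the usual "inverted dropout" convention in which kept units are rescaled by 1 / keep_prob; it is not the slim code itself, and the function signature is made up for illustration.

```python
import random


def dropout(values, keep_prob, is_training, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch only)."""
    if not is_training:
        return list(values)  # inference: pass the inputs through unchanged
    out = []
    for v in values:
        if rng.random() < keep_prob:
            out.append(v / keep_prob)  # keep the unit and rescale it
        else:
            out.append(0.0)            # drop the unit for this step
    return out


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(x, keep_prob=0.5, is_training=True, rng=rng)
test_out = dropout(x, keep_prob=0.5, is_training=False)
print(train_out)  # each entry is either 0.0 or the input divided by 0.5
print(test_out)   # [1.0, 2.0, 3.0, 4.0]
```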
Now look at batch_norm.
# From the docstring of the batch_norm function
is_training: Whether or not the layer is in training mode. In training mode
it would accumulate the statistics of the moments into `moving_mean` and
`moving_variance` using an exponential moving average with the given
`decay`. When it is not in training mode then it would use the values of
the `moving_mean` and the `moving_variance`.
The text above is what I found in the TensorFlow source code.
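The behaviour described in that docstring can be sketched in plain Python. The class below is a toy, not slim's batch_norm: it works on a list of scalars, has no learned scale or offset, and uses decay=0.9 only so the averages converge quickly. The run_updates flag is hypothetical; it models the case where the ops that update moving_mean and moving_variance are never executed during training. In slim those updates are by default collected in tf.GraphKeys.UPDATE_OPS and have to be run together with the train op. Whether that is what actually happens in FastMaskRCNN is something I have not verified, so treat this as one possible explanation only.

```python
import math


class BatchNorm1D:
    """Toy batch norm over a list of scalars (no learned scale or offset)."""

    def __init__(self, decay=0.9, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = 0.0       # assumed initial value (zeros)
        self.moving_variance = 1.0   # assumed initial value (ones)

    def __call__(self, batch, is_training, run_updates=True):
        if is_training:
            # Training mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            if run_updates:
                d = self.decay
                self.moving_mean = d * self.moving_mean + (1 - d) * mean
                self.moving_variance = d * self.moving_variance + (1 - d) * var
        else:
            # Inference mode: normalize with the stored moving averages.
            mean, var = self.moving_mean, self.moving_variance
        return [(v - mean) / math.sqrt(var + self.epsilon) for v in batch]


batch = [10.0, 12.0, 14.0, 16.0]

stale = BatchNorm1D()
fresh = BatchNorm1D()
for _ in range(200):
    stale(batch, is_training=True, run_updates=False)  # averages never updated
    fresh(batch, is_training=True)                      # averages track the data

reference = fresh(batch, is_training=True, run_updates=False)
fresh_infer = fresh(batch, is_training=False)
stale_infer = stale(batch, is_training=False)

print(reference)    # output in training mode
print(fresh_infer)  # close to the training-mode output
print(stale_infer)  # far from it: still normalized with mean 0, variance 1
```

In this sketch, inference with stale moving averages gives very different activations from training mode, while forcing is_training=True at test time hides the problem because the batch statistics are used instead. That matches the symptom described in this post, although the sketch does not prove it is the cause.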
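Finally, near the end of the blog post at http://blog.csdn.net/cyiano/article/details/75006883 I found a related discussion, and the situation it describes is about the same as mine. It looks like the same problem. That post does not go into much detail either, though, so I probably need to understand the slim code more deeply.
If is_training inside with slim.arg_scope([slim.batch_norm], is_training=True) is set to False, I get the same problem as described in that post. My current workaround is to set only the is_training inside with slim.arg_scope([slim.batch_norm], is_training=True) to True, and then the results are normal.
But my code is running a test!!!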
weird!!!