Understanding and analysis of resnet_v1_50 source code

Source code link:
https://github.com/tensorflow/models/blob/master/research/slim/nets/resnet_utils.py
https://github.com/tensorflow/models/blob/master/research/slim/nets/resnet_v1.py

1. Usage of resnet_v1_50 in TensorFlow

First, a summary of the usage. The signature of resnet_v1_50 in the source code is as follows:

def resnet_v1_50(inputs,
                 num_classes=None,
                 is_training=True,
                 global_pool=True,
                 output_stride=None,
                 spatial_squeeze=True,
                 store_non_strided_activations=False,
                 min_base_depth=8,
                 depth_multiplier=1,
                 reuse=None,
                 scope='resnet_v1_50'):

where:

  • inputs: the input image tensor, with shape [batch, height_in, width_in, channels]
  • num_classes: the number of classes, which sets the number of output nodes of the final "logits" layer. If it is "None", the logits layer is skipped and, with "global_pool=True", the final output has shape [batch,1,1,2048]. If it is set and "spatial_squeeze=True", the final output has shape [batch,num_classes]
  • is_training: whether the "Batch_Norm" layers run in training mode
  • global_pool: this step comes after the main network structure and before the "num_classes" layer. "True" means that global average pooling is applied to the output of the last "net" layer of the network. Global pooling means that the pooling window is equal to the spatial size of the input, so each channel is reduced to a single value.
  • spatial_squeeze: removes the dimensions of size 1 from the output shape, such as spatial_squeeze([B,1,1,C])=[B,C]
  • store_non_strided_activations: useful in multi-scale image processing; if "True", the activations of the last unit of each block are stored before subsampling, so outputs of different sizes are available

To put it simply, calling this function builds the network structure of ResNet50; the main arguments are the input tensor "inputs" and the number of classes "num_classes". If "num_classes=None", we build a feature extractor, which only extracts the features of the image; these can be high-dimensional, such as 2048. If "num_classes=10", the data in "inputs" is divided into 10 classes, and the last layer of the network outputs a 10-dimensional vector of logits. The softmax of this vector, stored in end_points['predictions'], gives the probability that the image belongs to each class.
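The effect of "num_classes", "global_pool" and "spatial_squeeze" on the shape of the returned "net" can be sketched in plain Python, without TensorFlow. The helper name "final_output_shape" and the default 7x7x2048 feature map (what a 224x224 input is expected to produce) are assumptions made for illustration; the branching follows the body of "resnet_v1" quoted later in this article.

```python
def final_output_shape(batch, num_classes=None, global_pool=True,
                       spatial_squeeze=True, feature_map=(7, 7, 2048)):
    """Return the shape of `net` as resnet_v1 would return it (illustrative only)."""
    height, width, channels = feature_map  # assumed output of the last block
    shape = [batch, height, width, channels]
    if global_pool:
        # Mean over the two spatial axes, keeping them as size-1 dimensions.
        shape = [batch, 1, 1, channels]
    if num_classes:
        # The 1x1 'logits' convolution changes only the channel dimension.
        shape = shape[:3] + [num_classes]
        if spatial_squeeze:
            # Squeezing axes 1 and 2 is only valid when both have size 1.
            if shape[1] != 1 or shape[2] != 1:
                raise ValueError('cannot squeeze non-unit spatial dimensions')
            shape = [batch, num_classes]
    return shape


print(final_output_shape(32))                                         # [32, 1, 1, 2048]
print(final_output_shape(32, num_classes=10))                         # [32, 10]
print(final_output_shape(32, num_classes=10, spatial_squeeze=False))  # [32, 1, 1, 10]
```

Note that with "num_classes=None" the "spatial_squeeze" flag has no effect in this sketch, because the squeeze sits inside the "num_classes" branch.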

2. The structure of the resnet_v1_50 source code

"resnet_utils.py" and "resnet_v1.py" are the two modules that build resnet_v1_50 in the source code, and the important functions inside them are marked with black boxes in the original figure. "Block" is a class (a named tuple) with three fields: 'scope' is the namespace of the block, 'unit_fn' is the function that handles one unit of the block in the network architecture, and 'args' holds the parameters for 'unit_fn', one set per unit. "stack_blocks_dense" processes the ResNet_Block blocks one after another. "bottleneck" handles the bottleneck part of the network architecture, including the "shortcut" part described in the paper. "resnet_v1" is the main architecture of "ResNet50". In "resnet_v1_block", Block.unit_fn is set to bottleneck.

Let's take a look at the meanings of Block, bottleneck and unit in the source code, as shown in Figure 3 below:

Figure 3 ResNet_Block, bottleneck and unit

A "ResNet_Block" represents conv2_x in Table 1 of the original paper, excluding the max pool layer; it is the part in the blue dotted box in Figure 3. "bottleneck" is the content in the green dotted box, including the black curve (the shortcut). "unit" is the 3*1 table in the red dotted box. This is easier to understand when compared with Table 1 of the paper.
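The relationship between a Block, its units and the function that processes them can be sketched as follows. Block is modelled here as a namedtuple with the three fields scope, unit_fn and args. "toy_unit" and "stack_blocks" are hypothetical stand-ins for "bottleneck" and "stack_blocks_dense": they only track the spatial size and channel count, and leave out variable scopes, "output_stride" handling and the actual convolutions.

```python
import collections

# A block is a scope name, a unit function, and one argument dict per unit.
Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])


def toy_unit(net, depth, depth_bottleneck, stride):
    """Hypothetical stand-in for bottleneck(): net is (spatial_size, channels)."""
    size, _ = net
    # Ceil division models subsampling by `stride`; depth is the output channels.
    return (-(-size // stride), depth)


def stack_blocks(net, blocks):
    """Simplified stacking loop: apply every unit of every block in order."""
    for block in blocks:
        for unit_args in block.args:
            net = block.unit_fn(net, **unit_args)
    return net


blocks = [
    Block('block1', toy_unit, [
        {'depth': 256, 'depth_bottleneck': 64, 'stride': 1},
        {'depth': 256, 'depth_bottleneck': 64, 'stride': 1},
        {'depth': 256, 'depth_bottleneck': 64, 'stride': 2},
    ]),
]

print(stack_blocks((56, 64), blocks))  # (28, 256)
```

Each dict in "args" describes one unit, so a block with three dicts runs "unit_fn" three times.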

The content of "resnet_v1" in the source code is as follows:

def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              store_non_strided_activations=False,
              reuse=None,
              scope=None):
  with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection):
      with (slim.arg_scope([slim.batch_norm], is_training=is_training)
            if is_training is not None else NoOpScope()):
        net = inputs
        if include_root_block:
          if output_stride is not None:
            if output_stride % 4 != 0:
              raise ValueError('The output_stride needs to be a multiple of 4.')
            output_stride /= 4
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride,
                                              store_non_strided_activations)
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(
            end_points_collection)

        if global_pool:
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
          end_points['global_pool'] = net
        if num_classes:
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits')
          end_points[sc.name + '/logits'] = net
          if spatial_squeeze:
            net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
            end_points[sc.name + '/spatial_squeeze'] = net
          end_points['predictions'] = slim.softmax(net, scope='predictions')
        return net, end_points
resnet_v1.default_image_size = 224

The code before the statement "net = resnet_utils.stack_blocks_dense(net, blocks, output_stride, store_non_strided_activations)" handles the part that precedes the main architecture of ResNet50: the "conv1" convolution layer in Table 1 and the max pooling layer at the start of "conv2_x". The statement itself handles the main architecture of the network, all parts from "conv2_x" to "conv5_x", as defined by the parameter "blocks". It is followed by the handling of "global_pool", "num_classes" and "spatial_squeeze". "spatial_squeeze" only takes effect when "num_classes" is not "None".
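The work done before "stack_blocks_dense" can be sketched with simple arithmetic. The helper names below are hypothetical, and the spatial-size calculation assumes "SAME"-style padding for both "conv1" and "pool1", so that each stride-2 layer halves the size with rounding up. The stride check mirrors the one in "resnet_v1", except that integer division is used here so the result stays an integer in Python 3.

```python
import math


def root_block_size(size):
    """Spatial size after conv1 (stride 2) and pool1 (stride 2), SAME padding assumed."""
    after_conv1 = math.ceil(size / 2)
    after_pool1 = math.ceil(after_conv1 / 2)
    return after_pool1


def remaining_output_stride(output_stride):
    """Stride budget left for the blocks: the root block already strides by 4."""
    if output_stride is None:
        return None
    if output_stride % 4 != 0:
        raise ValueError('The output_stride needs to be a multiple of 4.')
    return output_stride // 4


print(root_block_size(224))         # 56
print(remaining_output_stride(16))  # 4
```

So for a 224x224 input, the blocks receive a 56x56 feature map with 64 channels.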

Table 2 shows the calling process of the functions and the actual values of some parameters when ResNet50 is built. The black text on the white background gives the formal parameters of each function, while the blue part gives the actual values of the parameters. Not all formal parameters of "resnet_v1_50()" and "resnet_v1()" are listed, but the unlisted ones do not affect the understanding of the source code.

Table 2 The calling process of the functions

"resnet_v1_block()" defines the main structure: it defines the 4 "Block" blocks of ResNet50, including the name of each block, the number of units contained in each block, and the corresponding stride. The depth or base_depth is the number of channels of the kernels in the network architecture. During execution, "resnet_v1_block()" calls "resnet_utils.Block()" to set the unit_fn() function of each block to bottleneck() and to assign values to its parameters. For "conv2_x" the parameters are "[{256,64,1},{256,64,1},{256,64,2}]", where each group in curly braces corresponds to one unit and gives its depth, depth_bottleneck and stride. The last row in Table 2 is the call made by the function in the penultimate row, which handles the details of each "bottleneck()". After one unit is processed, its output is passed to the next unit, and then the next block is processed. This continues until the main structure is complete, that is, until the return value of "net = resnet_utils.stack_blocks_dense(net, blocks, output_stride, store_non_strided_activations)" is obtained. Finally, the subsequent part is processed and the output of the whole function is returned.

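The way the four blocks and their per-unit parameters are built can be sketched as follows. "make_block" and "bottleneck_placeholder" are hypothetical names standing in for "resnet_v1_block()" and "bottleneck()". The base depths, unit counts and strides are the values recalled for ResNet50 and should be checked against resnet_v1.py; "min_base_depth" and "depth_multiplier" are ignored here.

```python
import collections

Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])


def bottleneck_placeholder(*args, **kwargs):
    """Hypothetical placeholder; the real unit_fn is bottleneck() in resnet_v1.py."""
    raise NotImplementedError


def make_block(scope, base_depth, num_units, stride):
    """Build one Block: every unit has stride 1 except the last one."""
    unit = {'depth': base_depth * 4, 'depth_bottleneck': base_depth}
    args = [dict(unit, stride=1) for _ in range(num_units - 1)]
    args.append(dict(unit, stride=stride))
    return Block(scope, bottleneck_placeholder, args)


# Assumed ResNet50 configuration (conv2_x to conv5_x in Table 1 of the paper).
resnet50_blocks = [
    make_block('block1', base_depth=64, num_units=3, stride=2),
    make_block('block2', base_depth=128, num_units=4, stride=2),
    make_block('block3', base_depth=256, num_units=6, stride=2),
    make_block('block4', base_depth=512, num_units=3, stride=1),
]

for block in resnet50_blocks:
    units = [(a['depth'], a['depth_bottleneck'], a['stride']) for a in block.args]
    print(block.scope, units)

print(sum(len(b.args) for b in resnet50_blocks))  # 16
```

The first printed line matches the "[{256,64,1},{256,64,1},{256,64,2}]" entry for "conv2_x". With 16 units of 3 convolutions each, plus "conv1" and the final classification layer, the count comes to the 50 layers in the network's name.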
 


Origin blog.csdn.net/Huang_Fj/article/details/100575180