How to use the official pre-trained models in TensorFlow 1.x


Environment: tensorflow1.13
Model: use vgg19 as an example

Note: The results in this document were obtained in CPU mode. The graphics card is a 30-series card and the system is win10, so tf1.13 cannot use the GPU. If readers running in GPU mode get results slightly different from those shown here, that should be normal.

Background: the goal of this document is to use the pre-trained vgg model to compute a vgg loss in tensorflow1.x.

Model download

The official pre-trained models are in tensorflow's models repository, under tensorflow/models/research/slim. Be sure to select the tf1.13 branch:
https://github.com/tensorflow/models/tree/r1.13.0/research/slim

The vgg19 model download address:
http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz
After decompression, the file name is vgg_19.ckpt.
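
The download and decompression steps can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names download and extract and the local paths in the commented example are assumptions made for illustration, not part of the original workflow.

```python
import os
import tarfile
import urllib.request

# URL taken from the text above
VGG19_URL = "http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz"


def download(url, dest_path):
    """Download url to dest_path unless the file already exists."""
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract(archive_path, out_dir):
    """Extract a .tar.gz archive and return the names of its members."""
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(out_dir)
    return names


# Example (not run here because it needs network access):
# archive = download(VGG19_URL, "vgg_19_2016_08_28.tar.gz")
# print(extract(archive, "pretrained_model"))
```

The path of the extracted vgg_19.ckpt file is what the scripts in this document pass as model_path.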

Helper function

To check whether the different ways of using the model give consistent results, the code below uses a helper function that visualizes a feature map. The function is defined once here and its body is omitted from the later scripts. If you want to run those scripts, paste it in yourself.

def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    """
    Visualize feature map in one image.

    Parameters
    ----------
    feature_map: numpy array, shape is (height, width, channel)
    col_nums: number of feature map columns
    gap_value: value for feature map gap
    gap_width: width of gap
    gap_height: height of gap

    Returns
    -------
    image: image to show feature map
    """
    eps = 1e-6

    if feature_map.ndim == 4:
        if feature_map.shape[0] == 1:
            # drop only the batch axis so size-1 height/width/channel survive
            feature_map = np.squeeze(feature_map, axis=0)
        else:
            raise ValueError("feature map must be 3 dims ndarray (height, "
                             "width, channel) or 4 dims ndarray whose shape "
                             "must be (1, height, width, channel)")

    # compute col_nums (if not set) and row_nums
    height, width, channel = feature_map.shape
    if col_nums is None:
        col_nums = int(round(np.sqrt(channel)))
    row_nums = int(np.ceil(channel / col_nums))

    # compute final image width and height
    image_width = col_nums * (width + gap_width) - gap_width
    image_height = row_nums * (height + gap_height) - gap_height

    image = np.ones(shape=(image_height, image_width),
                    dtype=feature_map.dtype) * gap_value
    cnt = 0
    while cnt < channel:
        row = cnt // col_nums
        col = cnt % col_nums

        row_beg = row * (height + gap_height)
        row_end = row_beg + height
        col_beg = col * (width + gap_width)
        col_end = col_beg + width

        image[row_beg:row_end, col_beg:col_end] = \
            feature_map[:, :, cnt] / (np.std(feature_map[:, :, cnt]) + eps)
        cnt += 1

    return image
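
To make the layout arithmetic of the helper easier to follow, here is a small sketch that reproduces only the size computation (columns, rows, and final image size) without drawing anything. The function name grid_shape is hypothetical; the formulas are the ones used in visualize_feature_map above.

```python
import math


def grid_shape(height, width, channel, col_nums=None,
               gap_width=10, gap_height=10):
    """Return (row_nums, col_nums, image_height, image_width) of the mosaic."""
    if col_nums is None:
        col_nums = int(round(math.sqrt(channel)))
    row_nums = int(math.ceil(channel / col_nums))
    # each tile is followed by a gap, except the last one in a row/column
    image_width = col_nums * (width + gap_width) - gap_width
    image_height = row_nums * (height + gap_height) - gap_height
    return row_nums, col_nums, image_height, image_width


# a (128, 128, 256) feature map, e.g. the third conv block for a 512x512 input
print(grid_shape(128, 128, 256))   # (16, 16, 2198, 2198)
# 64 channels at 512x512, e.g. the first conv block
print(grid_shape(512, 512, 64))    # (8, 8, 4166, 4166)
```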

Model use

There are three typical ways to use official models:

  1. Use the official model file to load the model.
    First define the model in the computation graph with a placeholder as input, then load the parameters with the restore method of tf.train.Saver(). This requires the scope names and parameter names of the newly defined model to be consistent with those saved in vgg_19.ckpt, which is why using the official model definition file directly is recommended.

  2. Modify the official model file.
    For the convolution part, the official model file only exposes the feature maps after relu. Sometimes the feature maps of the conv layers (before relu) are needed, which requires hacking the model definition. After the modification, the scope names and parameter names must still be consistent with those saved in vgg_19.ckpt.

  3. Use NewCheckpointReader and customize the model.
    Use pywrap_tensorflow.NewCheckpointReader(model_path) to read the weight parameters, then redefine the model structure and assign the weights to it. The model structure and node names can then be defined freely, but the code is more tedious to write. (The official model definition only exposes the relu feature maps, not the conv ones, so it is less flexible.)

1. Use the official model file to load the model

Official model definition file path (URL):
https://github.com/tensorflow/models/blob/r1.13.0/research/slim/nets/vgg.py
The definition of vgg_19 in that file is as follows:

def vgg_19(inputs,
           num_classes=1000,
           is_training=True,
           dropout_keep_prob=0.5,
           spatial_squeeze=True,
           scope='vgg_19',
           fc_conv_padding='VALID',
           global_pool=False):
    """
    Oxford Net VGG 19-Layers version E Example.
    Note: All the fully_connected layers have been transformed to conv2d
    layers. To use in classification mode, resize input to 224x224.

    Args:
        inputs: a tensor of size [batch_size, height, width, channels].
        num_classes: number of predicted classes. If 0 or None, the logits
            layer is omitted and the input features to the logits layer are
            returned instead.
        is_training: whether or not the model is being trained.
        dropout_keep_prob: the probability that activations are kept in the
            dropout layers during training.
        spatial_squeeze: whether or not should squeeze the spatial dimensions
            of the outputs. Useful to remove unnecessary dimensions for
            classification.
        scope: Optional scope for the variables.
        fc_conv_padding: the type of padding to use for the fully connected
            layer that is implemented as a convolutional layer. Use 'SAME'
            padding if you are applying the network in a fully convolutional
            manner and want to get a prediction map downsampled by a factor of
            32 as an output. Otherwise, the output prediction map will be
            (input / 32) - 6 in case of 'VALID' padding.
        global_pool: Optional boolean flag. If True, the input to the
            classification layer is avgpooled to size 1x1, for any input size.
            (This is not part of the original VGG architecture.)
    Returns:
        net: the output of the logits layer (if num_classes is a non-zero
            integer), or the non-dropped-out input to the logits layer (if
            num_classes is 0 or None).
        end_points: a dict of tensors with intermediate activations.
    """
    with tf.variable_scope(scope, 'vgg_19', [inputs]) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        # Collect outputs for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope(
                [slim.conv2d, slim.fully_connected, slim.max_pool2d],
                outputs_collections=end_points_collection):
            net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3],
                              scope='conv1')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            net = slim.repeat(net, 4, slim.conv2d, 256, [3, 3], scope='conv3')
            net = slim.max_pool2d(net, [2, 2], scope='pool3')
            net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv4')
            net = slim.max_pool2d(net, [2, 2], scope='pool4')
            net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv5')
            net = slim.max_pool2d(net, [2, 2], scope='pool5')

            # Use conv2d instead of fully_connected layers.
            net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding,
                              scope='fc6')
            net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                               scope='dropout6')
            net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
            # Convert end_points_collection into a end_point dict.
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if global_pool:
                net = tf.reduce_mean(net, [1, 2], keep_dims=True,
                                     name='global_pool')
                end_points['global_pool'] = net
            if num_classes:
                net = slim.dropout(net, dropout_keep_prob,
                                   is_training=is_training,
                                   scope='dropout7')
                net = slim.conv2d(net, num_classes, [1, 1],
                                  activation_fn=None,
                                  normalizer_fn=None,
                                  scope='fc8')
                if spatial_squeeze:
                    net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
                end_points[sc.name + '/fc8'] = net
            return net, end_points

Two functions used in the model definition above need a brief explanation:

  • The definition of slim.conv2d
    Ctrl+clicking slim.conv2d in PyCharm jumps to the wrong place. The real definition is in the file at the following path:
    D:\Program\anaconda3\envs\tf13\Lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py
    (D:\Program\anaconda3\envs\tf13 is the tf1.13 environment path on my computer; adjust it for your own environment.)

    In that file, line 1117, def convolution2d, is the function definition and implementation, and line 3327, conv2d = convolution2d, gives it a short alias. The parameter list of convolution2d contains activation_fn=nn.relu, so this convolution uses relu as its activation function by default.

  • The function of slim.repeat
    slim.repeat applies an operator n times in a row. It is implemented in the same file as slim.conv2d; its definition and part of its docstring are as follows:

    def repeat(inputs, repetitions, layer, *args, **kwargs):
        """Applies the same layer with the same arguments repeatedly.

        y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
        # It is equivalent to:

        x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
        x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
        y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')

        ......

        """
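
The scope naming described in that docstring is what makes the variable names line up with the checkpoint. The following pure-Python stand-in is only a sketch of that naming behaviour, not the slim implementation: this repeat and the fake_conv2d layer are hypothetical and do not touch TensorFlow.

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Pure-Python stand-in that mimics the scope naming of slim.repeat."""
    scope = kwargs.pop('scope')
    outputs = inputs
    for i in range(repetitions):
        # the i-th call gets the scope '<scope>/<scope>_<i+1>'
        kwargs['scope'] = '%s/%s_%d' % (scope, scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def fake_conv2d(x, filters, kernel_size, scope=None):
    """Hypothetical layer: only records the scope it was called with."""
    return x + [scope]


print(repeat([], 4, fake_conv2d, 256, [3, 3], scope='conv3'))
# ['conv3/conv3_1', 'conv3/conv3_2', 'conv3/conv3_3', 'conv3/conv3_4']
```

Inside the vgg_19 variable scope these become names such as vgg_19/conv3/conv3_4, which is the form of the keys stored in vgg_19.ckpt.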

The script is as follows. The two functions vgg_19 and visualize_feature_map have already appeared in this document and are fairly long, so their bodies are omitted in the script below:

# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np

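# '-1' hides all GPUs so the script runs in CPU mode, as noted at the top;
# CUDA_VISIBLE_DEVICES takes device indices such as "0", not "/gpu:0"
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'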
slim = tf.contrib.slim


def vgg_19(inputs,
           num_classes=1000,
           is_training=True,
           dropout_keep_prob=0.5,
           spatial_squeeze=True,
           scope='vgg_19',
           fc_conv_padding='VALID',
           global_pool=False):
    # see earlier in this document
    pass


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def main():
    image_file = r'E:\images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg_19(inputs_,
                                       num_classes=0,
                                       is_training=False,
                                       global_pool=True)

    # print trainable variables
    for var in tf.trainable_variables():
        print(var)

    # load pretrained model
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, model_path)

    # running test
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={inputs_: inputs})

    # print shape of feature maps
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)

    feature_map = feature_maps.get('vgg_19/conv3/conv3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    # cv2.imwrite('lena_feature_map_vgg_conv3_4.png', image)

    # print statistics for feature map
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))

    # print part of final global feature vector
    feature_vec = feature_maps.get('global_pool')
    feature_vec = np.squeeze(feature_vec)
    for i in range(10):
        print(feature_vec[i])


if __name__ == '__main__':
    main()

Notes on the script above:

  1. The arguments passed to vgg_19 need attention.
    For inference only, is_training must be set to False.
    My goal is to compute a vgg loss, so the fully connected part is not needed. To keep the fully connected part from raising an error, set num_classes to 0 and global_pool to True.

  2. The weights can only be restored after the computation graph has been created from the placeholder. Between creating the graph and restoring the weights, the "# print trainable variables" section of the script prints the weight variables, including name, shape and dtype:

    <tf.Variable 'vgg_19/conv1/conv1_1/weights:0' shape=(3, 3, 3, 64) dtype=float32_ref>
    <tf.Variable 'vgg_19/conv1/conv1_1/biases:0' shape=(64,) dtype=float32_ref>
    <tf.Variable 'vgg_19/conv1/conv1_2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
    <tf.Variable 'vgg_19/conv1/conv1_2/biases:0' shape=(64,) dtype=float32_ref>
    <tf.Variable 'vgg_19/conv2/conv2_1/weights:0' shape=(3, 3, 64, 128) dtype=float32_ref>
    <tf.Variable 'vgg_19/conv2/conv2_1/biases:0' shape=(128,) dtype=float32_ref>
    ......
    
  3. vgg_19 has two outputs.
    The first is the output of network inference, which is not useful for computing the vgg loss.
    The second stores the network's feature maps in a dict: each key is the node name of a feature map and each value is the feature map itself. This is what is actually needed to compute the vgg loss. The "# print shape of feature maps" section prints the names and shapes of the feature maps, and conv3_4 is also drawn as an image for a simple visual check.

    vgg_19/conv1/conv1_1 (1, 512, 512, 64)
    vgg_19/conv1/conv1_2 (1, 512, 512, 64)
    vgg_19/pool1 (1, 256, 256, 64)
    vgg_19/conv2/conv2_1 (1, 256, 256, 128)
    vgg_19/conv2/conv2_2 (1, 256, 256, 128)
    vgg_19/pool2 (1, 128, 128, 128)
    ......
    
  4. Printing some statistics of the feature map confirms the following:
    The feature map is the relu output, not the raw conv output, because its minimum value is 0.0.
    VGG predates BN (batch normalization), so the network contains no BN and the feature map values are very large (with BN, values generally would not exceed 5). When computing the vgg loss it is therefore usually necessary to multiply by a small weight coefficient chosen for the specific situation.

    1  min=0.0000, max=9201.5811, mean=386.3745, std=737.3252
    2  min=0.0000, max=7389.5913, mean=1412.0540, std=616.6437
    3  min=0.0000, max=3323.7239, mean=400.2662, std=522.4063
    4  min=0.0000, max=4319.3765, mean=369.9904, std=644.4222
    5  min=0.0000, max=8997.2305, mean=905.1512, std=1288.8953
    ......
    
  5. Part of the final feature vector is printed so that correctness can be checked later, after the model definition function has been modified:

    0.00055606366
    0.0
    0.0
    0.15579844
    0.0
    1.0548652
    0.0
    0.0
    0.05207316
    0.29752082
    ......
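
Note 4 mentions multiplying the vgg loss by a small weight coefficient. The following is a minimal numpy sketch of that arithmetic, under stated assumptions: the function name vgg_loss, the mean-squared-error form, and the weight 1e-6 are illustrative choices and not taken from this document, and the random arrays only stand in for real feature maps. In a TF1 graph the same arithmetic would be written with tensor ops.

```python
import numpy as np


def vgg_loss(features_a, features_b, layer_names, weight=1e-6):
    """Weighted mean squared difference between two feature-map dicts."""
    total = 0.0
    for name in layer_names:
        diff = features_a[name].astype(np.float64) - features_b[name]
        total += np.mean(np.square(diff))
    # weight is a hypothetical coefficient that compensates for large values
    return weight * total / len(layer_names)


rng = np.random.RandomState(0)
name = 'vgg_19/conv3/conv3_4'
# stand-ins for two feature maps with activations in the thousands
feat_a = {name: rng.uniform(0, 9000, size=(1, 8, 8, 4)).astype(np.float32)}
feat_b = {name: rng.uniform(0, 9000, size=(1, 8, 8, 4)).astype(np.float32)}
print(vgg_loss(feat_a, feat_a, [name]))  # 0.0
print(vgg_loss(feat_a, feat_b, [name]))  # positive, scaled down by weight
```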
    

2. Modify the official model file

The modified model and test code are as follows; visualize_feature_map again needs to be pasted in from above:

# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np

slim = tf.contrib.slim


def vgg19(inputs,
          num_classes=1000,
          is_training=True,
          dropout_keep_prob=0.5,
          spatial_squeeze=True,
          scope='vgg_19',
          fc_conv_padding='VALID',
          global_pool=False):
    with tf.variable_scope(scope, 'vgg_19', [inputs]) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        # Collect outputs for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope(
                [slim.conv2d, slim.fully_connected, slim.max_pool2d],
                outputs_collections=end_points_collection):
            # conv blocks are modified as follows
            net_config = [
                [64, 2],
                [128, 2],
                [256, 4],
                [512, 4],
                [512, 4],
            ]  # [filters, blocks]

            net = inputs
            relu_dict = {}
            for i, config in enumerate(net_config):
                filters = config[0]
                for j in range(config[1]):
                    conv_scope = 'conv%d/conv%d_%d' % (i + 1, i + 1, j + 1)
                    relu_name = 'conv%d/relu%d_%d' % (i + 1, i + 1, j + 1)
                    net = slim.conv2d(net, filters, [3, 3],
                                      activation_fn=None,
                                      scope=conv_scope)
                    net = tf.nn.relu(net, name=relu_name)
                    relu_dict[net.op.name] = net
                net = slim.max_pool2d(net, [2, 2], scope='pool%d' % (i + 1))

            # Use conv2d instead of fully_connected layers.
            net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding,
                              scope='fc6')
            net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                               scope='dropout6')
            net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
            # Convert end_points_collection into a end_point dict.
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if global_pool:
                net = tf.reduce_mean(net, [1, 2], keep_dims=True,
                                     name='global_pool')
                end_points['global_pool'] = net
            if num_classes:
                net = slim.dropout(net, dropout_keep_prob,
                                   is_training=is_training,
                                   scope='dropout7')
                net = slim.conv2d(net, num_classes, [1, 1],
                                  activation_fn=None,
                                  normalizer_fn=None,
                                  scope='fc8')
                if spatial_squeeze:
                    net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
                end_points[sc.name + '/fc8'] = net

            end_points.update(relu_dict)
            return net, end_points


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def main():
    image_file = r'D:\data\test_images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tensorflow1.13\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg19(inputs_,
                                      num_classes=0,
                                      is_training=False,
                                      global_pool=True)

    # check trainable variables
    for var in tf.trainable_variables():
        print(var)

    # load pretrained model
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, model_path)

    # running test
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={inputs_: inputs})

    # print shape of feature maps
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)

    feature_map = feature_maps.get('vgg_19/conv3/relu3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    cv2.imwrite('lena_feature_map_vgg_relu3_4--2.png', image)

    # print statistics for relu3_4
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))

    # print statistics for conv3_4
    print('\n')
    feature_map = feature_maps.get('vgg_19/conv3/conv3_4')
    feature_map = np.squeeze(feature_map)
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))

    feature_vec = feature_maps.get('global_pool')
    feature_vec = np.squeeze(feature_vec)
    for i in range(10):
        print(feature_vec[i])


if __name__ == '__main__':
    main()

Notes:

  1. Purpose of changing the model structure: separate conv and relu, so that the output of the conv layer can be used as input to the vgg loss.

  2. Key points when changing the model definition function: the structure must not change and the scope names must not change. Because conv2d is now called with activation_fn=None, the collected end points hold the conv outputs and the relu feature maps are no longer collected automatically, so a dict is defined to collect them.

  3. The feature map statistics of conv3_4 are as follows. The min values are now negative, so conv has indeed been separated from relu.

    1  min=-2909.7209, max=9201.5811, mean=92.2542, std=991.5225
    2  min=-431.0446, max=7389.5913, mean=1411.6982, std=617.5237
    3  min=-1092.2075, max=3323.7239, mean=339.5828, std=582.6731
    4  min=-2396.4478, max=4319.3765, mean=106.6278, std=852.0536
    5  min=-3547.4551, max=8997.2305, mean=699.2141, std=1488.1344
    
  4. feature_maps now contains additional relu entries. They are added by the update call at the end of the function, so they appear at the end of feature_maps:

    ......
    vgg_19/conv1/relu1_1 (1, 512, 512, 64)
    vgg_19/conv1/relu1_2 (1, 512, 512, 64)
    vgg_19/conv2/relu2_1 (1, 256, 256, 128)
    vgg_19/conv2/relu2_2 (1, 256, 256, 128)
    vgg_19/conv3/relu3_1 (1, 128, 128, 256)
    vgg_19/conv3/relu3_2 (1, 128, 128, 256)
    vgg_19/conv3/relu3_3 (1, 128, 128, 256)
    vgg_19/conv3/relu3_4 (1, 128, 128, 256)
    ......
    
  5. The other outputs have been checked and are identical to those of the first method, so the modification is correct.
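
One way to check that conv and relu were separated correctly is to verify that the relu feature map equals the conv feature map clipped at zero. The sketch below shows that check on stand-in data; the helper name relu_matches_conv and the tolerance are assumptions, and in the script the two inputs would be feature_maps['vgg_19/conv3/conv3_4'] and feature_maps['vgg_19/conv3/relu3_4'].

```python
import numpy as np


def relu_matches_conv(conv_map, relu_map, atol=1e-4):
    """Return True if relu_map equals max(conv_map, 0) element-wise."""
    return bool(np.allclose(np.maximum(conv_map, 0.0), relu_map, atol=atol))


# stand-in values, loosely modelled on the statistics printed above
conv = np.array([[-2909.7, 9201.6], [92.3, -431.0]], dtype=np.float32)
relu = np.maximum(conv, 0.0)
print(relu_matches_conv(conv, relu))           # True
print(relu.min(), relu.max() == conv.max())    # 0.0 True
```

This is consistent with the printed statistics: relu3_4 and conv3_4 share the same max values, while the min of relu3_4 is 0.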

3. Use NewCheckpointReader and customize the model

This method is described in two parts.
The first part simply explains how to get the weight parameters from the pre-trained model; the second part details how to assign the pre-trained weight coefficients to the newly defined model and test it.

Here is the code for the first part:

# -*- coding: utf-8 -*-
from tensorflow.python import pywrap_tensorflow as wrap


def main():
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    reader = wrap.NewCheckpointReader(model_path)

    variables_shape = reader.get_variable_to_shape_map()
    variables_dtype = reader.get_variable_to_dtype_map()
    for key in variables_shape.keys():
        print(key, variables_shape.get(key), variables_dtype.get(key))

    print('\n')
    print(reader.has_tensor("vgg_19/mean_rgb"))
    rgb_mean = reader.get_tensor("vgg_19/mean_rgb")
    print(rgb_mean)


if __name__ == '__main__':
    main()

A few notes on the code above:

  1. NewCheckpointReader is used to load the weights of the pre-trained model.
  2. get_variable_to_shape_map() and get_variable_to_dtype_map() give the shape and dtype of the weight parameters.
  3. get_tensor() returns a weight parameter as a numpy array.

The printed results are as follows. There are two special entries, global_step and vgg_19/mean_rgb; the values of mean_rgb are printed at the end:

global_step [] <dtype: 'int64'>
vgg_19/conv2/conv2_2/biases [128] <dtype: 'float32'>
vgg_19/conv2/conv2_2/weights [3, 3, 128, 128] <dtype: 'float32'>
vgg_19/conv1/conv1_1/biases [64] <dtype: 'float32'>
vgg_19/conv1/conv1_1/weights [3, 3, 3, 64] <dtype: 'float32'>
vgg_19/conv1/conv1_2/biases [64] <dtype: 'float32'>
vgg_19/conv1/conv1_2/weights [3, 3, 64, 64] <dtype: 'float32'>
......
vgg_19/mean_rgb [3] <dtype: 'float32'>
......
vgg_19/fc8/weights [1, 1, 4096, 1000] <dtype: 'float32'>


[123.68 116.78 103.94]
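
For a vgg loss only the convolution-block variables are needed. The sketch below shows how a shape map like the one printed above could be filtered and how many values each variable holds. It uses a small hand-copied dict rather than a real reader, and the helper names num_elements and conv_entries are hypothetical.

```python
from functools import reduce
from operator import mul

# a few entries copied from the printed output above; a real shape map
# returned by get_variable_to_shape_map() has many more keys
shape_map = {
    'global_step': [],
    'vgg_19/conv1/conv1_1/biases': [64],
    'vgg_19/conv1/conv1_1/weights': [3, 3, 3, 64],
    'vgg_19/conv1/conv1_2/weights': [3, 3, 64, 64],
    'vgg_19/mean_rgb': [3],
    'vgg_19/fc8/weights': [1, 1, 4096, 1000],
}


def num_elements(shape):
    """Number of scalars in a tensor of the given shape (1 for a scalar)."""
    return reduce(mul, shape, 1)


def conv_entries(shape_map):
    """Keep only the convolution-block variables, sorted by name."""
    return sorted(k for k in shape_map if k.startswith('vgg_19/conv'))


for key in conv_entries(shape_map):
    print(key, num_elements(shape_map[key]))
# vgg_19/conv1/conv1_1/biases 64
# vgg_19/conv1/conv1_1/weights 1728
# vgg_19/conv1/conv1_2/weights 36864
```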

Here is the code for the second part:

# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np
from tensorflow.python import pywrap_tensorflow as wrap

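# '-1' hides all GPUs so the script runs in CPU mode, as noted at the top;
# CUDA_VISIBLE_DEVICES takes device indices such as "0", not "/gpu:0"
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'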
slim = tf.contrib.slim


def vgg19(inputs,
          scope_name='vgg_19'):
    with tf.variable_scope(scope_name):

        net_config = [
            [64, 2],
            [128, 2],
            [256, 4],
            [512, 4],
            [512, 4],
        ]  # [filters, blocks]

        feature_maps = {}
        x = inputs
        for i, config in enumerate(net_config):
            filters = config[0]
            for j in range(config[1]):
                conv_name = 'conv%d_%d' % (i + 1, j + 1)
                relu_name = 'relu%d_%d' % (i + 1, j + 1)

                x = tf.layers.conv2d(x, filters, [3, 3],
                                     padding='same',
                                     name=conv_name)
                feat_map_name = x.op.name.replace('/BiasAdd', '')
                feature_maps[feat_map_name] = x

                x = tf.nn.relu(x, name=relu_name)
                feature_maps[x.op.name] = x

            x = tf.layers.max_pooling2d(x, (2, 2), (2, 2),
                                        name='pool%d' % (i + 1))
            feat_map_name = x.op.name.replace('/MaxPool', '')
            feature_maps[feat_map_name] = x

        return x, feature_maps


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def _get_pretrained_tensor_name(name):
    block_num = int(name.split('/')[1][4:].split('_')[0])
    name = name.replace('vgg_19', 'vgg_19/conv%d' % block_num)
    name = name.replace('kernel', 'weights').replace('bias', 'biases')
    return name


def main():
    image_file = r'E:\images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg19(inputs_)
    trainable_vars = tf.trainable_variables()

    # use NewCheckpointReader to get weights
    reader = wrap.NewCheckpointReader(model_path)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    # print trainable variables before assignment
    for var in trainable_vars:
        print(var)
        print(sess.run(var)[:, :, 0, 0])
        break

    # trainable variables assignment
    print('\n')
    for i, var in enumerate(trainable_vars):
        name = _get_pretrained_tensor_name(var.op.name)
        sess.run(var.assign(reader.get_tensor(name)))

    # print trainable variables after assignment
    for var in trainable_vars:
        print(var)
        name = _get_pretrained_tensor_name(var.op.name)
        print(sess.run(var)[:, :, 0, 0])
        print('pretrained weight:')
        print(reader.get_tensor(name)[:, :, 0, 0])
        break

    # test case
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={
    
    
                                     inputs_: inputs,
                                 })

    # print shape of feature maps
    print('\n')
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)

    feature_map = feature_maps.get('vgg_19/relu3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    cv2.imwrite('lena_feature_map_vgg_conv3_4--2.png', image)

    # print statistics for feature map
    print('\n')
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))


if __name__ == '__main__':
    main()

For a vgg loss, the final fully connected layers are usually not needed. To save computation and GPU memory, the newly defined model above drops the fully connected part, and the feature maps and variables are also renamed. In this case the restore function cannot be used directly to load the pre-trained parameters; the weights have to be assigned instead.

The code above has two parts: the weight assignment, and the same test case as before.

The assignment process is as follows:

  1. Create the computation graph from the placeholder and get trainable_vars
  2. Use NewCheckpointReader to load the weight parameters of the pre-trained model
  3. Create a Session and initialize the global variables
  4. Use var.assign() to assign values to the weight parameters
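
Step 4 depends on translating each new variable name into the name stored in the checkpoint. The sketch below is a regex-based variant of the _get_pretrained_tensor_name helper from the script, written to fail loudly on unexpected names. The function name to_checkpoint_name is hypothetical, and the sketch assumes the default scope name vgg_19.

```python
import re


def to_checkpoint_name(var_name):
    """Map e.g. 'vgg_19/conv3_4/kernel' to 'vgg_19/conv3/conv3_4/weights'."""
    match = re.fullmatch(r'vgg_19/conv(\d+)_(\d+)/(kernel|bias)', var_name)
    if match is None:
        raise ValueError('unexpected variable name: %s' % var_name)
    block, index, kind = match.groups()
    # tf.layers names its variables kernel/bias, the checkpoint weights/biases
    suffix = {'kernel': 'weights', 'bias': 'biases'}[kind]
    return 'vgg_19/conv%s/conv%s_%s/%s' % (block, block, index, suffix)


print(to_checkpoint_name('vgg_19/conv1_1/kernel'))
# vgg_19/conv1/conv1_1/weights
print(to_checkpoint_name('vgg_19/conv3_4/bias'))
# vgg_19/conv3/conv3_4/biases
```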

The result printed by the above code is as follows:

<tf.Variable 'vgg_19/conv1_1/kernel:0' shape=(3, 3, 3, 64) dtype=float32_ref>
[[ 0.04975817 -0.0374901  -0.04425776]
 [ 0.03555809  0.08642714  0.05649987]
 [-0.07783681 -0.03184588 -0.07609541]]
(Part of the kernel printed after sess.run(tf.global_variables_initializer()); these are randomly initialized values.)

<tf.Variable 'vgg_19/conv1_1/kernel:0' shape=(3, 3, 3, 64) dtype=float32_ref>
[[ 0.39416704  0.37740308 -0.04594866]
 [ 0.2671299   0.09986369 -0.34100872]
 [-0.07573577 -0.2803425  -0.41602272]]
pretrained weight:
[[ 0.39416704  0.37740308 -0.04594866]
 [ 0.2671299   0.09986369 -0.34100872]
 [-0.07573577 -0.2803425  -0.41602272]]
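(After the weights are assigned, part of the kernel is printed again together with the corresponding part of the pre-trained model; the kernel has been assigned successfully.)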


vgg_19/conv1_1 (1, 512, 512, 64)
vgg_19/relu1_1 (1, 512, 512, 64)
vgg_19/conv1_2 (1, 512, 512, 64)
vgg_19/relu1_2 (1, 512, 512, 64)
vgg_19/pool1 (1, 256, 256, 64)
......
vgg_19/conv5_4 (1, 32, 32, 512)
vgg_19/relu5_4 (1, 32, 32, 512)
vgg_19/pool5 (1, 16, 16, 512)
(Check the names and shapes of the feature maps.)


1  min=0.0000, max=9201.5811, mean=386.3745, std=737.3252
2  min=0.0000, max=7389.5913, mean=1412.0540, std=616.6437
3  min=0.0000, max=3323.7239, mean=400.2662, std=522.4063
4  min=0.0000, max=4319.3765, mean=369.9904, std=644.4222
5  min=0.0000, max=8997.2305, mean=905.1512, std=1288.8953
(Statistics of relu3_4; the values are the same as with the previous two methods, so the overall process is correct.)
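
The feature map names and shapes printed above can be cross-checked without running the network. The sketch below recomputes them from the same net_config used in the script, assuming 'same' padding for the convolutions and 2x2 max pooling with stride 2; the function name expected_shapes is hypothetical.

```python
NET_CONFIG = [[64, 2], [128, 2], [256, 4], [512, 4], [512, 4]]  # [filters, blocks]


def expected_shapes(height, width, scope='vgg_19'):
    """List (name, shape) for every conv/relu/pool output, batch size 1."""
    shapes = []
    for i, (filters, blocks) in enumerate(NET_CONFIG):
        for j in range(blocks):
            for kind in ('conv', 'relu'):
                name = '%s/%s%d_%d' % (scope, kind, i + 1, j + 1)
                shapes.append((name, (1, height, width, filters)))
        # 2x2 max pooling with stride 2 halves height and width
        height, width = height // 2, width // 2
        shapes.append(('%s/pool%d' % (scope, i + 1),
                       (1, height, width, filters)))
    return shapes


shapes = dict(expected_shapes(512, 512))
print(shapes['vgg_19/relu3_4'])  # (1, 128, 128, 256)
print(shapes['vgg_19/pool5'])    # (1, 16, 16, 512)
```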

Finally, take a look at the feature map saved by the code:
[Figure: visualized feature map saved by the script above]


Origin blog.csdn.net/bby1987/article/details/119942007