Environment: TensorFlow 1.13
Model: VGG19 is used as the example
Note: The results in this document were obtained in CPU mode. My graphics card is a 30-series card and the system is Windows 10, so GPU mode cannot be used with tf1.13. If readers get slightly different results in GPU mode, that should be normal.
Background: this document uses the pre-trained VGG model to compute the VGG loss in TensorFlow 1.x.
Model download
The official pre-trained models are in TensorFlow's models repository, under tensorflow/models/research/slim. Be sure to select the r1.13.0 branch:
https://github.com/tensorflow/models/tree/r1.13.0/research/slim
The vgg19 model download address:
http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz
After decompression, the file name is vgg_19.ckpt.
Helper function
To check whether the different ways of using the model give consistent results, I wrote a helper function that visualizes a feature map. The later scripts use it directly without repeating it. Readers who want to run the code in this document should paste it in themselves.
import numpy as np


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    """
    Visualize feature map in one image.

    Parameters
    ----------
    feature_map: numpy array, shape is (height, width, channel)
    col_nums: number of feature map columns
    gap_value: value for feature map gap
    gap_width: width of gap
    gap_height: height of gap

    Returns
    -------
    image: image to show feature map
    """
    eps = 1e-6
    if feature_map.ndim == 4:
        if feature_map.shape[0] == 1:
            # drop only the batch axis, so a single-channel map stays 3-D
            feature_map = np.squeeze(feature_map, axis=0)
        else:
            raise ValueError("feature map must be 3 dims ndarray (height, "
                             "width, channel) or 4 dims ndarray whose shape "
                             "must be (1, height, width, channel)")
    # compute col_nums (if not set) and row_nums
    height, width, channel = feature_map.shape
    if col_nums is None:
        col_nums = int(round(np.sqrt(channel)))
    row_nums = int(np.ceil(channel / col_nums))
    # compute final image width and height
    image_width = col_nums * (width + gap_width) - gap_width
    image_height = row_nums * (height + gap_height) - gap_height
    image = np.ones(shape=(image_height, image_width),
                    dtype=feature_map.dtype) * gap_value
    # paste each channel into its grid cell, scaled by its standard deviation
    cnt = 0
    while cnt < channel:
        row = cnt // col_nums
        col = cnt % col_nums
        row_beg = row * (height + gap_height)
        row_end = row_beg + height
        col_beg = col * (width + gap_width)
        col_end = col_beg + width
        image[row_beg:row_end, col_beg:col_end] = \
            feature_map[:, :, cnt] / (np.std(feature_map[:, :, cnt]) + eps)
        cnt += 1
    return image
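To make the grid arithmetic of the helper easier to check, here is a minimal sketch that computes only the layout, without touching any pixel data. The function name grid_layout is hypothetical and not part of the original code; it repeats the same formulas for col_nums, row_nums, and the final image size.

```python
import math


def grid_layout(height, width, channel, col_nums=None,
                gap_width=10, gap_height=10):
    """Return (row_nums, col_nums, image_height, image_width) of the tiled image.

    Hypothetical helper: it mirrors the size formulas used in
    visualize_feature_map and nothing else.
    """
    if col_nums is None:
        col_nums = int(round(math.sqrt(channel)))
    row_nums = int(math.ceil(channel / col_nums))
    # every cell but the last in a row/column is followed by one gap
    image_width = col_nums * (width + gap_width) - gap_width
    image_height = row_nums * (height + gap_height) - gap_height
    return row_nums, col_nums, image_height, image_width


# A conv3_4 feature map for a 512x512 input has shape (128, 128, 256).
print(grid_layout(128, 128, 256))
```

For a 128x128 feature map with 256 channels this gives a 16x16 grid, so the visualized image is roughly 2200 pixels on each side.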
Model usage
There are three typical ways to use the official model:
- Use the official model file to load the model.
You first define the model in the computation graph with a placeholder as input, and then load the model parameters with the restore method of tf.train.Saver(). This requires the node names and parameter names of the newly defined model to match those saved in vgg_19.ckpt, which is why using the official model definition file directly is recommended.
- Modify the official model file.
In the convolution part, the official model file only exposes the feature maps of the relu layers. Sometimes the feature maps of the conv layers are needed, and then the file has to be modified. The node names and parameter names after modification must still match those saved in vgg_19.ckpt.
- Use NewCheckpointReader and customize the model.
Read the weight parameters with pywrap_tensorflow.NewCheckpointReader(model_path), then redefine the model structure and assign the weights to it. The model structure and node names can be defined freely this way, but the code is more tedious to write. (The officially defined model file only gives the feature maps of the relu layers, not the conv layers, so it is less flexible.)
1. Use the official model file to load the model
Official model definition file path (URL):
https://github.com/tensorflow/models/blob/r1.13.0/research/slim/nets/vgg.py
The definition of vgg19 is found as follows:
def vgg_19(inputs,
           num_classes=1000,
           is_training=True,
           dropout_keep_prob=0.5,
           spatial_squeeze=True,
           scope='vgg_19',
           fc_conv_padding='VALID',
           global_pool=False):
    """
    Oxford Net VGG 19-Layers version E Example.

    Note: All the fully_connected layers have been transformed to conv2d
          layers. To use in classification mode, resize input to 224x224.

    Args:
      inputs: a tensor of size [batch_size, height, width, channels].
      num_classes: number of predicted classes. If 0 or None, the logits
        layer is omitted and the input features to the logits layer are
        returned instead.
      is_training: whether or not the model is being trained.
      dropout_keep_prob: the probability that activations are kept in the
        dropout layers during training.
      spatial_squeeze: whether or not should squeeze the spatial dimensions
        of the outputs. Useful to remove unnecessary dimensions for
        classification.
      scope: Optional scope for the variables.
      fc_conv_padding: the type of padding to use for the fully connected
        layer that is implemented as a convolutional layer. Use 'SAME'
        padding if you are applying the network in a fully convolutional
        manner and want to get a prediction map downsampled by a factor of
        32 as an output. Otherwise, the output prediction map will be
        (input / 32) - 6 in case of 'VALID' padding.
      global_pool: Optional boolean flag. If True, the input to the
        classification layer is avgpooled to size 1x1, for any input size.
        (This is not part of the original VGG architecture.)

    Returns:
      net: the output of the logits layer (if num_classes is a non-zero
        integer), or the non-dropped-out input to the logits layer (if
        num_classes is 0 or None).
      end_points: a dict of tensors with intermediate activations.
    """
    with tf.variable_scope(scope, 'vgg_19', [inputs]) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        # Collect outputs for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope(
                [slim.conv2d, slim.fully_connected, slim.max_pool2d],
                outputs_collections=end_points_collection):
            net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3],
                              scope='conv1')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            net = slim.repeat(net, 4, slim.conv2d, 256, [3, 3], scope='conv3')
            net = slim.max_pool2d(net, [2, 2], scope='pool3')
            net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv4')
            net = slim.max_pool2d(net, [2, 2], scope='pool4')
            net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv5')
            net = slim.max_pool2d(net, [2, 2], scope='pool5')
            # Use conv2d instead of fully_connected layers.
            net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding,
                              scope='fc6')
            net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                               scope='dropout6')
            net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
            # Convert end_points_collection into a end_point dict.
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if global_pool:
                net = tf.reduce_mean(net, [1, 2], keep_dims=True,
                                     name='global_pool')
                end_points['global_pool'] = net
            if num_classes:
                net = slim.dropout(net, dropout_keep_prob,
                                   is_training=is_training,
                                   scope='dropout7')
                net = slim.conv2d(net, num_classes, [1, 1],
                                  activation_fn=None,
                                  normalizer_fn=None,
                                  scope='fc8')
                if spatial_squeeze:
                    net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
                end_points[sc.name + '/fc8'] = net
            return net, end_points
Two functions in the model definition above need a short explanation:
- slim.conv2d
Following slim.conv2d with Ctrl+click in PyCharm leads to the wrong place. The real definition is in the file at the following path:
D:\Program\anaconda3\envs\tf13\Lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py
(D:\Program\anaconda3\envs\tf13 is the tf1.13 environment path on my computer; adjust it to your own environment.) In that file, def convolution2d on line 1117 is the function definition and implementation, and conv2d = convolution2d on line 3327 gives it a short alias. The parameter list of convolution2d contains activation_fn=nn.relu, so this convolution applies relu as its activation function by default.
- slim.repeat
slim.repeat applies an operator n times in sequence. It is implemented in the same file as slim.conv2d. The function definition and part of its docstring are as follows:
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Applies the same layer with the same arguments repeatedly.

      y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
      # It is equivalent to:

      x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
      x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
      y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')
    ......
    """
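The scope naming matters because these names become the keys of the feature map dict and must match the checkpoint. The following is a toy sketch in plain Python, not the slim implementation: repeat here is a simplified stand-in, and fake_conv2d is a hypothetical layer that only records the scope it receives.

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Toy stand-in for slim.repeat: chain `layer` and number the scopes."""
    scope = kwargs.pop('scope', None) or getattr(layer, '__name__', 'repeat')
    outputs = inputs
    for i in range(repetitions):
        # Same naming as in the docstring: 'conv1' -> 'conv1/conv1_1', ...
        kwargs['scope'] = '%s/%s_%d' % (scope, scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def fake_conv2d(trace, filters, kernel_size, scope=None):
    """Hypothetical layer: appends the scope it was called with to a list."""
    return trace + [scope]


names = repeat([], 3, fake_conv2d, 64, [3, 3], scope='conv1')
print(names)
```

With scope='conv3' and four repetitions, the last name is conv3/conv3_4, which under the outer vgg_19 scope becomes the key vgg_19/conv3/conv3_4 used later in this document.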
The usage script is as follows. The two functions vgg_19 and visualize_feature_map have already appeared in this document and are fairly long, so their bodies are omitted in the script:
# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np

# CUDA_VISIBLE_DEVICES takes device indices, not names like "/gpu:0".
# "-1" hides all GPUs so the script runs on CPU, as in this document;
# use "0" to select the first GPU instead.
os.environ['CUDA_VISIBLE_DEVICES'] = "-1"
slim = tf.contrib.slim


def vgg_19(inputs,
           num_classes=1000,
           is_training=True,
           dropout_keep_prob=0.5,
           spatial_squeeze=True,
           scope='vgg_19',
           fc_conv_padding='VALID',
           global_pool=False):
    # see earlier in this document
    pass


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def main():
    image_file = r'E:\images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg_19(inputs_,
                                       num_classes=0,
                                       is_training=False,
                                       global_pool=True)
    # print trainable variables
    for var in tf.trainable_variables():
        print(var)
    # load pretrained model
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, model_path)
    # running test
    # cv2.imread returns a BGR uint8 image; it is fed as-is here
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={
                                     inputs_: inputs,
                                 })
    # print shape of feature maps
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)
    feature_map = feature_maps.get('vgg_19/conv3/conv3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    # cv2.imwrite('lena_feature_map_vgg_conv3_4.png', image)
    # print statistics for feature map
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))
    # print part of final global feature vector
    feature_vec = feature_maps.get('global_pool')
    feature_vec = np.squeeze(feature_vec)
    for i in range(10):
        print(feature_vec[i])


if __name__ == '__main__':
    main()
The script above needs the following explanations:
- Pay attention to the parameters of vgg_19.
For inference only, is_training must be set to False.
My purpose is to compute the VGG loss, so the fully connected part is not needed. To keep the fully connected part from raising an error, set num_classes to 0 and global_pool to True.
- The weights can be restored only after the computation graph has been created with a placeholder as input. Between creating the graph and restoring the weights (the # print trainable variables section), the script prints the weight variables, including name, shape, and dtype:
<tf.Variable 'vgg_19/conv1/conv1_1/weights:0' shape=(3, 3, 3, 64) dtype=float32_ref>
<tf.Variable 'vgg_19/conv1/conv1_1/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'vgg_19/conv1/conv1_2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
<tf.Variable 'vgg_19/conv1/conv1_2/biases:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'vgg_19/conv2/conv2_1/weights:0' shape=(3, 3, 64, 128) dtype=float32_ref>
<tf.Variable 'vgg_19/conv2/conv2_1/biases:0' shape=(128,) dtype=float32_ref>
......
- vgg_19 has two outputs.
The first is the output of network inference. It is easy to understand, but it is of no use for computing the VGG loss.
The second output stores the feature maps of the network in a dict. The key is the node name of the feature map and the value is the feature map itself. This is what is actually needed to compute the VGG loss. The # print shape of feature maps section prints the name and shape of each feature map, and conv3_4 is also drawn into an image for simple visual checks.
vgg_19/conv1/conv1_1 (1, 512, 512, 64)
vgg_19/conv1/conv1_2 (1, 512, 512, 64)
vgg_19/pool1 (1, 256, 256, 64)
vgg_19/conv2/conv2_1 (1, 256, 256, 128)
vgg_19/conv2/conv2_2 (1, 256, 256, 128)
vgg_19/pool2 (1, 128, 128, 128)
......
- The printed statistics of the feature map confirm the following facts:
The feature map is the relu output, not the raw conv output, because its minimum value is 0.0.
VGG predates BN, so there is no BN in the network and the feature map values are very large (with BN, the values generally do not exceed 5). When computing the VGG loss, it is therefore usually necessary to multiply by a small weight coefficient chosen for the specific case.
1 min=0.0000, max=9201.5811, mean=386.3745, std=737.3252
2 min=0.0000, max=7389.5913, mean=1412.0540, std=616.6437
3 min=0.0000, max=3323.7239, mean=400.2662, std=522.4063
4 min=0.0000, max=4319.3765, mean=369.9904, std=644.4222
5 min=0.0000, max=8997.2305, mean=905.1512, std=1288.8953
......
- Part of the final feature vector is printed so that correctness can be checked after the model definition function is modified:
0.00055606366
0.0
0.0
0.15579844
0.0
1.0548652
0.0
0.0
0.05207316
0.29752082
......
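Since the feature map values are in the hundreds or thousands, the weight coefficient has a large effect on the scale of the loss. The sketch below shows one possible form of a VGG loss in NumPy, assuming a mean squared error between two feature maps. This document does not specify the loss formula, so the MSE form, the function name vgg_loss, and the default weight of 1e-6 are all assumptions for illustration.

```python
import numpy as np


def vgg_loss(features_a, features_b, weight=1e-6):
    """Weighted mean squared error between two feature maps of equal shape.

    Assumption: MSE is used as the distance; `weight` is a placeholder value.
    """
    features_a = np.asarray(features_a, dtype=np.float64)
    features_b = np.asarray(features_b, dtype=np.float64)
    if features_a.shape != features_b.shape:
        raise ValueError("feature maps must have the same shape")
    return weight * np.mean((features_a - features_b) ** 2)


rng = np.random.RandomState(0)
# Synthetic stand-ins for two relu feature maps (non-negative, large values).
feat_x = np.abs(rng.randn(1, 8, 8, 4)) * 500.0
feat_y = np.abs(rng.randn(1, 8, 8, 4)) * 500.0
print(vgg_loss(feat_x, feat_x))
print(vgg_loss(feat_x, feat_y))
```

In a real training graph the same expression would be written with TensorFlow ops on the tensors from feature_map_dict, so that gradients can flow back to the network being trained.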
2. Modify the official model file
The modified model and test code are as follows. visualize_feature_map again needs to be pasted from above:
# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np

slim = tf.contrib.slim


def vgg19(inputs,
          num_classes=1000,
          is_training=True,
          dropout_keep_prob=0.5,
          spatial_squeeze=True,
          scope='vgg_19',
          fc_conv_padding='VALID',
          global_pool=False):
    with tf.variable_scope(scope, 'vgg_19', [inputs]) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        # Collect outputs for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope(
                [slim.conv2d, slim.fully_connected, slim.max_pool2d],
                outputs_collections=end_points_collection):
            # conv blocks are modified as follows
            net_config = [
                [64, 2],
                [128, 2],
                [256, 4],
                [512, 4],
                [512, 4],
            ]  # [filters, blocks]
            net = inputs
            relu_dict = {}
            for i, config in enumerate(net_config):
                filters = config[0]
                for j in range(config[1]):
                    conv_scope = 'conv%d/conv%d_%d' % (i + 1, i + 1, j + 1)
                    relu_name = 'conv%d/relu%d_%d' % (i + 1, i + 1, j + 1)
                    # conv without activation, so the end point is pre-relu
                    net = slim.conv2d(net, filters, [3, 3],
                                      activation_fn=None,
                                      scope=conv_scope)
                    # relu as a separate op, collected by hand
                    net = tf.nn.relu(net, name=relu_name)
                    relu_dict[net.op.name] = net
                net = slim.max_pool2d(net, [2, 2], scope='pool%d' % (i + 1))
            # Use conv2d instead of fully_connected layers.
            net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding,
                              scope='fc6')
            net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                               scope='dropout6')
            net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
            # Convert end_points_collection into a end_point dict.
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if global_pool:
                net = tf.reduce_mean(net, [1, 2], keep_dims=True,
                                     name='global_pool')
                end_points['global_pool'] = net
            if num_classes:
                net = slim.dropout(net, dropout_keep_prob,
                                   is_training=is_training,
                                   scope='dropout7')
                net = slim.conv2d(net, num_classes, [1, 1],
                                  activation_fn=None,
                                  normalizer_fn=None,
                                  scope='fc8')
                if spatial_squeeze:
                    net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
                end_points[sc.name + '/fc8'] = net
            # append the relu feature maps to the collected end points
            end_points.update(relu_dict)
            return net, end_points


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def main():
    image_file = r'D:\data\test_images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tensorflow1.13\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg19(inputs_,
                                      num_classes=0,
                                      is_training=False,
                                      global_pool=True)
    # check trainable variables
    for var in tf.trainable_variables():
        print(var)
    # load pretrained model
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, model_path)
    # running test
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={
                                     inputs_: inputs,
                                 })
    # print shape of feature maps
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)
    feature_map = feature_maps.get('vgg_19/conv3/relu3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    cv2.imwrite('lena_feature_map_vgg_relu3_4--2.png', image)
    # print statistics for relu3_4
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))
    # print statistics for conv3_4
    print('\n')
    feature_map = feature_maps.get('vgg_19/conv3/conv3_4')
    feature_map = np.squeeze(feature_map)
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))
    feature_vec = feature_maps.get('global_pool')
    feature_vec = np.squeeze(feature_vec)
    for i in range(10):
        print(feature_vec[i])


if __name__ == '__main__':
    main()
Notes on the code above:
- Purpose of changing the model structure: separate conv and relu, so that the output of a conv layer can be used as the input of the VGG loss.
- Key points when changing the model definition function: the structure must not change, and the node names must not change. The separate tf.nn.relu ops are not picked up by slim's outputs collection, so a dict is defined to collect the relu feature maps.
- The feature map statistics of conv3_4 are as follows. The minimum values are now negative, so conv has indeed been separated from relu.
1 min=-2909.7209, max=9201.5811, mean=92.2542, std=991.5225
2 min=-431.0446, max=7389.5913, mean=1411.6982, std=617.5237
3 min=-1092.2075, max=3323.7239, mean=339.5828, std=582.6731
4 min=-2396.4478, max=4319.3765, mean=106.6278, std=852.0536
5 min=-3547.4551, max=8997.2305, mean=699.2141, std=1488.1344
- feature_maps now contains additional relu entries. They are added by end_points.update(relu_dict) at the end of the function, so they appear at the end of feature_maps:
......
vgg_19/conv1/relu1_1 (1, 512, 512, 64)
vgg_19/conv1/relu1_2 (1, 512, 512, 64)
vgg_19/conv2/relu2_1 (1, 256, 256, 128)
vgg_19/conv2/relu2_2 (1, 256, 256, 128)
vgg_19/conv3/relu3_1 (1, 128, 128, 256)
vgg_19/conv3/relu3_2 (1, 128, 128, 256)
vgg_19/conv3/relu3_3 (1, 128, 128, 256)
vgg_19/conv3/relu3_4 (1, 128, 128, 256)
......
- The other outputs were checked as well and show no difference from the first method, so the modification is correct.
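The relation between the conv and relu statistics (same max, min clamped to 0) can also be checked element by element. Below is a minimal sketch on synthetic data; the function name is_relu_of is hypothetical. With the real outputs, conv_map and relu_map would be feature_maps['vgg_19/conv3/conv3_4'] and feature_maps['vgg_19/conv3/relu3_4'].

```python
import numpy as np


def is_relu_of(conv_map, relu_map):
    """True if relu_map equals max(conv_map, 0) element by element."""
    return np.array_equal(np.maximum(conv_map, 0.0), relu_map)


rng = np.random.RandomState(0)
# Synthetic stand-in for a pre-activation conv output with mixed signs.
conv_out = rng.randn(1, 4, 4, 3) * 1000.0
relu_out = np.maximum(conv_out, 0.0)

print(is_relu_of(conv_out, relu_out))
print(conv_out.min() < 0.0, relu_out.min() == 0.0)
print(conv_out.max() == relu_out.max())
```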
3. Use NewCheckpointReader and customize the model
This method is described in two parts.
The first part simply explains how to get the weight parameters from the pre-trained model; the second part details how to assign the pre-trained weight coefficients to the newly defined model and test it.
Here is the code for the first part:
# -*- coding: utf-8 -*-
from tensorflow.python import pywrap_tensorflow as wrap


def main():
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    reader = wrap.NewCheckpointReader(model_path)
    # name -> shape and name -> dtype for every tensor in the checkpoint
    variables_shape = reader.get_variable_to_shape_map()
    variables_dtype = reader.get_variable_to_dtype_map()
    for key in variables_shape.keys():
        print(key, variables_shape.get(key), variables_dtype.get(key))
    print('\n')
    print(reader.has_tensor("vgg_19/mean_rgb"))
    rgb_mean = reader.get_tensor("vgg_19/mean_rgb")
    print(rgb_mean)


if __name__ == '__main__':
    main()
Notes on the code above:
- NewCheckpointReader loads the weights of the pre-trained model.
- get_variable_to_shape_map() and get_variable_to_dtype_map() return the shape and dtype of each weight parameter.
- get_tensor() returns a weight parameter as a numpy array.
The printed results are as follows. Two entries are special, global_step and vgg_19/mean_rgb. The values of mean_rgb are printed at the end:
global_step [] <dtype: 'int64'>
vgg_19/conv2/conv2_2/biases [128] <dtype: 'float32'>
vgg_19/conv2/conv2_2/weights [3, 3, 128, 128] <dtype: 'float32'>
vgg_19/conv1/conv1_1/biases [64] <dtype: 'float32'>
vgg_19/conv1/conv1_1/weights [3, 3, 3, 64] <dtype: 'float32'>
vgg_19/conv1/conv1_2/biases [64] <dtype: 'float32'>
vgg_19/conv1/conv1_2/weights [3, 3, 64, 64] <dtype: 'float32'>
......
vgg_19/mean_rgb [3] <dtype: 'float32'>
......
vgg_19/fc8/weights [1, 1, 4096, 1000] <dtype: 'float32'>
[123.68 116.78 103.94]
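The three mean_rgb values match the per-channel means that the slim VGG preprocessing subtracts from RGB inputs. The scripts in this document feed the result of cv2.imread (BGR, uint8) to the network directly, so the numbers printed here were obtained without this step. For readers who want to apply it, here is a minimal sketch in NumPy. The function name preprocess_bgr is hypothetical, and the assumption is that the checkpoint expects RGB input with these means subtracted.

```python
import numpy as np

# Values of vgg_19/mean_rgb as printed from the checkpoint above.
MEAN_RGB = np.array([123.68, 116.78, 103.94], dtype=np.float32)


def preprocess_bgr(image_bgr):
    """BGR uint8 (H, W, 3) -> mean-subtracted RGB float32 batch (1, H, W, 3)."""
    # reverse the channel axis: BGR (OpenCV order) -> RGB
    image_rgb = image_bgr[:, :, ::-1].astype(np.float32)
    return np.expand_dims(image_rgb - MEAN_RGB, axis=0)


dummy = np.zeros((4, 4, 3), dtype=np.uint8)
dummy[:, :, 0] = 255  # pure blue in BGR order
batch = preprocess_bgr(dummy)
print(batch.shape, batch[0, 0, 0])
```

Applying this before sess.run would change the feature map statistics printed in this document, so results should only be compared between runs that use the same preprocessing.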
Here is the code for the second part:
# -*- coding: utf-8 -*-
import os
import cv2
import tensorflow as tf
import numpy as np
from tensorflow.python import pywrap_tensorflow as wrap

# CUDA_VISIBLE_DEVICES takes device indices, not names like "/gpu:0".
# "-1" hides all GPUs so the script runs on CPU, as in this document;
# use "0" to select the first GPU instead.
os.environ['CUDA_VISIBLE_DEVICES'] = "-1"
slim = tf.contrib.slim


def vgg19(inputs,
          scope_name='vgg_19'):
    with tf.variable_scope(scope_name):
        net_config = [
            [64, 2],
            [128, 2],
            [256, 4],
            [512, 4],
            [512, 4],
        ]  # [filters, blocks]
        feature_maps = {}
        x = inputs
        for i, config in enumerate(net_config):
            filters = config[0]
            for j in range(config[1]):
                conv_name = 'conv%d_%d' % (i + 1, j + 1)
                relu_name = 'relu%d_%d' % (i + 1, j + 1)
                # tf.layers.conv2d has no activation by default
                x = tf.layers.conv2d(x, filters, [3, 3],
                                     padding='same',
                                     name=conv_name)
                feat_map_name = x.op.name.replace('/BiasAdd', '')
                feature_maps[feat_map_name] = x
                x = tf.nn.relu(x, name=relu_name)
                feature_maps[x.op.name] = x
            x = tf.layers.max_pooling2d(x, (2, 2), (2, 2),
                                        name='pool%d' % (i + 1))
            feat_map_name = x.op.name.replace('/MaxPool', '')
            feature_maps[feat_map_name] = x
    return x, feature_maps


def visualize_feature_map(feature_map,
                          col_nums=None,
                          gap_value=0.5,
                          gap_width=10,
                          gap_height=10):
    # see earlier in this document
    pass


def _get_pretrained_tensor_name(name):
    # 'vgg_19/conv1_1/kernel' -> 'vgg_19/conv1/conv1_1/weights'
    block_num = int(name.split('/')[1][4:].split('_')[0])
    name = name.replace('vgg_19', 'vgg_19/conv%d' % block_num)
    name = name.replace('kernel', 'weights').replace('bias', 'biases')
    return name


def main():
    image_file = r'E:\images\lena512color.tiff'
    model_path = r'E:\pretrained_model\tf1x\vgg_19.ckpt'
    inputs_ = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    outputs, feature_map_dict = vgg19(inputs_)
    trainable_vars = tf.trainable_variables()
    # use NewCheckpointReader to get weights
    reader = wrap.NewCheckpointReader(model_path)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    # print trainable variables before assignment (first variable only)
    for var in trainable_vars:
        print(var)
        print(sess.run(var)[:, :, 0, 0])
        break
    # trainable variables assignment
    print('\n')
    for i, var in enumerate(trainable_vars):
        name = _get_pretrained_tensor_name(var.op.name)
        sess.run(var.assign(reader.get_tensor(name)))
    # print trainable variables after assignment (first variable only)
    for var in trainable_vars:
        print(var)
        name = _get_pretrained_tensor_name(var.op.name)
        print(sess.run(var)[:, :, 0, 0])
        print('pretrained weight:')
        print(reader.get_tensor(name)[:, :, 0, 0])
        break
    # test case
    inputs = cv2.imread(image_file)
    inputs = np.expand_dims(inputs, axis=0)
    out, feature_maps = sess.run([outputs, feature_map_dict],
                                 feed_dict={
                                     inputs_: inputs,
                                 })
    # print shape of feature maps
    print('\n')
    for key in feature_maps.keys():
        print(key, feature_maps.get(key).shape)
    feature_map = feature_maps.get('vgg_19/relu3_4')
    feature_map = np.squeeze(feature_map)
    image = visualize_feature_map(feature_map)
    image = np.clip(image * 255, 0, 255).astype(np.uint8)
    cv2.imwrite('lena_feature_map_vgg_conv3_4--2.png', image)
    # print statistics for feature map
    print('\n')
    for i in range(5):
        mean_val = np.mean(feature_map[:, :, i])
        std = np.std(feature_map[:, :, i])
        min_val = np.min(feature_map[:, :, i])
        max_val = np.max(feature_map[:, :, i])
        print(i + 1, " min=%.4f, max=%.4f, mean=%.4f, std=%.4f" % (
            min_val, max_val, mean_val, std))


if __name__ == '__main__':
    main()
For the VGG loss, the last fully connected layers are usually not needed. To save computation and GPU memory, the newly defined model above removes the fully connected part, and the names of the feature maps and variables are redefined as well. In this case a plain tf.train.Saver().restore() can no longer load the pre-trained parameters, so the weights are assigned instead.
The code above has two parts: the weight parameter assignment, and the same test case as before.
The assignment process is as follows:
- Create the computation graph with a placeholder as input and get trainable_vars
- Load the weight parameters of the pre-trained model with NewCheckpointReader
- Create a Session and initialize the global variables
- Assign values to the weight parameters with var.assign()
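The assignment step depends on mapping each new variable name to the name stored in the checkpoint. The mapping in _get_pretrained_tensor_name uses only string operations, so it can be checked without TensorFlow. The sketch below repeats that logic under a public name for illustration; the two sample names are taken from the variable naming shown in this document.

```python
def get_pretrained_tensor_name(name):
    """Map 'vgg_19/conv1_1/kernel' to 'vgg_19/conv1/conv1_1/weights'."""
    layer = name.split('/')[1]                  # e.g. 'conv1_1'
    block_num = int(layer[4:].split('_')[0])    # 'conv1_1' -> 1
    # re-insert the block scope that the official model uses
    name = name.replace('vgg_19', 'vgg_19/conv%d' % block_num)
    # tf.layers naming -> slim naming
    return name.replace('kernel', 'weights').replace('bias', 'biases')


for new_name in ('vgg_19/conv1_1/kernel', 'vgg_19/conv3_4/bias'):
    print(new_name, '->', get_pretrained_tensor_name(new_name))
```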
The script for the second part prints the following:
<tf.Variable 'vgg_19/conv1_1/kernel:0' shape=(3, 3, 3, 64) dtype=float32_ref>
[[ 0.04975817 -0.0374901 -0.04425776]
[ 0.03555809 0.08642714 0.05649987]
[-0.07783681 -0.03184588 -0.07609541]]
(After sess.run(tf.global_variables_initializer()), part of the kernel is printed; these are the randomly initialized values)
<tf.Variable 'vgg_19/conv1_1/kernel:0' shape=(3, 3, 3, 64) dtype=float32_ref>
[[ 0.39416704 0.37740308 -0.04594866]
[ 0.2671299 0.09986369 -0.34100872]
[-0.07573577 -0.2803425 -0.41602272]]
pretrained weight:
[[ 0.39416704 0.37740308 -0.04594866]
[ 0.2671299 0.09986369 -0.34100872]
[-0.07573577 -0.2803425 -0.41602272]]
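(After the weight assignment, part of the kernel is printed again together with the corresponding part of the pre-trained model; the kernel was assigned successfully)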
vgg_19/conv1_1 (1, 512, 512, 64)
vgg_19/relu1_1 (1, 512, 512, 64)
vgg_19/conv1_2 (1, 512, 512, 64)
vgg_19/relu1_2 (1, 512, 512, 64)
vgg_19/pool1 (1, 256, 256, 64)
......
vgg_19/conv5_4 (1, 32, 32, 512)
vgg_19/relu5_4 (1, 32, 32, 512)
vgg_19/pool5 (1, 16, 16, 512)
(Check the names and shapes of the feature maps)
1 min=0.0000, max=9201.5811, mean=386.3745, std=737.3252
2 min=0.0000, max=7389.5913, mean=1412.0540, std=616.6437
3 min=0.0000, max=3323.7239, mean=400.2662, std=522.4063
4 min=0.0000, max=4319.3765, mean=369.9904, std=644.4222
5 min=0.0000, max=8997.2305, mean=905.1512, std=1288.8953
(relu3_4 is printed and compared with the previous two methods; the values are identical, so the overall procedure is correct)
Finally, take a look at the feature map saved in the code: