TensorFlow Notes (11) - Using the batch normalization function (tf.contrib.slim) in the network definition

 

Table of contents

First, helper functions

1.1 slim.arg_scope()

1.2 slim.utils.collect_named_outputs()

1.3 slim.utils.convert_collection_to_dict()

Second, layer functions

2.1 batch_norm

2.3 tf.contrib.slim.conv2d()

2.4 slim.max_pool2d

2.5 slim.fully_connected


First, helper functions

1.1 slim.arg_scope()

slim.arg_scope defines default argument values for a set of functions, so that when these functions are used repeatedly inside the scope you do not have to spell out every parameter each time; this keeps the functional structure of the network clear. Note that it has nothing to do with tf.variable_scope().

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    trainable=True,
                    activation_fn=tf.nn.relu,
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0001)):
    with slim.arg_scope([slim.conv2d],
                        kernel_size=[3, 3],
                        padding='SAME',
                        normalizer_fn=slim.batch_norm):
        net = slim.conv2d(net, 64, scope='conv1')
        net = slim.conv2d(net, 128, scope='conv2')
        net = slim.conv2d(net, 256, [5, 5], scope='conv3')

The snippet above shows the basic usage of slim.arg_scope. A single slim.arg_scope can define default parameters for several functions at once (provided those functions actually accept them), and arg_scopes may be nested. Inside the scope, a call does not have to repeat a parameter (e.g. kernel_size=[3, 3]), but it can still override one (e.g. in the last line, the kernel size becomes [5, 5]).
Furthermore, the scope can be wrapped up in a function:

def new_arg_sc():
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        trainable=True,
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                        weights_regularizer=slim.l2_regularizer(0.0001)):
        with slim.arg_scope([slim.conv2d],
                            kernel_size=[3, 3],
                            padding='SAME',
                            normalizer_fn=slim.batch_norm) as sc:
            return sc
 
def main():
    ......
    with slim.arg_scope(new_arg_sc()):
        ......
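
With that, a minimal end-to-end sketch of how the wrapped scope might be used (the input tensor, layer widths, and class count here are assumptions, not from the original):

import tensorflow as tf
import tensorflow.contrib.slim as slim

def build_net(images):  # images: e.g. a [None, 224, 224, 3] placeholder
    # apply the shared defaults from new_arg_sc() to every layer below
    with slim.arg_scope(new_arg_sc()):
        net = slim.conv2d(images, 64, scope='conv1')   # kernel_size/padding come from the scope
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.conv2d(net, 128, scope='conv2')
        net = slim.flatten(net)
        return slim.fully_connected(net, 10, scope='fc1')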

1.2 slim.utils.collect_named_outputs()

Gives a tensor an alias of its own and collects it into a collection:

net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)

The parameters mean the following:

return: the method returns the tensor object that was just added;
second argument: an alias for the tensor, under which it is collected into the collections.

Looking at the source code, the implementation is essentially:

if collections:
    append_tensor_alias(outputs, alias)
    ops.add_to_collections(collections, outputs)
return outputs

This method has reportedly been moved here:
from tensorflow.contrib.layers.python.layers import utils
utils.collect_named_outputs()

1.3 slim.utils.convert_collection_to_dict()

# convert the collection into a dictionary: {node name: output tensor}
end_points = slim.utils.convert_collection_to_dict(end_points_collection)

# collecting & retrieving collection values
tf.add_to_collection("loss", mse_loss)
tf.add_n(tf.get_collection("loss"))
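
Putting 1.2 and 1.3 together, a small sketch (the collection name, input shape, and scope names are assumptions):

import tensorflow as tf
import tensorflow.contrib.slim as slim

inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
end_points_collection = 'my_end_points'  # assumed collection name

with tf.variable_scope('block1') as sc:
    net = slim.conv2d(inputs, 16, [3, 3], scope='conv1')
    # alias the block output and collect it into the collection
    net = slim.utils.collect_named_outputs(end_points_collection, sc.name, net)

# {alias: tensor} dictionary of everything collected above
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
print(end_points)  # e.g. {'block1': <tf.Tensor ...>}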

Second, layer functions

2.1 batch_norm

slim.batch_norm() is the function behind normalizer_fn: each slim layer call uses it via normalizer_fn=slim.batch_norm. It has many parameters, which usually need to be passed in as a dictionary:

batch_norm_params = {  # parameter dictionary for batch normalization
        'is_training': is_training,
        # whether we are in training mode. During training, moving_mean and
        # moving_variance are updated with an exponential moving average
        # (using the specified decay); at test time the mean and variance are
        # held fixed at the values accumulated during training. The final mean
        # over the whole training set is the mean of the batch means, and the
        # final variance is an unbiased estimate over all batch variances.
        'zero_debias_moving_mean': True,
        # if True, a new variable pair 'moving_mean/biased' and
        # 'moving_mean/local_step' is created. The default is False; setting
        # it to True improves stability.
        'decay': batch_norm_decay,  # decay for the moving averages
        # controls how fast the moving mean/variance are updated; typical
        # values lie between 0.999, 0.99 and 0.9. The smaller the value, the
        # faster the update. If it is too large, the statistics may update too
        # slowly and end up nearly constant, which noticeably hurts the model.
        # Conversely, if the model overfits, consider speeding up the updates
        # by decreasing decay.
        'epsilon': batch_norm_epsilon,  # added to the variance during normalization to avoid dividing by zero
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
        # this parameter defaults to ops.GraphKeys.UPDATE_OPS. With the
        # default, slim only updates the mean and variance after the current
        # batch has been trained, so the statistics each batch uses always lag
        # one step behind, which can degrade the trained model. For that
        # reason it is often set to None, in which case slim updates the
        # statistics in place and each batch uses the latest values.
        #
        # either way, test data are unaffected: they always use the statistics
        # saved in the model. But if you evaluate during training and forget
        # to set is_training to False, the test batch will mix its own
        # statistics with the training statistics, which is not correct.
    }

def batch_norm(inputs,
               decay=0.999,  # decay for the moving averages; see the notes on 'decay' above
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               param_initializers=None,
               param_regularizers=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               batch_weights=None,
               fused=False,
               data_format=DATA_FORMAT_NHWC,
               zero_debias_moving_mean=False,
               scope=None,
               renorm=False,
               renorm_clipping=None,
               renorm_decay=0.99):

When calling other layers, it is passed in through the following parameters:

normalizer_fn=slim.batch_norm,       # set the normalizer to batch normalization
normalizer_params=batch_norm_params  # the parameter dictionary defined above
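
For example, these two parameters are typically set once in an arg_scope so that every layer is batch-normalized (a sketch; net and batch_norm_params are assumed to be defined as above):

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params=batch_norm_params):
    net = slim.conv2d(net, 64, [3, 3], scope='conv1')
    net = slim.conv2d(net, 128, [3, 3], scope='conv2')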

Note that using batch_norm is not just a matter of defining a layer node: training needs a few extra statements. slim.batch_norm maintains two quantities, moving_mean and moving_variance. Their role during training is easy to understand (they track the statistics of each batch), but at test time their meaning changes: they supply the fixed statistics accumulated during training. During training, therefore, the update ops must be forced to run:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
# note: TensorFlow's native batch-normalization ops need this step as well
# tf.control_dependencies(update_ops) means the ops inside the with block are
# only executed after update_ops have run

# ----------------- An alternative style -----------------
from tensorflow.python.ops import control_flow_ops

# define placeholders: X is the network input, Y is the ground-truth label
X = tf.placeholder("float", [None, 224, 224, 3])
Y = tf.placeholder("float", [None, 100])

# call a resnet that contains batch_norm; remember is_training=True
logits = model.resnet(X, 100, is_training=True)
cross_entropy = -tf.reduce_sum(Y * tf.log(logits))

# the training op must be created with slim.learning.create_train_op;
# tf.train.MomentumOptimizer.minimize() alone is not enough
opt = tf.train.MomentumOptimizer(lr_rate, 0.9)
train_op = slim.learning.create_train_op(cross_entropy, opt, global_step=global_step)

# update op: attach the moving-average updates as a dependency of the loss
# (the original author notes they simply copied this pattern)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)
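
Since the moving statistics are only used correctly at test time when is_training=False, another common pattern (a sketch, not from the original post) is to feed is_training through a placeholder so one graph serves both phases:

is_training = tf.placeholder(tf.bool, name='is_training')
net = slim.batch_norm(net, is_training=is_training,
                      updates_collections=tf.GraphKeys.UPDATE_OPS)

# training step: batch statistics are used and the update ops must run
# sess.run(train_step, feed_dict={..., is_training: True})
# test step: the stored moving_mean/moving_variance are used
# sess.run(accuracy, feed_dict={..., is_training: False})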

2.3 tf.contrib.slim.conv2d()

convolution(inputs,    # the input image/tensor to convolve
          num_outputs, # the number of convolution kernels (i.e. the number of filters / output channels)
          kernel_size, # the kernel dimensions [kernel height, kernel width]
          stride=1,
          padding='SAME',   # the padding mode, 'VALID' or 'SAME'
          data_format=None, # the format of the input tensor
          rate=1,           # dilation rate for atrous (dilated) convolution; tf.nn.conv2d has no such parameter
          activation_fn=nn.relu,  # the activation function, ReLU by default
          normalizer_fn=None,     # a normalization function to apply (e.g. slim.batch_norm)
          normalizer_params=None, # parameters for the normalization function
          weights_initializer=initializers.xavier_initializer(),  # initializer for the weights
          weights_regularizer=None,  # optional regularizer for the weights
          biases_initializer=init_ops.zeros_initializer(),  # initializer for the biases
          biases_regularizer=None,   # optional regularizer for the biases
          reuse=None,                # whether to reuse (share) the layer and its variables
          variables_collections=None,  # optional list or dict of collections for all the variables
          outputs_collections=None,    # the collections the output is added to
          trainable=True,              # whether the convolution parameters are trainable
          scope=None)                  # the variable_scope used for sharing variables

slim.conv2d is a further encapsulation built on top of tf.nn.conv2d that saves spelling out many parameters. A typical call looks like this:

net = slim.conv2d(inputs, 256, [3, 3], stride=1, scope='conv1_1')
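
To make the shapes concrete, a small sketch (the 224x224 RGB input is an assumption):

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
net = slim.conv2d(inputs, 256, [3, 3], stride=2, scope='conv1')
# with 'SAME' padding and stride 2 the spatial dimensions halve:
# net has shape [None, 112, 112, 256]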

2.4 slim.max_pool2d

This function is simpler; it is used as follows:

net = slim.max_pool2d(net, [2, 2], scope='pool1')

2.5 slim.fully_connected

slim.fully_connected(x, 128, scope='fc1')

Its first two arguments are the layer input and the number of output units.
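
In context, it usually follows a flatten of the convolutional features; a minimal sketch (shapes and sizes assumed):

net = slim.flatten(net)                            # e.g. [None, H*W*C]
net = slim.fully_connected(net, 128, scope='fc1')  # 128 outputs, ReLU by default
logits = slim.fully_connected(net, 10, activation_fn=None, scope='fc2')  # raw logits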
