TensorFlow-specific implementations of Batch Normalization (there are several ways):
Theoretical background (see this excellent post): https://blog.csdn.net/hjimce/article/details/50866313
Additional background:
① tf.nn.moments: the outputs of this function are exactly the mean and variance that BN needs.
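As a quick illustration, tf.nn.moments(x, axes) reduces over the given axes and returns one mean and one (population) variance per remaining dimension; a NumPy sketch of the same computation, with made-up values:

```python
import numpy as np

# A batch of 4 samples with 3 features (like the input to a fully connected layer).
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [4.0, 5.0, 6.0]])

# tf.nn.moments(x, axes=[0]) reduces over the batch axis and returns
# one mean and one population variance per feature.
mean = x.mean(axis=0)
var = x.var(axis=0)   # population variance (divide by N), matching tf.nn.moments

print(mean)  # [2.5 3.5 4.5]
print(var)   # [1.25 1.25 1.25]
```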
Option 1: using the raw low-level interface
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)
· x: the input tensor
· mean: the mean output by tf.nn.moments
· variance: the variance output by tf.nn.moments
· offset: the beta parameter that BN needs to learn
· scale: the gamma parameter that BN needs to learn
· variance_epsilon: a small constant added during normalization to prevent the denominator from being zero
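This call evaluates the standard BN transform y = scale * (x - mean) / sqrt(variance + variance_epsilon) + offset; a NumPy sketch of the same arithmetic, with hypothetical values for the learned parameters:

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
mean = x.mean(axis=0)          # per-feature batch mean
var = x.var(axis=0)            # per-feature batch variance
gamma = np.array([1.0, 1.0])   # scale (a learned parameter in real BN)
beta = np.array([0.0, 0.0])    # offset (a learned parameter in real BN)
eps = 1e-5                     # keeps the denominator non-zero

# The same formula tf.nn.batch_normalization evaluates.
y = gamma * (x - mean) / np.sqrt(var + eps) + beta

# With gamma=1 and beta=0, the output has (nearly) zero mean and unit variance.
print(y.mean(axis=0))  # ~[0. 0.]
```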
Implementation code (the original listing was garbled by the blog's line numbering; also note the beta variable was misspelled 'Beat' and is fixed here):

import tensorflow as tf

# Implements Batch Normalization with the low-level ops.
def bn_layer(x, is_training, name='BatchNorm', moving_decay=0.9, eps=1e-5):
    # Get the input shape and check that it matches a convolutional
    # layer (rank 4) or a fully connected layer (rank 2).
    shape = x.shape
    assert len(shape) in [2, 4]

    param_shape = shape[-1]
    with tf.variable_scope(name):
        # Declare the only two parameters BN needs to learn: y = gamma * x_hat + beta
        gamma = tf.get_variable('gamma', param_shape,
                                initializer=tf.constant_initializer(1))
        beta = tf.get_variable('beta', param_shape,
                               initializer=tf.constant_initializer(0))

        # Compute the mean and variance of the current batch.
        axes = list(range(len(shape) - 1))
        batch_mean, batch_var = tf.nn.moments(x, axes, name='moments')

        # Update the mean and variance with an exponential moving average.
        ema = tf.train.ExponentialMovingAverage(moving_decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # During training, update the moving mean and variance; at test time,
        # use the values saved at the end of training.
        mean, var = tf.cond(tf.equal(is_training, True), mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))

        # Finally perform batch normalization.
        return tf.nn.batch_normalization(x, mean, var, beta, gamma, eps)
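The moving average maintained above follows the update rule shadow = decay * shadow + (1 - decay) * value. A NumPy sketch of how moving_decay=0.9 smooths a stream of hypothetical batch means (for simplicity the shadow is initialized to the first value, which is not exactly what tf.train.ExponentialMovingAverage does for tensors):

```python
decay = 0.9          # the moving_decay parameter in the layer above
batch_means = [1.0, 1.2, 0.8, 1.1, 0.9]  # hypothetical per-batch means

# Exponential moving average update rule, as used by
# tf.train.ExponentialMovingAverage.
shadow = batch_means[0]
for m in batch_means[1:]:
    shadow = decay * shadow + (1 - decay) * m

print(shadow)  # a smoothed estimate of the long-run mean, ~0.99738
```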
Option 2: using the high-level wrapper
tf.contrib.layers.batch_norm: an encapsulated Batch Normalization layer.
In fact, tf.contrib.layers.batch_norm is a wrapper around tf.nn.moments and tf.nn.batch_normalization.
Parameters:
1. inputs: the input tensor.
2. decay: decay coefficient for the moving averages. A suitable value is close to 1.0, typically one of 0.999, 0.99, 0.9. If the model performs well on the training set but poorly on the validation/test set, choose a smaller coefficient (0.9 is recommended). To improve stability, set zero_debias_moving_mean to True.
3. center: if True, add the beta offset; if False, there is no beta offset.
4. scale: if True, multiply by gamma; if False, gamma is not used. When the next layer is linear (e.g. nn.relu), the scaling can be done by that layer, so it can be disabled here.
5. epsilon: small constant to avoid division by zero.
6. activation_fn: activation function applied to the output; the default is a linear (identity) activation.
7. param_initializers: optional initializers for beta, gamma, moving_mean and moving_variance.
8. param_regularizers: optional regularizers for beta and gamma.
9. updates_collections: the collections into which the computed update ops are placed. These update ops need to be executed together with train_op. If None, a control dependency is added to ensure the updates are always computed in place.
10. is_training: whether the layer is in training mode. In training mode, it accumulates the statistics of moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode, it uses the stored values of moving_mean and moving_variance.
11. scope: optional scope for variable_scope.
Note: at training time, moving_mean and moving_variance need to be updated. By default, the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of train_op. For example:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

You can set updates_collections=None to force the updates in place, but this may incur a speed penalty, especially in distributed settings.
Implementation code (the redundant self-assignments of epsilon, momentum and name in the original listing were dead code and are removed):

import tensorflow as tf

def batch_norm(x, epsilon=1e-5, momentum=0.9, train=True, name='batch_norm'):
    with tf.variable_scope(name):
        return tf.contrib.layers.batch_norm(x, decay=momentum,
                                            updates_collections=None,
                                            epsilon=epsilon,
                                            scale=True,
                                            is_training=train,
                                            scope=name)
After which layer is BN generally placed?
The BN layer is generally placed right after a convolution, forming a conv -> bn -> scale -> relu block (scale being a separate layer in frameworks such as Caffe).
What is the difference between BN at training time and at test time?
During training, the BN layer adjusts the distribution using the mean and std of the current batch; during testing, it adjusts the distribution of the test samples using the mean and std accumulated over all the training samples.
Therefore, the BN layer must be active during training, and the learned BN parameters must be saved. At test time, the saved training parameters are loaded and used to normalize the test set.
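The train/test difference can be sketched in NumPy: training normalizes with the current batch's statistics while accumulating running statistics, and inference reuses the accumulated values (hypothetical data and a decay of 0.9; gamma and beta are omitted for brevity):

```python
import numpy as np

def bn_train(x, running, decay=0.9, eps=1e-5):
    """Training mode: normalize with the batch stats, update running stats."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    running['mean'] = decay * running['mean'] + (1 - decay) * mean
    running['var'] = decay * running['var'] + (1 - decay) * var
    return (x - mean) / np.sqrt(var + eps)

def bn_test(x, running, eps=1e-5):
    """Test mode: normalize with the saved running stats, not the batch's."""
    return (x - running['mean']) / np.sqrt(running['var'] + eps)

running = {'mean': np.zeros(2), 'var': np.ones(2)}
rng = np.random.default_rng(0)
for _ in range(100):                      # simulate 100 training batches
    batch = rng.normal(5.0, 2.0, size=(32, 2))
    y = bn_train(batch, running)          # output ~ zero mean, unit variance

# After training, the running stats approach the data's true statistics
# (mean ~5, variance ~4), so test-time normalization works even for a
# single sample, where batch statistics would be meaningless.
print(running['mean'])
```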