Detailed explanation of BatchNormalization parameters
tf.keras.layers.BatchNormalization(
axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros', moving_variance_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99,
fused=None, trainable=True, virtual_batch_size=None, adjustment=None, name=None,
**kwargs
)
Description
Normalizes the activations of the previous layer for each batch, i.e., applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.
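The transform can be sketched in NumPy (a minimal illustration of the math, not the actual Keras implementation; gamma and beta stand in for the learnable scale and offset):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, epsilon=1e-3, axis=0):
    # Standardize over the batch axis, then apply the learnable scale/offset.
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 10))
y = batch_norm(x)
# Per-feature mean is ~0 and standard deviation is ~1 afterwards.
```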
Parameters
axis
Integer, the axis that should be normalized (usually the feature axis).
For example, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
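For instance (assuming TensorFlow 2.x is available), the chosen axis determines which dimension gets its own gamma/beta parameters:

```python
import tensorflow as tf

# channels_last (default): the feature axis is the last one.
bn_last = tf.keras.layers.BatchNormalization(axis=-1)
bn_last.build(input_shape=(None, 8, 8, 3))   # NHWC, 3 channels

# channels_first: the feature axis is axis 1.
bn_first = tf.keras.layers.BatchNormalization(axis=1)
bn_first.build(input_shape=(None, 3, 8, 8))  # NCHW, 3 channels

# Either way, the layer learns one gamma/beta pair per channel.
```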
momentum
Momentum for the moving average.
epsilon
Small float added to the variance to avoid division by zero.
center
If True, add the offset beta to the normalized tensor. If False, beta is ignored.
scale
If True, multiply by gamma. If False, gamma is not used.
When the next layer is linear (e.g. nn.relu), this can be disabled, since the scaling will be done by the next layer.
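A quick way to see the effect of center and scale (again assuming TensorFlow 2.x): with both disabled, the layer keeps no learnable weights at all, only the non-trainable moving statistics:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(center=False, scale=False)
bn.build(input_shape=(None, 16))

# No beta or gamma: only moving_mean and moving_variance remain.
weight_names = [w.name for w in bn.weights]
```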
beta_initializer: initializer for the beta weight
gamma_initializer: initializer for the gamma weight
moving_mean_initializer: initializer for the moving mean
moving_variance_initializer: initializer for the moving variance
beta_regularizer: optional regularizer for the beta weight
gamma_regularizer: optional regularizer for the gamma weight
beta_constraint: optional constraint for the beta weight
gamma_constraint: optional constraint for the gamma weight
renorm: whether to use Batch Renormalization, which corrects the batch statistics toward the moving statistics during training; this adds extra variables during training, while inference behaves the same as ordinary batch normalization
renorm_clipping: an optional dictionary that may map the keys "rmax", "rmin", "dmax" to scalar tensors used to clip the renorm correction
renorm_momentum
Momentum used to update the moving means and standard deviations when renorm is enabled. Unlike momentum, this affects training: it should not be too small (which increases noise) nor too large (which gives stale estimates).
fused
If True, use the faster fused implementation; a ValueError is raised if the fused implementation cannot be used.
If None, use the faster implementation if possible.
If False, do not use the fused implementation.
virtual_batch_size
An int. Defaults to None, meaning batch normalization is performed across the whole batch. When set, the batch is divided into sub-batches of this size, each normalized separately ("Ghost Batch Normalization").
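A small sketch of ghost batching (assuming TensorFlow 2.x, and a batch size divisible by virtual_batch_size): during training each sub-batch is normalized with its own statistics, while the output shape stays unchanged:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(virtual_batch_size=4)
x = tf.random.normal((8, 16))   # batch of 8 = 2 virtual batches of 4
y = bn(x, training=True)        # statistics computed per sub-batch of 4
```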
adjustment
A function that takes a Tensor containing the (dynamic) shape of the input tensor and returns a pair (scale, bias) to apply to the normalized values (before gamma and beta), only during training.
If None, no adjustment is applied.
Cannot be specified if virtual_batch_size is specified.
Call parameters
inputs
input tensor
training
Boolean indicating whether the layer should run in training mode or inference mode.
If training=True,
the layer normalizes its inputs using the mean and variance of the current batch.
If training=False,
the layer normalizes its inputs using the moving mean and moving variance accumulated during training.
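The difference can be observed directly (assuming TensorFlow 2.x): a freshly created layer has moving mean 0 and moving variance 1, so in inference mode the output is nearly the input unchanged, while in training mode the current batch is standardized:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((32, 4), mean=5.0, stddev=2.0)

y_infer = bn(x, training=False)  # moving stats still at their initial 0 / 1
y_train = bn(x, training=True)   # standardizes with the batch statistics

batch_mean = float(tf.reduce_mean(y_train))  # ~0 after standardization
```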
Input shape
When using this layer as the first layer of a model, pass the keyword argument input_shape (a tuple of integers that does not include the batch axis).
Output shape
Same shape as input
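Putting the two shape notes together (assuming TensorFlow 2.x), BatchNormalization as the first layer takes input_shape and leaves the shape untouched:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(input_shape=(28, 28, 3)),
])
# The output shape is identical to the input shape (batch axis stays None).
```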