Deep Learning: Understanding of BatchNorm, LayerNorm, InstanceNorm, GroupNorm and SwitchableNorm

Norm in Deep Learning

BatchNorm, LayerNorm, InstanceNorm, and GroupNorm are often encountered in deep learning. The differences between the four are as follows:
[Figure: how BatchNorm, LayerNorm, InstanceNorm, and GroupNorm differ in the dimensions over which they normalize]
In addition, there is SwitchableNorm. Each of these methods is introduced in turn below.

BatchNorm

BatchNorm operates on the same channel across a whole batch of samples: it normalizes each feature dimension to zero mean and unit variance using statistics computed over the entire batch. BatchNorm is widely used in computer vision (CV).

The benefits of BatchNorm are as follows (a usage sketch follows this list):
1. It improves gradient flow through the network. Normalizing each feature to zero mean and unit variance keeps activations well scaled, so gradients in backpropagation stay in a reasonable range and the vanishing-gradient problem is mitigated.
2. It allows a larger learning rate, so the normalized network converges faster.
3. It reduces the model's sensitivity to weight initialization.
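
A minimal sketch using PyTorch's nn.BatchNorm2d; the tensor shapes are illustrative assumptions, not from the article:

```python
import torch
import torch.nn as nn

# BatchNorm2d normalizes each channel over the (N, H, W) dimensions of a batch,
# then applies a learnable scale (gamma) and shift (beta).
x = torch.randn(8, 3, 32, 32)           # (batch, channels, height, width)
bn = nn.BatchNorm2d(num_features=3)     # one mean/variance pair per channel
y = bn(x)

# Equivalent hand-written normalization (training mode, default gamma=1, beta=0)
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + bn.eps)
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```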

LayerNorm

LayerNorm operates on all feature dimensions of a single sample: it normalizes that sample using a mean and variance computed from the sample itself, independent of the rest of the batch. LayerNorm is widely used in NLP.

Since different feature dimensions can have very different scales, why use LayerNorm at all? Because in NLP it fits the data better.
If we batch a group of sentences, BatchNorm would normalize the tokens at the same position across different sentences. Sentences are read one by one, and tokens at the same position in unrelated sentences have no meaningful relationship, so normalizing across them does not match how text works.
LayerNorm instead normalizes within a single sentence. For a tensor of shape [batchsize, seq_len, dims], it is usually applied over the last dimension, i.e. the word-embedding dimension, where the features share roughly the same scale. This avoids the scaling problem caused by mixing features of different magnitudes, as the sketch below illustrates.
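
A minimal sketch with PyTorch's nn.LayerNorm, assuming an input of shape [batchsize, seq_len, dims] (the concrete sizes are illustrative):

```python
import torch
import torch.nn as nn

batch_size, seq_len, dims = 4, 10, 512
x = torch.randn(batch_size, seq_len, dims)

# LayerNorm over the last (embedding) dimension: each token vector is
# normalized independently of other tokens and other samples.
ln = nn.LayerNorm(normalized_shape=dims)
y = ln(x)

# Manual check: per-token mean and variance over the `dims` features
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + ln.eps)
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```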

InstanceNorm

InstanceNorm operates on a single channel of a single sample: each channel of each sample is normalized independently, using only that channel's own mean and variance. InstanceNorm is widely used in image style transfer.

In image stylization, the generated result depends mainly on a single image instance, so normalizing over the whole batch is inappropriate; instead, each instance is normalized over its H and W dimensions. This speeds up convergence and keeps each image instance independent of the others.
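
A minimal sketch with PyTorch's nn.InstanceNorm2d; the image sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# InstanceNorm2d normalizes each (sample, channel) pair over its own H and W,
# so statistics never mix across images in the batch.
x = torch.randn(8, 3, 64, 64)               # e.g. a batch of images for style transfer
inorm = nn.InstanceNorm2d(num_features=3)
y = inorm(x)

# Manual check: per-sample, per-channel statistics over (H, W)
mean = x.mean(dim=(2, 3), keepdim=True)
var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + inorm.eps)
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```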

GroupNorm

GroupNorm operates on a group of channels within a single sample: the channels of each sample are divided into groups, and each group is normalized with its own mean and variance. Because the statistics do not depend on the batch, GroupNorm also works well with small batch sizes.
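
A minimal sketch with PyTorch's nn.GroupNorm; the group count and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# GroupNorm splits the channels of each sample into groups and normalizes each
# group over (channels-in-group, H, W). It is independent of the batch size.
x = torch.randn(2, 6, 32, 32)
gn = nn.GroupNorm(num_groups=3, num_channels=6)   # 3 groups of 2 channels each
y = gn(x)

# Manual check: reshape to (N, groups, C//groups * H * W), normalize per group
xg = x.view(2, 3, -1)
mean = xg.mean(dim=-1, keepdim=True)
var = xg.var(dim=-1, unbiased=False, keepdim=True)
y_manual = ((xg - mean) / torch.sqrt(var + gn.eps)).view_as(x)
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```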

SwitchableNorm

SwitchableNorm combines BatchNorm, LayerNorm, and InstanceNorm: it assigns learnable weights to each set of statistics, so the network adaptively learns which normalization to rely on.
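
SwitchableNorm is not included in torch.nn. The sketch below is a simplified, hypothetical implementation of the idea (training-mode statistics only, no running estimates), intended only to show how the three sets of statistics can be mixed with learned softmax weights; it is not the authors' reference code, and the class name is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm2d(nn.Module):
    """Simplified sketch: mixes IN, LN, and BN statistics with learned softmax weights."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.mean_w = nn.Parameter(torch.ones(3))   # mixing weights for (IN, LN, BN) means
        self.var_w = nn.Parameter(torch.ones(3))    # mixing weights for (IN, LN, BN) variances

    def forward(self, x):
        # InstanceNorm statistics: per sample, per channel
        mean_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), unbiased=False, keepdim=True)
        # LayerNorm statistics: per sample, over all channels
        mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
        # BatchNorm statistics: per channel, over the whole batch
        mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

        mw = F.softmax(self.mean_w, dim=0)
        vw = F.softmax(self.var_w, dim=0)
        mean = mw[0] * mean_in + mw[1] * mean_ln + mw[2] * mean_bn
        var = vw[0] * var_in + vw[1] * var_ln + vw[2] * var_bn

        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias

# Illustrative usage (shapes are assumptions):
sn = SwitchableNorm2d(num_features=3)
out = sn(torch.randn(8, 3, 32, 32))
```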

Appendix

PyTorch official documentation - Normalization Layers

Original post: blog.csdn.net/weixin_43603658/article/details/131957131