Normalization

1. Batch Normalization

Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//International conference on machine learning. PMLR, 2015: 448-456.

(Figure: the Batch Normalization transform from the original paper)
The method is shown in the figure above. BN standardizes each channel using statistics computed over the whole batch: taking the mean as an example, for the n-th channel the feature maps of all samples in the batch are averaged, and that averaged feature map is then averaged over height and width, so the n channels finally yield n scalar (1x1) means. Here γ and β are learnable parameters. BN was proposed to speed up the training of neural networks by alleviating Internal Covariate Shift; with BN, a larger learning rate can be used.
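
Written out, for an input of shape [N, C, H, W] the per-channel transform is (ε is a small constant added for numerical stability):

$$\mu_c=\frac{1}{NHW}\sum_{n,h,w}x_{n,c,h,w},\qquad \sigma_c^2=\frac{1}{NHW}\sum_{n,h,w}\left(x_{n,c,h,w}-\mu_c\right)^2$$

$$\hat{x}_{n,c,h,w}=\frac{x_{n,c,h,w}-\mu_c}{\sqrt{\sigma_c^2+\epsilon}},\qquad y_{n,c,h,w}=\gamma_c\,\hat{x}_{n,c,h,w}+\beta_c$$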

In addition, BN also provides a regularizing effect, which can reduce the need for Dropout.

A concise PyTorch implementation:

import torch
import torch.nn as nn

BN = nn.BatchNorm2d(num_features=2)  # one (gamma, beta) pair per channel
a = torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[0, 1, 2], [3, 4, 5], [6, 7, 20]]], dtype=torch.float32)
a = a.unsqueeze(0)        # add a batch dimension -> [1, 2, 3, 3]
a = a.repeat(3, 1, 1, 1)  # repeat along the batch dimension -> [3, 2, 3, 3]
print(BN(a))

A custom BN implementation (it produces the same result as nn.BatchNorm2d):

class MyBN(nn.Module):

    def __init__(self, num_features):
        super(MyBN, self).__init__()
        # gamma (scale) and beta (shift), one per channel
        self.scale = nn.Parameter(torch.ones(size=(num_features,)))
        self.shift = nn.Parameter(torch.zeros(size=(num_features,)))
        self.eps = 1e-5

    def forward(self, x):
        # x: [N, C, H, W]; statistics are computed per channel over N, H, W
        mean = torch.mean(x, dim=[0, 2, 3], keepdim=True)
        # biased variance via E[x^2] - (E[x])^2, matching BatchNorm2d's normalization
        var = torch.mean(x**2, dim=[0, 2, 3], keepdim=True) - mean**2
        x = (x - mean) / torch.sqrt(var + self.eps)
        # broadcast the per-channel affine parameters over [N, C, H, W]
        x = x * self.scale.reshape(-1, 1, 1) + self.shift.reshape(-1, 1, 1)
        return x

MBN = MyBN(2)
print(MBN(a))
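
A quick check that the two agree, assuming the BN, MBN, and a objects defined above are still in scope:

# The custom BN should reproduce the built-in result (up to floating-point error).
print(torch.allclose(BN(a), MBN(a), atol=1e-5))  # expected: True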


2. Layer Normalization

Ba JL, Kiros JR, Hinton GE. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.

When the batch size is small, BN often does not work well, and it is also hard to apply to NLP tasks: sentences in a batch are usually padded with blank tokens at the end, so normalizing the features at the same position across different samples is largely meaningless. LN was proposed to solve this problem. The idea is simple: each sample is standardized over its own features.
(Figure: illustration of Layer Normalization)
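
In formula form, for a single sample whose normalized dimension has H features x_1, …, x_H (γ and β are again learnable and applied elementwise):

$$\mu=\frac{1}{H}\sum_{i=1}^{H}x_i,\qquad \sigma^2=\frac{1}{H}\sum_{i=1}^{H}\left(x_i-\mu\right)^2$$

$$y_i=\gamma_i\,\frac{x_i-\mu}{\sqrt{\sigma^2+\epsilon}}+\beta_i$$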

A concise PyTorch implementation:

LN = nn.LayerNorm(normalized_shape=10)  # normalize over the last dimension of size 10
a = torch.tensor([[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]]], dtype=torch.float32)  # [1, 2, 10]
print(LN(a))

A custom LN implementation:

class MyLN(nn.Module):

    def __init__(self, normalized_shape):
        super(MyLN, self).__init__()
        self.normalized_shape = normalized_shape
        # elementwise gamma (scale) and beta (shift) with shape normalized_shape
        self.scale = nn.Parameter(torch.ones(normalized_shape))
        self.shift = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = 1e-5

    def forward(self, x):
        # normalize over the trailing dimension(s) given by normalized_shape
        if isinstance(self.normalized_shape, list):
            dim = [-(i + 1) for i in range(len(self.normalized_shape))]
        else:
            dim = -1
        mean = torch.mean(x, dim=dim, keepdim=True)
        # biased variance via E[x^2] - (E[x])^2
        var = torch.mean(x**2, dim=dim, keepdim=True) - mean**2
        x = (x - mean) / torch.sqrt(var + self.eps)
        x = x * self.scale + self.shift
        return x
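
It can be used and checked the same way, assuming the LN and a objects defined above are still in scope:

MLN = MyLN(10)
# The custom LN should reproduce the built-in result (up to floating-point error).
print(torch.allclose(LN(a), MLN(a), atol=1e-5))  # expected: True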
