Deep Learning: BN (Batch Normalization)

1. Introduction
Batch Normalization was proposed in a 2015 paper by Ioffe and Szegedy. Data normalization methods of this kind are often applied before activation layers in deep neural networks. BN speeds up the convergence of model training, makes the training process more stable, and helps avoid exploding or vanishing gradients. It also provides a degree of regularization, to the point that it can largely replace Dropout.
2. Formula
$$\text{Input: } B = \{x_{1\dots m}\};\ \gamma, \beta \ \text{(parameters to be learned)}$$
$$\text{Output: } \{y_i = BN_{\gamma,\beta}(x_i)\}$$
$$\mu_B \leftarrow \frac{1}{m}\sum_{i=1}^{m} x_i$$
$$\sigma_B^2 \leftarrow \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$
$$\overline{x_i} \leftarrow \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
$$y_i \leftarrow \gamma\,\overline{x_i} + \beta$$

The specific operation of BN is: first compute the mean and variance of $B$; then transform $B$ so that its mean is 0 and its variance is 1; finally, multiply each element of $B$ by $\gamma$ and add $\beta$ to produce the output. $\gamma$ and $\beta$ are trainable parameters that participate in backpropagation through the entire network.
The purpose of normalization is to map the data into a uniform range, reducing its spread and thus the learning difficulty of the network. The essence of BN is that, after normalization, $\gamma$ and $\beta$ act as restoring parameters that preserve the distribution of the original data to a certain extent.
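As a minimal illustration of the four steps above, here is a sketch in NumPy (the function name `batch_norm` and the sample values are ours, not from the original post):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Apply the BN transform above to a 1-D batch x."""
    mu = x.mean()                          # mu_B: batch mean
    var = x.var()                          # sigma_B^2: (biased) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift with trainable gamma, beta

x = np.array([1.0, 2.0, 3.0, 4.0])
print(batch_norm(x, gamma=1.0, beta=0.0))  # ~[-1.34, -0.45, 0.45, 1.34]
```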
3. Composition of $B$

Tensor data passed through a neural network is usually laid out as [N, H, W, C], where N is the batch size, H and W are the height and width, and C is the number of channels. The input set $B$ of BN in the formula above consists of all the values of a single channel across the whole batch.
The mean is computed by summing, for each channel separately, all the values in the batch and dividing by $N \times H \times W$. For example: if the batch contains 10 images, each with three RGB channels of height H and width W, then the mean of the R channel is the sum of the R-channel pixel values of all 10 images divided by $10 \times H \times W$; the means of the G and B channels are computed the same way. The variance is computed similarly.
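As a sketch, per-channel mean and variance for an [N, H, W, C] tensor can be computed by reducing over the N, H, W axes (the shapes below are illustrative):

```python
import numpy as np

# A batch of 10 images, height H, width W, 3 channels (NHWC layout)
N, H, W, C = 10, 32, 32, 3
x = np.random.randn(N, H, W, C)

# Reduce over N, H, W: each channel's statistics use N*H*W values
mean = x.mean(axis=(0, 1, 2))   # shape (3,): one mean per channel
var = x.var(axis=(0, 1, 2))     # shape (3,): one variance per channel
print(mean.shape, var.shape)    # (3,) (3,)
```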
The trainable parameters $\gamma$ and $\beta$ each have dimension equal to the number of channels of the tensor. In the example above, each of the three RGB channels needs its own $\gamma$ and $\beta$, so $\gamma$ and $\beta$ each have dimension 3.
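This is visible in PyTorch, for example, where `nn.BatchNorm2d` stores exactly one $\gamma$ (`weight`) and one $\beta$ (`bias`) per channel:

```python
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)  # 3 channels, e.g. RGB
print(bn.weight.shape)  # torch.Size([3])  -> gamma, one per channel
print(bn.bias.shape)    # torch.Size([3])  -> beta, one per channel
```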
4. Mean and variance in BN during training and inference
During training, the mean and variance are the mean and variance of the current batch along the corresponding dimensions.
During inference, the mean and variance are estimated from the expectations of the batch statistics over all training batches, using the following formulas:
$$E[x] \leftarrow E_B[\mu_B]$$
$$Var[x] \leftarrow \frac{m}{m-1} E_B[\sigma_B^2]$$
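In practice, frameworks typically approximate these expectations with an exponential moving average maintained during training. A hedged sketch, following the PyTorch convention where `running = (1 - momentum) * running + momentum * batch` (variable names are ours):

```python
import numpy as np

momentum = 0.1                # PyTorch-style momentum convention
running_mean = np.zeros(3)    # one entry per channel
running_var = np.ones(3)

def update_running_stats(batch_mean, batch_var, m):
    """Update inference-time statistics after one training step.

    m is the number of values per channel (N*H*W); the m/(m-1) factor
    turns the biased batch variance into an unbiased estimate, matching
    the Var[x] formula above.
    """
    global running_mean, running_var
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var * m / (m - 1)
```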
Reference:
https://zhuanlan.zhihu.com/p/93643523
