Table of Contents
1 Standardization
1.1 Advantages
1.2 Common forms of data standardization
2 What is Batch Normalization
2.2 Batch normalization
2.3 Why use batch normalization
3 Summary
4 Implementing batch normalization in TF
4.1 Implementation process
4.2 Batch normalization at prediction time
4.3 Where to place batch normalization
4.4 Implementation code
1 Standardization
In traditional machine learning, standardization is also called normalization. It generally maps the data into a specified range, removing the problems caused by features measured on different scales.
1.1 Advantages
Data standardization makes the different samples seen by a machine learning model more similar to one another, which helps the model learn and generalize to new data.
1.2 Common forms of data standardization
Standardization (normalization): subtract the mean of the data so that its center is 0, then divide the data by its standard deviation so that the standard deviation becomes 1. For the specific formulas, see the deep-learning fundamentals book.
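The transform itself is the standard z-score formula, written out here for reference (the notation below is the usual convention, not taken from that book):

```latex
\hat{x} = \frac{x - \mu}{\sigma},\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```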
2 What is Batch Normalization
Batch Normalization, like ordinary data standardization, is a way of putting scattered data on a unified scale, and it is also a method for optimizing neural network training.
2.2 Batch normalization
Do not standardize the data only before it enters the model; standardization should also be considered after every transformation inside the network, since each layer's outputs can drift away from a well-scaled distribution during training.
2.3 Why use batch normalization
The problems batch normalization addresses are vanishing and exploding gradients: by keeping each layer's inputs on a consistent scale, it keeps activations out of the regions where gradients shrink toward zero or grow without bound.
The benefits of using batch normalization are summarized in the next section.
3 Summary:
Benefits of batch normalization:
- Has a regularizing effect
- Improves the generalization ability of the model
- Allows a higher learning rate, which accelerates convergence
4 Implementing batch normalization in TF
The BatchNormalization layer is usually placed after a convolutional layer or a densely connected (Dense) layer.
4.1 Implementation process:
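During training, BN normalizes each feature using the current batch's mean and variance, then rescales the result with two learned parameters (gamma and beta). A minimal NumPy sketch of this per-batch computation (the function and its names are illustrative, not the original post's code):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """One training-time BN step for a batch x of shape (batch, features)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize: mean 0, variance 1
    y = gamma * x_hat + beta                # scale and shift (learned params)
    return y, mu, var                       # batch stats are reused in 4.2
```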
4.2 Batch normalization at prediction time
During model training we record the mean and variance of each batch; when training is complete, we take the expected mean and variance over the whole training set and use them as BN's mean and variance when making predictions.
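In tf.keras this expectation is maintained as an exponential moving average of the batch statistics, updated after every training batch. A sketch of both halves (the variable names are illustrative; the 0.99 momentum matches the BatchNormalization layer's default):

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch_mean, batch_var,
                        momentum=0.99):
    """Track the expected statistics as an exponential moving average."""
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return moving_mean, moving_var

def batch_norm_predict(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    """Inference-time BN: fixed moving statistics replace the batch's own."""
    x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)
    return gamma * x_hat + beta
```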
4.3 Where to place batch normalization:
The original paper suggests applying it before the nonlinear activation function in a CNN, but in practice the effect is sometimes better when it is applied after the activation function.
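Both placements are straightforward to express in Keras; which one works better is an empirical question. A sketch of the two orderings (layer sizes are arbitrary):

```python
from tensorflow.keras import layers

# Placement from the original paper: Conv -> BN -> activation.
pre_activation = [
    layers.Conv2D(32, 3, use_bias=False),  # bias is redundant before BN's beta
    layers.BatchNormalization(),
    layers.Activation("relu"),
]

# Alternative placement: Conv -> activation -> BN.
post_activation = [
    layers.Conv2D(32, 3, activation="relu"),
    layers.BatchNormalization(),
]
```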
4.4 Implementation code:
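As one possible version of such code, here is a minimal runnable Keras model that places BatchNormalization after its convolutional and dense layers (the architecture and hyperparameters are illustrative, not the original listing):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small CNN for 28x28 grayscale images (e.g. MNIST).
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.BatchNormalization(),   # normalizes the conv layer's outputs
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),   # normalizes the dense layer's outputs
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

During model.fit the layers use batch statistics; model.predict automatically switches to the moving averages described in 4.2.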