TensorFlow 2.0 Introduction and Practice Study Notes (6) - Batch Normalization & Convolutional Neural Networks (Satellite Images)

Table of Contents

1 Standardization

1.1 Advantages

1.2 Common forms of data standardization

2 What is Batch Normalization

2.2 Batch normalization inside the network

2.3 Why use batch normalization

3 Summary

4 Implementing batch normalization in TensorFlow

4.1 Implementation process

4.2 Batch normalization at prediction time

4.3 Where to place batch normalization

4.4 Implementation code


1 Standardization

In traditional machine learning, standardization is also called normalization. It generally maps the data into a specified range, which removes the problems caused by features measured on different scales (dimensions).

 

1.1 Advantages

Data standardization makes the different samples seen by a machine learning model more similar to one another, which helps the model learn and generalize to new data.

1.2 Common forms of data standardization

 

Standardization and normalization: subtract the mean of the data so that its center is 0, then divide the data by its standard deviation so that its standard deviation becomes 1.

For the specific formulas, see the formulas in the deep learning fundamentals book.
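As a minimal sketch (my own illustration, not from the original post), standardization of a feature can be written in NumPy as follows; `x` here is an arbitrary feature array used only for demonstration:

```python
import numpy as np

# Hypothetical feature values, used only to illustrate standardization
x = np.array([2.0, 4.0, 6.0, 8.0], dtype=np.float32)

# Subtract the mean so the data is centered at 0,
# then divide by the standard deviation so the standard deviation becomes 1
x_std = (x - x.mean()) / x.std()

print(x_std.mean(), x_std.std())  # approximately 0.0 and 1.0
```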

2 What is Batch Normalization

Batch Normalization, like ordinary data standardization, is a way of putting scattered data on a common scale, and it is also a method for optimizing neural networks.

2.2 Batch normalization inside the network

Data should not be standardized only before it enters the model; standardization should also be considered after every transformation performed inside the network.

Even if the mean and variance change over time during training, batch normalization can adapt and still standardize the data; this is the correction described in the original paper.
 

2.3 Why use batch normalization

Batch normalization addresses the problems of vanishing and exploding gradients.

Batch normalization is a training optimization method aimed at vanishing gradients. Take the sigmoid function as an example: sigmoid squashes its output into [0, 1]. As training progresses, activations drift toward the saturated regions near 0 and 1, where the slope is almost flat, so the gradients vanish.
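A minimal sketch (my own illustration, not from the original post) of how the sigmoid derivative shrinks as the input moves into the saturated region:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid function

for x in [0.0, 2.0, 5.0, 10.0]:
    # The further the input is from 0, the smaller the gradient becomes
    print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_grad(x):.6f}")
```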
 

Benefits of using batch normalization:

With sigmoid, if the magnitude of the input is large, the corresponding slope is small, so the back-propagated gradient is small and learning becomes very slow.

We already know that standardizing data during preprocessing speeds up convergence. Likewise, using standardization inside a neural network also speeds up convergence, and it brings additional benefits.

3 Summary

Benefits of batch normalization

  • Has a regularization effect
  • Improves the generalization ability of the model
  • Allows a higher learning rate, which accelerates convergence

Batch normalization also helps gradient propagation and therefore allows deeper networks. For some very deep networks, training is only feasible when they contain multiple BatchNormalization layers.

Extension:
BatchNormalization is widely used in many of the advanced convolutional neural network architectures built into Keras, such as ResNet50, Inception V3, and Xception.

4 Implementing batch normalization in TensorFlow

The BatchNormalization layer is usually placed after a convolutional layer or a densely connected (Dense) layer.

tf.keras.layers.BatchNormalization()
 

4.1 Implementation process

1. Compute the mean of each training mini-batch
2. Compute the variance of each training mini-batch
3. Normalize the data with this mean and variance
4. Train the parameters γ and β
5. The output y is obtained by applying the linear transformation with γ and β to the normalized value, which lets the layer recover the original representation when needed (a code sketch of these steps follows the note below)

Note:
During forward propagation in training, the output is not otherwise changed; γ and β are simply recorded. During back propagation, the gradients with respect to γ and β are obtained via the chain rule, and the weights are then updated according to the learning rate.
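A minimal NumPy sketch of the five steps above (my own illustration, assuming a 2-D batch of shape (batch, features) and a small constant epsilon for numerical stability):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features) during training."""
    mu = x.mean(axis=0)                     # 1. mean of the mini-batch
    var = x.var(axis=0)                     # 2. variance of the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # 3. normalize
    y = gamma * x_hat + beta                # 4./5. learnable scale and shift
    return y

x = np.random.randn(32, 4).astype(np.float32)   # hypothetical mini-batch
gamma = np.ones(4, dtype=np.float32)             # γ initialized to 1
beta = np.zeros(4, dtype=np.float32)             # β initialized to 0
print(batch_norm_forward(x, gamma, beta).mean(axis=0))  # ≈ 0 for each feature
```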

4.2 Batch normalization at prediction time

During training we record the mean and variance of each mini-batch. When training is finished, we compute the expected mean and variance over the whole training set, and those values are used as the mean and variance for BN at prediction time.
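In tf.keras this distinction is controlled by the `training` argument: with `training=True` the layer normalizes with the current batch statistics and updates its moving averages, while with `training=False` it uses the accumulated moving mean and variance. A minimal sketch with made-up data (my own example):

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((32, 4))  # hypothetical mini-batch

y_train = bn(x, training=True)   # uses batch statistics, updates moving averages
y_infer = bn(x, training=False)  # uses the stored moving mean / moving variance

print(bn.moving_mean.numpy(), bn.moving_variance.numpy())
```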

4.3 Where to place batch normalization

The original paper recommends placing BN before the nonlinear activation function in a CNN, but in practice placing it after the activation function can work even better.
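Both orderings can be expressed in tf.keras; a small sketch (layer sizes and input shape are arbitrary placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

# BN before the activation, as in the original paper
pre_act = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding='same', input_shape=(64, 64, 3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
])

# BN after the activation, which often works well in practice
post_act = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding='same', activation='relu', input_shape=(64, 64, 3)),
    layers.BatchNormalization(),
])
```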

4.4 Implementation code

 
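The code from the original post did not survive the transfer. As a stand-in, here is a minimal sketch of a small convolutional network with BatchNormalization layers in the style discussed above; the input shape, the binary output, and the dataset pipeline are assumptions and should be adapted to the satellite image data used in this series:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed input shape and binary output; adjust to your satellite image dataset
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding='same', input_shape=(256, 256, 3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D(),

    layers.Conv2D(64, 3, padding='same'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D(),

    layers.GlobalAveragePooling2D(),
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(1, activation='sigmoid'),  # binary classification assumed
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # hypothetical tf.data pipelines
```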

Origin: blog.csdn.net/qq_37457202/article/details/107896747