model.train() and model.eval(), Standardization, Normalization, Dropout, Batch Normalization: a plain-language explanation

Table of contents

model.train() and model.eval()

Normalization

Standardization

Batch Normalization

Dropout

The Dropout parameter p in TensorFlow and PyTorch

relu, sigmoid, tanh activation functions

nn.Linear

model.train() and model.eval()

When the model contains a BN (Batch Normalization) layer or Dropout, the two modes behave differently.

You need to call model.train() during training and model.eval() during testing.

During training, model.train() ensures that the BN layers use the mean and variance of the current batch of data, and that Dropout randomly selects a subset of the network connections to train and update.

During testing, model.eval() ensures that BN uses the mean and variance estimated over all the training data, and that Dropout uses all network connections.

In eval mode, PyTorch automatically fixes BN and Dropout: it no longer computes per-batch statistics, but uses the values learned during training.
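
A minimal sketch of how the two modes are typically toggled (the model below is made up for illustration; it just needs to contain BN and Dropout for the distinction to matter):

import torch
import torch.nn as nn

# a small model containing both BatchNorm and Dropout,
# so that train()/eval() actually change its behaviour
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)

x = torch.randn(8, 20)

model.train()          # BN uses batch statistics, Dropout is active
train_out = model(x)

model.eval()           # BN uses running statistics, Dropout is disabled
with torch.no_grad():  # also disable gradient tracking for inference
    eval_out = model(x)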

Both standardization and normalization operate on a single feature, i.e., a single column.

For example, each sample (row) has three feature columns: height, weight, blood pressure.

Normalization

Scale the values of a numerical feature column (say the i-th column) in the training set into the range [0, 1]:

x_nor = (x - min(x)) / (max(x) - min(x))

When the maximum or minimum value of x is an isolated extreme point (an outlier), the scaled result is distorted.
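
A small sketch of per-column min-max normalization; the tensor values below are made-up examples of the height/weight/blood-pressure table mentioned above:

import torch

# rows are samples, columns are features (height, weight, blood pressure)
X = torch.tensor([[170., 60., 120.],
                  [180., 80., 130.],
                  [160., 50., 110.]])

col_min = X.min(dim=0).values
col_max = X.max(dim=0).values
X_norm = (X - col_min) / (col_max - col_min)   # each column scaled to [0, 1]
print(X_norm)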

Standardization

Scale the values of a numerical feature column (say the i-th column) in the training set so that it has mean 0 and variance 1:

x_nor = (x - mean(x)) / std(x), i.e., (input data - data mean) / (data standard deviation)
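
The same idea per column, as a small sketch (the example values are made up; sklearn's StandardScaler performs the same transformation):

import torch

X = torch.tensor([[170., 60., 120.],
                  [180., 80., 130.],
                  [160., 50., 110.]])

# standardize each column: subtract the column mean, divide by the column std
X_std = (X - X.mean(dim=0)) / X.std(dim=0)
print(X_std.mean(dim=0))   # ~0 for every column
print(X_std.std(dim=0))    # 1 for every column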

Batch Normalization

batch_size: the number of samples passed to the program for training at one time

Normalize the activations of each intermediate layer of the network, using the Batch Normalization transform to ensure that the feature distribution each layer extracts is not destroyed during training.
Training operates on mini-batches, but testing is often done on a single image, so there is no notion of a mini-batch. Since the parameters are fixed once training is finished, the per-batch mean and variance no longer change, so the mean and variance accumulated over all training batches (the running statistics) are used directly. This is why Batch Normalization behaves differently during training and during testing.
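
A quick sketch of this difference with nn.BatchNorm1d (the same applies to BatchNorm2d); the input tensor is arbitrary:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)            # 3 features; running stats are tracked by default
x = torch.randn(16, 3) * 5 + 2    # a batch with non-zero mean and non-unit variance

bn.train()
y_train = bn(x)                   # normalized with this batch's mean/variance,
                                  # and running_mean / running_var get updated

bn.eval()
y_eval = bn(x)                    # normalized with the stored running statistics

print(bn.running_mean, bn.running_var)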

Dropout

In each training batch, ignoring half of the feature detectors (setting half of the hidden units to zero) significantly reduces overfitting.

In the classic formulation, during training each hidden-layer neuron is dropped (zeroed) with probability p before its output is passed on.
At test time, all neurons are kept, and the output of each hidden neuron is scaled by the keep probability so that its expected value matches training; a hand-rolled sketch follows below.
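
A hand-rolled sketch of that classic formulation (here p is the drop probability; this is purely illustrative, since PyTorch's nn.Dropout uses the inverted variant shown in the next section):

import torch

p = 0.5                                   # drop probability
h = torch.randn(10)                       # some hidden-layer outputs

# training: zero each unit with probability p
mask = (torch.rand_like(h) > p).float()
h_train = h * mask

# testing: keep every unit, scale by the keep probability (1 - p)
h_test = h * (1 - p)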

The Dropout parameter p in TensorFlow and PyTorch

In TensorFlow, p (keep_prob) is the proportion of nodes to keep.

In PyTorch, p is the probability that each neuron in the layer is randomly dropped (zeroed) on each training iteration, so it does not participate in that update.

import torch

# randomly generate a tensor
a = torch.randn(10,1)
>>> tensor([[ 0.0684],
        [-0.2395],
        [ 0.0785],
        [-0.3815],
        [-0.6080],
        [-0.1690],
        [ 1.0285],
        [ 1.1213],
        [ 0.5261],
        [ 1.1664]])
torch.nn.Dropout(0.5)(a)
>>> tensor([[ 0.0000],  
        [-0.0000],  
        [ 0.0000],  
        [-0.7631],  
        [-0.0000],  
        [-0.0000],  
        [ 0.0000],  
        [ 0.0000],  
        [ 1.0521],  
        [ 2.3328]])

Numerical change: 2.3328 = 1.1664 * 2. PyTorch's Dropout is the inverted variant: during training the surviving values are scaled by 1/(1-p) = 1/0.5 = 2, so no rescaling is needed at test time.
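
For contrast, a quick sketch of the same layer in eval mode, where Dropout becomes a no-op:

import torch

drop = torch.nn.Dropout(0.5)
drop.eval()                      # switch to evaluation mode
x = torch.randn(10, 1)
print(torch.equal(drop(x), x))   # True: nothing is zeroed, nothing is rescaled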

relu, sigmoid, tanh activation functions

In a neural network, the mapping from input to output is originally a linear relationship,

but many real-world problems are nonlinear (for example, housing prices do not rise linearly with floor area).

Passing the linear output of the network through an activation function turns the originally linear relationship into a nonlinear one, which increases the expressive power of the neural network.
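
A small sketch comparing the three activations on the same input (functional calls here; the module versions nn.ReLU(), nn.Sigmoid(), nn.Tanh() behave the same):

import torch

x = torch.linspace(-3, 3, 7)
print(torch.relu(x))      # 0 for negative inputs, identity for positive
print(torch.sigmoid(x))   # squashes values into (0, 1)
print(torch.tanh(x))      # squashes values into (-1, 1)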

nn.Linear

Applies a linear transformation to the input data: y = x * W^T + b

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])

Tensor size changed from 128 x 20 to 128 x 30

The actions performed are:

[128,20]×[20,30]=[128,30]
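
One way to see this is to reproduce the forward pass by hand, as a sketch (nn.Linear stores its weight with shape [out_features, in_features], i.e. [30, 20]):

import torch
import torch.nn as nn

m = nn.Linear(20, 30)
x = torch.randn(128, 20)

# the forward pass is x @ weight.T + bias
manual = x @ m.weight.T + m.bias
print(torch.allclose(m(x), manual))   # True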

Reference link: Pytorch study notes 11 - usage of model.train() and model.eval(), Dropout principle, relu, sigmoid, tanh activation functions, nn.Linear analysis, method of outputting the entire tensor - After the rain Mountain View - Blog Garden

Standardization and Normalization Super Detailed Explanation


Origin: blog.csdn.net/qq_28838891/article/details/127721891