Table of contents
Dropout parameter p in TensorFlow and PyTorch
ReLU, sigmoid, tanh activation functions
model.train() and model.eval()
When the model contains a BN (Batch Normalization) layer or Dropout, the two modes behave differently, so the appropriate one must be set for each phase.
During training, model.train() ensures that the BN layer uses the mean and variance of the current batch, and that Dropout randomly selects a subset of network connections to train and update parameters.
During testing, model.eval() ensures that BN uses the mean and variance estimated over the training data, and that Dropout uses all network connections.
In eval mode, PyTorch automatically fixes BN and Dropout: instead of computing batch statistics, it uses the values accumulated during training.
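The behavior described above can be seen with a minimal sketch (the model below is hypothetical, chosen only because it contains both a BN layer and Dropout):

```python
import torch
import torch.nn as nn

# Minimal model containing both BatchNorm and Dropout.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),   # batch stats in train(), running stats in eval()
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations in train(), identity in eval()
    nn.Linear(8, 2),
)

x = torch.randn(16, 4)

model.train()            # training mode: batch statistics + random dropout
y_train = model(x)

model.eval()             # evaluation mode: running statistics, dropout off
with torch.no_grad():
    y_eval1 = model(x)
    y_eval2 = model(x)

# In eval mode the forward pass is deterministic:
print(torch.equal(y_eval1, y_eval2))  # True
```

Running the same input through the model twice in train mode would generally give different outputs, because Dropout draws a new random mask each time.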
Both standardization and normalization operate on a single feature (a single column).
For example, each sample (row) has three feature columns: height, weight, blood pressure.
Normalization
Scale the values of a numerical feature column (say the i-th column) of the training set to the range [0, 1]:
x_norm = (x - min(x)) / (max(x) - min(x))
When the maximum or minimum of x is an isolated extreme point (an outlier), the result is strongly affected.
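A quick sketch of min-max normalization on one feature column (the height values below are made-up illustrative numbers):

```python
import numpy as np

# Min-max normalization of one feature column to [0, 1]:
#   x_norm = (x - min(x)) / (max(x) - min(x))
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # hypothetical heights in cm

height_norm = (height - height.min()) / (height.max() - height.min())
print(height_norm)  # [0.   0.25 0.5  0.75 1.  ]
```

Note that a single extreme value in `height` would compress all other normalized values toward one end of [0, 1], which is the sensitivity mentioned above.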
Standardization
Scale the values of a numerical feature column (say the i-th column) of the training set so that it has mean 0 and variance 1:
x_nor = (x - mean(x)) / std(x), i.e. (input data - data mean) / data standard deviation
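The same formula as a sketch on one feature column (the weight values below are made-up illustrative numbers):

```python
import numpy as np

# Standardization (z-score) of one feature column:
#   x_nor = (x - mean(x)) / std(x)
weight = np.array([50.0, 60.0, 70.0, 80.0, 90.0])  # hypothetical weights in kg

weight_std = (weight - weight.mean()) / weight.std()

print(weight_std.mean())  # ~0
print(weight_std.std())   # ~1
```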
Batch Normalization
batch_size
: the number of samples passed to the network in one training step
Normalize each intermediate layer of the network; the Batch Normalization transform ensures that the distribution of the features extracted by each layer is not destroyed.
Training operates on mini-batches, but testing is often done on a single image, where there is no notion of a mini-batch. Since the parameters are fixed once training is finished, BN at test time uses the mean and variance accumulated over the training batches instead of batch statistics. Batch Normalization therefore behaves differently during training and testing.
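The accumulation of running statistics during training can be sketched as follows; the data distribution (mean 5, standard deviation 2) is a made-up example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)           # one normalization per feature channel

# Training: each mini-batch is normalized with its own mean/variance,
# while running estimates are accumulated for later use.
bn.train()
for _ in range(100):
    batch = torch.randn(32, 3) * 2 + 5   # mean ~5, variance ~4
    bn(batch)

# Testing: the fixed running estimates are used instead of batch stats.
bn.eval()
print(bn.running_mean)  # close to [5, 5, 5]
print(bn.running_var)   # close to [4, 4, 4]
```

In eval mode, even a batch of size 1 can be normalized, because no batch statistics are computed.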
Dropout
In each training batch, overfitting can be significantly reduced by ignoring half of the feature detectors.
During training, each hidden-layer neuron is zeroed out with probability p before its activation is passed on.
During testing, all neurons are active, and in classic dropout the output of each hidden neuron is multiplied by the keep probability. (PyTorch instead uses inverted dropout: surviving activations are scaled by 1/(1 - p) during training, so no scaling is needed at test time.)
Dropout parameter p in TensorFlow and PyTorch
p (keep_prob):
In TensorFlow, keep_prob is the fraction of nodes that are kept.
p:
In PyTorch, each neuron of the layer is randomly dropped (deactivated) with probability p in every training iteration and does not participate in that step.
# randomly generate a tensor
a = torch.randn(10,1)
>>> tensor([[ 0.0684],
[-0.2395],
[ 0.0785],
[-0.3815],
[-0.6080],
[-0.1690],
[ 1.0285],
[ 1.1213],
[ 0.5261],
[ 1.1664]])
torch.nn.Dropout(0.5)(a)
>>> tensor([[ 0.0000],
[-0.0000],
[ 0.0000],
[-0.7631],
[-0.0000],
[-0.0000],
[ 0.0000],
[ 0.0000],
[ 1.0521],
[ 2.3328]])
Numerical change: 2.3328 = 1.1664 × 2. With p = 0.5, the surviving values are scaled by 1/(1 - p) = 2 (inverted dropout).
ReLU, sigmoid, tanh activation functions
In a neural network, the raw mapping from input to output through stacked linear layers is itself linear.
But many real-world problems are nonlinear (for example, housing prices do not increase linearly with floor area).
Passing the linear output through an activation function turns the linear relationship into a nonlinear one, which increases the expressive power of the network.
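A quick sketch of the three activations named above, applied element-wise to a small tensor:

```python
import torch

# The three common activations, applied element-wise.
x = torch.tensor([-2.0, 0.0, 2.0])

relu_out = torch.relu(x)        # max(0, x); negative inputs become 0
sigmoid_out = torch.sigmoid(x)  # 1 / (1 + exp(-x)), output in (0, 1)
tanh_out = torch.tanh(x)        # output in (-1, 1), zero-centered

print(relu_out)     # tensor([0., 0., 2.])
print(sigmoid_out)
print(tanh_out)
```

All three are nonlinear, which is what lets stacked layers represent more than a single linear map.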
nn.Linear
Applies a linear transformation to the input data
>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])
Tensor size changed from 128 x 20 to 128 x 30
The operation performed is:
[128,20]×[20,30]=[128,30]
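The matrix product above can be checked directly. Note that nn.Linear stores its weight with shape [out_features, in_features], so the forward pass is y = x @ W.T + b:

```python
import torch
import torch.nn as nn

m = nn.Linear(20, 30)
x = torch.randn(128, 20)

y = m(x)
# nn.Linear's weight has shape [30, 20], hence the transpose:
y_manual = x @ m.weight.T + m.bias

print(y.shape)                                    # torch.Size([128, 30])
print(torch.allclose(y, y_manual, atol=1e-5))     # True
```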