Model training (hyperparameters batch_size/epoch/batch; loss functions DiceLoss/CrossEntropy/FocalLoss; optimizers SGD/Adam/AdamW; learning rate decay strategies step/cos)

Hyperparameters in the model (batch_size, epoch, batch)

Epoch, Batch and Batch size settings in deep learning

epoch:

One epoch means training once over all samples in the training set; it is equivalent to a single pass with batch_size equal to the number of training samples.
If epoch=50, total number of samples=10000 and batch_size=20, each epoch requires 10000/20 = 500 iterations, so the 50 epochs require 25,000 iterations in total.

iteration:

One iteration is a single training step on one batch of batch_size samples; the parameters produced by each iteration serve as the starting point for the next iteration.

batch_size:

The number of samples in each batch. With mini-batch SGD, one iteration trains batch_size samples together, computes their average loss, and performs one parameter update.
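
To make the relationship concrete, here is a minimal PyTorch training-loop sketch; the toy dataset, model and learning rate are placeholder assumptions, not from the original post. The inner loop runs once per batch (one iteration), and each full pass over the DataLoader is one epoch.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model purely for illustration (assumed, not from the original post).
X = torch.randn(10000, 20)          # 10,000 samples, 20 features
y = torch.randint(0, 2, (10000,))   # binary labels
dataset = TensorDataset(X, y)

batch_size = 20
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

model = nn.Linear(20, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

epochs = 50
for epoch in range(epochs):                        # one epoch = one full pass over the data
    for inputs, targets in loader:                 # one iteration = one batch of batch_size samples
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)   # average loss over the batch
        loss.backward()
        optimizer.step()                           # one parameter update per iteration

# With 10,000 samples and batch_size=20, each epoch runs 500 iterations,
# so 50 epochs perform 25,000 parameter updates in total.
```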

Example:

A spreadsheet contains 200 samples (data rows). With batch_size=5 and epoch=1000,
each epoch consists of 200/5 = 40 batches of 5 samples, i.e. 40 parameter updates per epoch. Over 1000 epochs the model passes over the entire dataset 1000 times, for a total of 40,000 batches during training.

A further example (MNIST):

The MNIST dataset has 60,000 training images and 10,000 test images. Suppose we train with Batch_Size = 100:

Images trained in each epoch: 60,000 (the entire training set)
Number of batches in the training set: 60000/100 = 600
Batches completed in each epoch: 600
Iterations in each epoch: 600 (completing one batch corresponds to one parameter iteration)
Model weight updates in each epoch: 600
Model weight updates after training for 10 epochs: 600 × 10 = 6,000

Different epochs train on the same data: although the 1st epoch and the 10th epoch both use the 60,000 training images, the resulting weight updates are completely different, because the model sits at a different position in the cost-function landscape in each epoch. The later the epoch, the closer the model is to the bottom of the landscape and the smaller its cost.
Completing 30,000 iterations in total is equivalent to completing 30,000/600 = 50 epochs.
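
The bookkeeping above can be verified with a few lines of arithmetic; the snippet below simply restates the numbers from this MNIST example.

```python
train_samples = 60000
batch_size = 100

batches_per_epoch = train_samples // batch_size           # 600 iterations / weight updates per epoch
updates_after_10_epochs = batches_per_epoch * 10          # 6,000 weight updates

total_iterations = 30000
epochs_completed = total_iterations / batches_per_epoch   # 30,000 / 600 = 50 epochs

print(batches_per_epoch, updates_after_10_epochs, epochs_completed)  # 600 6000 50.0
```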

The selection principle for batch_size: the larger the batch_size, the fewer batches per epoch and the fewer iterations required, so an epoch trains faster, but each iteration consumes more memory.
The smaller the batch_size, the more batches per epoch and the longer training takes, although each iteration occupies less memory.

Loss function

Dice Loss
Commonly used loss functions (2): Dice Loss
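
For reference, a minimal soft Dice Loss for binary segmentation might look like the sketch below; the smoothing term and the sigmoid applied to the logits are common conventions assumed here, not details taken from the linked article.

```python
import torch
from torch import nn

class DiceLoss(nn.Module):
    """Soft Dice loss for binary segmentation: 1 - 2*|X ∩ Y| / (|X| + |Y|)."""
    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.smooth = smooth  # avoids division by zero on empty masks

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits).flatten(1)     # (N, H*W) predicted probabilities
        targets = targets.flatten(1).float()         # (N, H*W) ground-truth mask
        intersection = (probs * targets).sum(dim=1)
        union = probs.sum(dim=1) + targets.sum(dim=1)
        dice = (2 * intersection + self.smooth) / (union + self.smooth)
        return 1 - dice.mean()                       # average Dice loss over the batch
```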

From Cross Entropy (CE) to Focal Loss (FL): a complete analysis of CrossEntropy and Focal Loss
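
The step from CE to FL essentially down-weights easy examples by a factor (1 - p_t)^gamma. A minimal binary-classification sketch is shown below; alpha=0.25 and gamma=2 are the commonly used defaults from the Focal Loss paper, and the class name is an assumption.

```python
import torch
from torch import nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    def __init__(self, alpha: float = 0.25, gamma: float = 2.0):
        super().__init__()
        self.alpha = alpha   # balances positive/negative classes
        self.gamma = gamma   # focuses training on hard, misclassified examples

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        targets = targets.float()
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-ce)                          # probability assigned to the true class
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * ce).mean()
```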

Optimizer

SGD -> SGDM -> NAG -> AdaGrad -> AdaDelta -> Adam -> Nadam -> AdamW
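
In PyTorch, most of this lineage is available directly from torch.optim. The sketch below only shows how each optimizer is instantiated; the hyperparameter values are illustrative defaults, not recommendations from the original post.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder model

# SGD with momentum (SGDM) and Nesterov acceleration (NAG)
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# Adaptive-learning-rate family
adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)
adadelta = torch.optim.Adadelta(model.parameters())
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
nadam = torch.optim.NAdam(model.parameters(), lr=1e-3)

# AdamW decouples weight decay from the gradient-based update
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```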

Learning rate decay strategy

Learning rate decay strategies: StepLR, ExponentialLR, MultiStepLR and CosineAnnealingLR
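
A brief PyTorch sketch of these four schedulers; the step sizes, milestones, gamma and T_max values are illustrative assumptions, and in practice only one scheduler is attached to a given optimizer.

```python
import torch
from torch import nn
from torch.optim import lr_scheduler

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the LR by gamma every `step_size` epochs
step = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Multiply the LR by gamma every epoch (exponential decay)
exp = lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

# Multiply the LR by gamma at the listed milestone epochs
multistep = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

# Anneal the LR along a cosine curve over T_max epochs
cosine = lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Typical usage: pick one scheduler and call scheduler.step() once per epoch,
# after the optimizer updates for that epoch have been applied.
```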


Source: blog.csdn.net/smile66688/article/details/129816322