Research on total loss, val loss, and pre-training weights

1. Total loss and val loss

loss: the overall loss value on the training set.
val loss: the overall loss value on the validation set (sometimes also called the test set here).
When training a model, we split the samples into a training set and a validation set, usually in a 9:1 ratio (training set : validation set). The loss values reported during training are therefore split into:

the total loss on the training set and the val loss on the validation set. The relationship between the two is roughly as follows:

When loss decreases and val_loss decreases: training is proceeding normally; this is the best case.

When loss decreases but val_loss stays flat: the network is overfitting. Adding Dropout and max pooling can help (see the sketch after this list).

When loss stays flat but val_loss decreases: the dataset has a serious problem; check the label files for annotation errors, or the dataset quality may simply be too poor. Re-selecting the data is recommended.

When loss stays flat and val_loss stays flat: the learning process has hit a bottleneck; reduce the learning rate (this has little effect with adaptive optimizers) or the batch size.

When loss increases and val_loss increases: the network architecture is poorly designed, the training hyperparameters are set badly, or the dataset needs cleaning; this is the worst case.
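As a rough illustration of the 9:1 split and of the Dropout/max-pooling remedy mentioned above, here is a minimal Keras-style sketch. The toy model, data shapes, and validation_split value are my own assumptions, not the original post's code; the point is only that the returned history holds the loss and val_loss curves discussed in the list.

```python
# Minimal sketch (assumed toy CNN, not the original author's model):
# split the data 9:1 and watch loss vs. val_loss per epoch.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# placeholder data: 100 RGB images, 10 classes (stands in for a real dataset)
x = np.random.rand(100, 64, 64, 3).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, size=100), 10)

model = keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),   # max pooling, as suggested against overfitting
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),     # Dropout, as suggested against overfitting
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# validation_split=0.1 gives the 9:1 train/validation split described above
history = model.fit(x, y, epochs=10, batch_size=8, validation_split=0.1)

# history.history["loss"] and history.history["val_loss"] are the two
# curves whose joint behaviour is summarized in the list above
print(history.history["loss"])
print(history.history["val_loss"])
```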

2. Practical verification:

Because the dataset I made myself is small (only 30 training samples and 10 validation samples) and I currently have no GPU, the number of epochs is set to only 10. The convergence is therefore not very good, but it is enough to illustrate the situation.

3. The difference between using pre-trained weights and not using them:
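As a hedged illustration of the two setups (the original post does not show its code; the ResNet50/ImageNet choice below is only an assumed example), loading pre-trained weights versus training from scratch in Keras typically differs only in the weights argument:

```python
# Minimal sketch (assumed example, not the original author's network):
# the only difference between the two runs is whether pre-trained
# ImageNet weights are loaded or the backbone starts from scratch.
from tensorflow import keras

# with pre-trained weights: the backbone starts from ImageNet features,
# so loss usually drops faster and val_loss is lower in early epochs
backbone_pretrained = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# without pre-trained weights: random initialization, slower convergence,
# which is especially noticeable on very small datasets
backbone_scratch = keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(224, 224, 3))
```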

4. Loss functions: L1 loss, L2 loss, smooth L1 loss

L2 loss converges much faster than L1 loss. Its disadvantage is that when outliers are present, those points dominate the total loss.

smooth L1: a slightly relaxed version of the absolute-value (L1) loss; for large errors it grows linearly with the error rather than quadratically.

The difference between smooth L1 and the plain L1 loss is that the derivative of the L1 loss at 0 is not unique, which can hurt convergence. Smooth L1 solves this by using a quadratic function around 0, which makes the loss smoother there.

 "L1 loss that is less sensitive to outliers than the L2 loss used in R-CNN and SPPnet."

In other words, smooth L1 makes the loss more robust to outliers: compared with the L2 loss it is insensitive to outlier points, its gradients change more gently, and training is less likely to diverge.
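To make the piecewise definition concrete, here is a small NumPy sketch of the three losses. The threshold beta=1.0 is the common convention, not something stated in the original post: smooth L1 is quadratic for |x| < beta (smooth, well-defined gradient at 0) and linear beyond (robust to outliers).

```python
import numpy as np

def l1_loss(x):
    # absolute error: linear everywhere, derivative not unique at 0
    return np.abs(x)

def l2_loss(x):
    # squared error: large errors (outliers) dominate the total loss
    return 0.5 * x ** 2

def smooth_l1_loss(x, beta=1.0):
    # quadratic near 0, linear for large |x|
    return np.where(np.abs(x) < beta,
                    0.5 * x ** 2 / beta,
                    np.abs(x) - 0.5 * beta)

errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(l1_loss(errors))         # [3.    0.5   0.    0.5   3.  ]
print(l2_loss(errors))         # [4.5   0.125 0.    0.125 4.5 ]
print(smooth_l1_loss(errors))  # [2.5   0.125 0.    0.125 2.5 ]
```

Note how the outlier errors of ±3 contribute 4.5 each under L2 but only 2.5 under smooth L1, which is exactly the robustness to outliers described above.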


Origin blog.csdn.net/m0_63172128/article/details/129317147