L1 loss and MSE

A problem I ran into during training today:

When I switched the loss function from MSE to L1 loss, the loss dropped significantly.

I had assumed MSE would be the better choice: the difference between the prediction and the label appears as a factor in its derivative, so the larger the error, the larger the gradient. With L1 loss, the gradient magnitude is the same for every error (only the sign changes).
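To make that concrete, here is a minimal PyTorch sketch (the tensor values are arbitrary, just for illustration) comparing the two gradients on the same errors:

```python
import torch

# Three predictions with small, medium, and large errors against a zero target.
pred = torch.tensor([0.5, 3.0, 10.0], requires_grad=True)
target = torch.zeros(3)

# MSE: gradient w.r.t. pred is 2 * (pred - target) / n -- it grows with the error.
torch.nn.functional.mse_loss(pred, target).backward()
print(pred.grad)  # tensor([0.3333, 2.0000, 6.6667])

pred.grad = None  # reset before the second backward pass

# L1: gradient is sign(pred - target) / n -- same magnitude for every error.
torch.nn.functional.l1_loss(pred, target).backward()
print(pred.grad)  # tensor([0.3333, 0.3333, 0.3333])
```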

I looked it up and found another explanation:

When the predicted value is far from the target, the gradient can easily explode, because the gradient contains the term x − t. That is why Ross Girshick (rbg) proposed Smooth L1 loss in Fast R-CNN.

 

When the difference is too large, the x − t term in the original L2 gradient is replaced by ±1, which caps the gradient and avoids the explosion. In other words, Smooth L1 is more robust to outliers.
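For reference, a minimal sketch of the piecewise function behind Smooth L1 (PyTorch ships it as torch.nn.functional.smooth_l1_loss; the manual version below is just to show where the ±1 gradient comes from):

```python
import torch

def smooth_l1(x, t, beta=1.0):
    # Quadratic near zero (gradient ~ (x - t) / beta),
    # linear for large errors (gradient = +/-1, i.e. bounded).
    diff = torch.abs(x - t)
    return torch.where(diff < beta,
                       0.5 * diff ** 2 / beta,
                       diff - 0.5 * beta).mean()

# Even a huge error only contributes a bounded gradient.
pred = torch.tensor([0.5, 100.0], requires_grad=True)
smooth_l1(pred, torch.zeros(2)).backward()
print(pred.grad)  # tensor([0.2500, 0.5000]) -- no explosion from the 100.0 error
```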


So... that must be the reason.
