Problems encountered during training today
When I changed the loss function from MSE to L1 Loss, the loss dropped significantly.
I used to think MSE would be better: the difference between the prediction and the label appears as a factor in the gradient, so the larger the error, the larger the gradient. With L1 Loss, the gradient magnitude is the same everywhere.
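The contrast above can be made concrete by deriving the two gradients by hand. This is a minimal sketch in plain Python (scalar values, no framework); the function names are my own, not from any library:

```python
def mse_grad(x, t):
    # d/dx (x - t)^2 = 2 * (x - t): gradient scales with the error
    return 2.0 * (x - t)

def l1_grad(x, t):
    # d/dx |x - t| = sign(x - t): gradient magnitude is always 1
    if x > t:
        return 1.0
    return -1.0 if x < t else 0.0

# For a small error the MSE gradient is tiny; for a large error it blows up.
# The L1 gradient stays at +/-1 regardless of the error size.
for err in (0.1, 1.0, 10.0):
    print(err, mse_grad(err, 0.0), l1_grad(err, 0.0))
```

Running this shows the MSE gradient growing from 0.2 to 20 while the L1 gradient stays at 1, which is exactly the trade-off described above.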
I looked it up and found another view:
When the predicted value differs greatly from the target, the L2 gradient can easily explode, because the gradient contains the term x − t. That is why rbg proposed Smooth L1 Loss in Fast R-CNN: when the difference is large, the (x − t) factor of the L2 gradient is replaced by ±1, which avoids gradient explosion, i.e., it is more robust to outliers.
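The piecewise definition from the Fast R-CNN paper (quadratic near zero, linear beyond |x − t| = 1) can be sketched as follows; the helper names are hypothetical:

```python
def smooth_l1(x, t):
    # 0.5 * d^2 if |d| < 1, else |d| - 0.5 (as defined in Fast R-CNN)
    d = abs(x - t)
    if d < 1.0:
        return 0.5 * d * d
    return d - 0.5

def smooth_l1_grad(x, t):
    d = x - t
    if abs(d) < 1.0:
        return d          # behaves like L2 near zero: smooth convergence
    return 1.0 if d > 0 else -1.0  # clipped to +/-1 for large errors
```

For small errors the gradient equals x − t (like L2), so training still converges smoothly near the optimum; for large errors it is capped at ±1 (like L1), so a bad outlier cannot explode the update.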
This... that must be the reason.