Table of contents
1. L1 Loss, Mean Absolute Error (MAE)
2. L2 Loss, Mean Squared Error (MSE)
3. Cross-Entropy Loss
4. Combined Losses
5. Dice Loss or IoU Loss
6. Adversarial Loss
7. Contrastive Loss/Triplet Loss
The following are some commonly used loss functions, which can be selected and combined according to different application scenarios:
1. L1 Loss, Mean Absolute Error (MAE)
Suitable for regression tasks, the L1 loss calculates the absolute value of the difference between the predicted value and the true value and is less sensitive to outliers:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$-th sample, and $\hat{y}_i$ is the predicted value of the $i$-th sample.
The L1 loss is more suitable for handling outliers because it does not impose an excessive penalty on large errors like the L2 loss.
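As a minimal sketch (the function name and sample values are my own), the MAE above can be computed in plain Python:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i| over all samples."""
    assert len(y_true) == len(y_pred), "inputs must have the same length"
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-sample errors are 0.5, 0.0, and 1.5, so MAE = 2.0 / 3
print(mae([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```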
2. L2 Loss, Mean Squared Error (MSE)
Suitable for regression tasks, the L2 loss calculates the square of the difference between the predicted value and the true value, and is suitable for tasks that output continuous values.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$-th sample, and $\hat{y}_i$ is the predicted value of the $i$-th sample.
Because the L2 loss minimizes the sum of squared errors over all samples, large errors are penalized quadratically, which can make the model overly sensitive to outliers.
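A matching sketch for the MSE (again with a name and sample values of my own choosing):

```python
def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)^2 over all samples."""
    assert len(y_true) == len(y_pred), "inputs must have the same length"
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-sample squared errors are 0.25, 0.0, and 2.25, so MSE = 2.5 / 3
print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```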
Plotting L1 loss (absolute error) and L2 loss (squared error) against the prediction error shows the difference:
The L1 loss is V-shaped, with a corner at zero error and linear growth elsewhere.
The L2 loss is smooth at zero error, but as the error grows, the loss increases much faster than the L1 loss.
Therefore, the L1 loss has a relatively small penalty for large errors, while the L2 loss has a more severe penalty for large errors.
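A quick numeric check of this behavior: for a per-sample error $e$, the L1 penalty is $|e|$ and the L2 penalty is $e^2$, so an outlier with error 10 costs 10 under L1 but 100 under L2.

```python
# Compare per-sample penalties of L1 (|e|) and L2 (e^2) as the error grows.
for e in [0.5, 1.0, 2.0, 10.0]:
    print(f"error={e:5.1f}  L1 penalty={abs(e):6.1f}  L2 penalty={e * e:7.1f}")
```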
3. Cross-Entropy Loss
Suitable for classification tasks. For binary classification problems, binary cross-entropy, also known as log loss, is used:
$$L = -\frac{1}{n}\sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + \left(1 - y_i\right) \log\left(1 - \hat{y}_i\right) \right]$$

where $L$ is the loss, $n$ is the number of samples, $y_i$ is the true label (0 or 1) of the $i$-th sample, and $\hat{y}_i$ is the predicted probability for the $i$-th sample.
For multi-class problems, categorical cross-entropy is used.
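Both variants can be sketched in plain Python (the function names and the small `eps` guard against `log(0)` are my own additions):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot rows; y_pred: rows of class probabilities."""
    return -sum(sum(y * math.log(p + eps) for y, p in zip(t_row, p_row))
                for t_row, p_row in zip(y_true, y_pred)) / len(y_true)

# Confident, correct predictions give a small loss; an uninformative
# prediction of 0.5 costs -log(0.5) ~= 0.693 per sample.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.1, 0.8, 0.1]]))
```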
4. Combined Losses
In some cases, you may need to combine multiple loss functions. For example, in a multi-task learning scenario, you can use MSE for the output of the regression task and cross-entropy for the output of the classification task.
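One way to sketch such a combination, assuming a regression head scored with MSE and a classification head scored with binary cross-entropy (all names and weights here are hypothetical):

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bce(y_true, y_pred, eps=1e-12):
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred)) / len(y_true)

def multitask_loss(reg_true, reg_pred, cls_true, cls_pred, w_reg=1.0, w_cls=1.0):
    """Weighted sum of a regression loss and a classification loss."""
    return w_reg * mse(reg_true, reg_pred) + w_cls * bce(cls_true, cls_pred)

# Down-weight the regression task relative to the classification task.
print(multitask_loss([1.0, 2.0], [1.1, 1.9], [1, 0], [0.9, 0.2],
                     w_reg=0.5, w_cls=1.0))
```

The weights `w_reg` and `w_cls` are typically tuned per task, reflecting the weighting idea mentioned at the end of this article.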
5. Dice Loss or IoU Loss
Commonly used in image segmentation tasks, especially when classes are imbalanced. These loss functions focus on the overlap between the predicted region and the ground-truth region.
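The Dice loss is $1 - \frac{2|A \cap B|}{|A| + |B|}$; a minimal sketch on flattened binary masks (the `eps` smoothing term is a common convention I'm assuming here):

```python
def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient for flattened masks (0/1 values or probabilities)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

mask = [1, 1, 0, 0]
print(dice_loss(mask, mask))          # perfect overlap -> loss near 0
print(dice_loss([1, 0, 0, 0], mask))  # partial overlap -> higher loss
```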
6. Adversarial Loss
Common in applications using generative adversarial networks (GANs), such as style transfer or image generation tasks.
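As a sketch of the standard GAN objective (assuming the discriminator outputs probabilities, and using the common non-saturating generator loss; in a real GAN these values come from network forward passes):

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """-[log D(x) + log(1 - D(G(z)))], averaged over a batch."""
    return -sum(math.log(r + eps) + math.log(1.0 - f + eps)
                for r, f in zip(d_real, d_fake)) / len(d_real)

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: -log D(G(z)), averaged over a batch."""
    return -sum(math.log(f + eps) for f in d_fake) / len(d_fake)

# A confident, correct discriminator has low loss; the generator's loss
# falls as it fools the discriminator (d_fake -> 1).
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(generator_loss([0.1, 0.2]))
```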
7. Contrastive Loss/Triplet Loss
Used in metric learning and certain types of embedding learning, especially in scenarios where relationships between inputs need to be learned.
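The triplet loss, for example, is $\max(0,\, d(a, p) - d(a, n) + \text{margin})$: it pulls an anchor toward a positive example and pushes it away from a negative one. A minimal sketch using squared Euclidean distance (the function name and margin value are my own):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin), with squared Euclidean distance."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)

# The negative is already farther than the positive by more than the margin,
# so this triplet contributes zero loss.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0]))
```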
In practical applications, an appropriate loss function can be selected based on the specific requirements of the task and the characteristics of the network's output, and a custom loss function can even be designed for a specific application scenario. The losses of different outputs can also be weighted to reflect the relative importance of each task.