[Deep Learning] Summary of commonly used loss functions in various fields (2024 latest version)

Table of contents

1. L1 Loss, Mean Absolute Error (MAE)

2. L2 Loss, Mean Squared Error (MSE)

3. Cross-Entropy Loss

4. Combined Losses

5. Dice Loss or IoU Loss

6. Adversarial Loss

7. Contrastive Loss/Triplet Loss

The following are some commonly used loss functions, which can be selected and combined according to different application scenarios: 

1. L1 Loss, Mean Absolute Error (MAE)

        Suitable for regression tasks. The L1 loss averages the absolute difference between the predicted value and the true value and is relatively insensitive to outliers.

L1=\frac{1}{N} \sum_{i=1}^N\left|y_i-\hat{y}_i\right|

where N is the number of samples, y_i is the true value of the i-th sample, and \hat{y}_i is the predicted value of the i-th sample.

        The L1 loss handles outliers better because, unlike the L2 loss, it does not impose an excessive penalty on large errors.
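        As a quick illustration, here is a minimal sketch (the values are made up for the example) that computes the MAE directly from the formula with NumPy and with PyTorch's built-in nn.L1Loss:

```python
import numpy as np
import torch
import torch.nn as nn

# Toy predictions and targets (illustrative values only)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# MAE computed directly from the formula
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 0.5

# Equivalent built-in loss in PyTorch
l1_loss = nn.L1Loss()
print(l1_loss(torch.tensor(y_pred), torch.tensor(y_true)))  # tensor(0.5000, dtype=torch.float64)
```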

2. L2 Loss, Mean Squared Error (MSE)

        Suitable for regression tasks. The L2 loss averages the squared difference between the predicted value and the true value and works well for tasks that output continuous values.

L2=\frac{1}{N} \sum_{i=1}^N\left(y_i-\hat{y}_i\right)^2

where N is the number of samples, y_i is the true value of the i-th sample, and \hat{y}_i is the predicted value of the i-th sample.

        Training with the L2 loss drives the model to minimize the sum of squared errors across all samples, which can make the model overly sensitive to outliers.
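        A comparable sketch for the MSE, again with made-up values, using NumPy and PyTorch's nn.MSELoss:

```python
import numpy as np
import torch
import torch.nn as nn

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# MSE computed directly from the formula
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375

# Equivalent built-in loss in PyTorch
mse_loss = nn.MSELoss()
print(mse_loss(torch.tensor(y_pred), torch.tensor(y_true)))  # tensor(0.3750, dtype=torch.float64)
```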

This graph compares how the L1 loss (absolute error) and the L2 loss (squared error) change with the prediction error:

        The L1 loss has a corner (a non-differentiable point) at zero error and grows linearly elsewhere.

        The L2 loss is smooth at zero error, but as the error increases, the loss grows much faster than the L1 loss.

        Therefore, the L1 loss penalizes large errors relatively mildly, while the L2 loss penalizes them much more severely.
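        To make the difference concrete, the short sketch below uses an arbitrary set of per-sample errors containing one outlier and compares how much that outlier contributes to each loss:

```python
import numpy as np

errors = np.array([0.5, 0.5, 0.5, 10.0])  # three small errors and one outlier

l1_per_sample = np.abs(errors)   # linear in the error
l2_per_sample = errors ** 2      # quadratic in the error

print(l1_per_sample.sum())  # 11.5   -> the outlier is ~87% of the total L1 loss
print(l2_per_sample.sum())  # 100.75 -> the outlier is ~99% of the total L2 loss
```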

3. Cross-Entropy Loss

        Suitable for classification tasks. For binary classification problems, binary cross-entropy (Binary Cross-Entropy), also known as log loss, can be used:

L(y, \hat{y})=-\frac{1}{N} \sum_{i=1}^N\left[y_i \log \left(\hat{y}_i\right)+\left(1-y_i\right) \log \left(1-\hat{y}_i\right)\right]

where L is the loss function, N is the number of samples, y_i is the true label (0 or 1) of the i-th sample, and \hat{y}_i is the predicted probability of the i-th sample.

        For multi-class problems, categorical cross-entropy (Categorical Cross-Entropy) is used instead.
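        As a minimal sketch of both variants in PyTorch (the logits and labels below are made up): nn.BCEWithLogitsLoss covers the binary case and nn.CrossEntropyLoss the multi-class case; both expect raw logits and apply the sigmoid/softmax internally.

```python
import torch
import torch.nn as nn

# Binary case: raw scores (logits) for 4 samples, labels in {0, 1}
logits_bin = torch.tensor([1.2, -0.8, 0.3, 2.1])
labels_bin = torch.tensor([1.0, 0.0, 1.0, 1.0])
bce = nn.BCEWithLogitsLoss()   # applies the sigmoid internally
print(bce(logits_bin, labels_bin))

# Multi-class case: logits for 3 samples over 5 classes, integer class labels
logits_mc = torch.randn(3, 5)
labels_mc = torch.tensor([0, 4, 2])
ce = nn.CrossEntropyLoss()     # applies log-softmax internally
print(ce(logits_mc, labels_mc))
```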

4. Combined Losses

        In some cases, you may need to combine multiple loss functions. For example, in a multi-task learning scenario, you can use MSE for the output of the regression task and cross-entropy for the output of the classification task.
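        A common way to combine them is a weighted sum, sketched below under the assumption of a hypothetical two-head network with one regression output and one classification output (the weights 1.0 and 0.5 are arbitrary hyperparameters):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()

# Hypothetical outputs of a two-head network for a batch of 8 samples
reg_pred = torch.randn(8, 1, requires_grad=True)    # regression head
reg_target = torch.randn(8, 1)
cls_logits = torch.randn(8, 3, requires_grad=True)  # classification head, 3 classes
cls_target = torch.randint(0, 3, (8,))

# Weighted sum of the per-task losses
w_reg, w_cls = 1.0, 0.5
total_loss = w_reg * mse(reg_pred, reg_target) + w_cls * ce(cls_logits, cls_target)
total_loss.backward()  # gradients flow to both heads from a single scalar
```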

5. Dice Loss or IoU Loss

        Commonly used in image segmentation tasks, especially when classes are imbalanced. These loss functions focus on how much the predicted region overlaps with the ground-truth region.
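        A minimal sketch of one common soft Dice loss formulation for binary segmentation (other variants exist; the smoothing term eps avoids division by zero):

```python
import torch

def dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary segmentation.

    probs:  predicted foreground probabilities in [0, 1], shape (N, H, W)
    target: ground-truth binary mask, same shape
    """
    intersection = (probs * target).sum(dim=(1, 2))
    union = probs.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()

# Toy usage with a random "prediction" and mask
probs = torch.rand(2, 64, 64)
target = (torch.rand(2, 64, 64) > 0.5).float()
print(dice_loss(probs, target))
```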

6. Adversarial Loss

        Common in applications using generative adversarial networks (GANs), such as style transfer or image generation tasks.
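        As one illustration, the standard (non-saturating) GAN objective can be written with binary cross-entropy; the rough sketch below assumes a discriminator that outputs raw logits and omits the networks themselves:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d_real_logits, d_fake_logits):
    # Real samples should be classified as 1, generated samples as 0
    real_loss = bce(d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = bce(d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss

def generator_loss(d_fake_logits):
    # Non-saturating form: the generator tries to make the discriminator output 1 on fakes
    return bce(d_fake_logits, torch.ones_like(d_fake_logits))
```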

7. Contrastive Loss/Triplet Loss

        Used in metric learning and certain types of embedding learning, especially in scenarios where relationships between inputs need to be learned.
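        For instance, PyTorch ships a built-in triplet loss; the sketch below uses random embeddings purely for illustration:

```python
import torch
import torch.nn as nn

# Triplet loss: pull the anchor toward the positive, push it away from the negative
triplet = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(16, 128)  # embeddings of shape (batch, dim)
positive = torch.randn(16, 128)  # same identity/class as the anchor
negative = torch.randn(16, 128)  # different identity/class
print(triplet(anchor, positive, negative))
```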

        In practical applications, an appropriate loss function can be selected based on the specific requirements of the task and the output characteristics of the network, and a custom loss function can even be designed to better fit a specific application scenario. The losses of different outputs can also be weighted to reflect the relative importance of the different tasks.
