Image segmentation loss function summary

1. Introduction

For image segmentation, improving accuracy through model optimization has always been a central focus. The loss (objective) function, as an important part of the algorithm, plays an important role in helping the model converge quickly.
Image segmentation can be defined as a pixel-level classification task. An image is composed of pixels, which together define the different elements in the image, so the method of classifying these pixels into classes of elements is called semantic image segmentation. When designing deep learning architectures for complex image segmentation, one often faces a crucial choice: which loss/objective function to use, since it drives the learning process of the algorithm. The choice of loss function is crucial for any architecture to learn the correct objective.

This is a summary of image segmentation loss functions. A total of 14 loss functions for segmentation are summarized and divided into 4 categories:
distribution-based, region-based, boundary-based, and compound loss functions (Distribution-based, Region-based, Boundary-based, and Compounded).
Paper address: https://arxiv.org/pdf/2006.14822.pdf
Code address: https://github.com/shruti-jadon/Semantic-Segmentation-Loss-Functions
Project recommendation: https://github.com/JunMa11/SegLoss

2. Based on distribution

2.1 Binary Cross-Entropy

Cross-entropy is defined as a measure of the difference between two probability distributions for a given random variable or set of events. It is widely used for classification tasks, and since segmentation is pixel-level classification, it works well there too. In multi-class tasks, the softmax activation function is usually paired with the cross-entropy loss: cross-entropy describes the difference between two probability distributions, but the output of a neural network is a raw vector, not a probability distribution, so softmax is needed to "normalize" that vector into a probability distribution before the cross-entropy loss is computed.

The cross-entropy loss evaluates the class prediction of each pixel vector individually and then averages over all pixels, so every pixel is treated as being learned equally. However, class imbalance is common in medical images, which lets classes with many pixels dominate training and makes it hard to learn the features of smaller objects, reducing the effectiveness of the network.
Best usage scenario: data that is evenly distributed across classes; this is a loss function based on the Bernoulli distribution.
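To make this concrete, here is a minimal PyTorch-style sketch of pixel-wise binary cross-entropy for segmentation (the tensor shapes and the use of raw logits are assumptions for illustration, not taken from the paper's repository):

```python
# Minimal sketch: pixel-wise binary cross-entropy for binary segmentation.
import torch
import torch.nn.functional as F

def bce_loss(logits, target):
    """logits, target: tensors of shape (N, 1, H, W); target holds 0/1 labels."""
    # Each pixel is treated as an independent binary classification and the
    # per-pixel losses are averaged, so every pixel contributes equally.
    return F.binary_cross_entropy_with_logits(logits, target.float())
```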

2.2 Weighted Binary Cross-Entropy

**Weighted Binary Cross-Entropy (WCE)** is a variant of binary cross-entropy in which the positive examples are weighted by a coefficient. It is widely used in the case of skewed data.
The added weight is used to trade off false negatives against false positives: set the weight greater than 1 to reduce false negatives, and less than 1 to reduce false positives. The weight is the coefficient applied to the positive examples.
Application scenario: widely used on skewed data sets, with the skew adjusted by a weighting coefficient.
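A possible PyTorch sketch of weighted BCE, using the `pos_weight` argument of `binary_cross_entropy_with_logits` to weight the positive class (the default value 2.0 is an arbitrary assumption):

```python
# Sketch: weighted binary cross-entropy. pos_weight > 1 penalizes false negatives
# more heavily; pos_weight < 1 penalizes false positives more heavily.
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits, target, pos_weight=2.0):
    w = torch.tensor([pos_weight], device=logits.device)
    return F.binary_cross_entropy_with_logits(logits, target.float(), pos_weight=w)
```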

2.3 Balanced Cross-Entropy

Balanced cross-entropy (BCE) is similar to weighted cross-entropy. The only difference is that, in addition to the positive examples, the negative examples are also weighted.
Application scenario: similar to weighted cross-entropy, it is widely used on skewed data sets; it weights positive and negative samples separately.
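A hand-rolled sketch of balanced cross-entropy (here β is passed in as a plain argument; in practice it is often set from the fraction of negative pixels):

```python
# Sketch: balanced cross-entropy. beta weights the positive term and (1 - beta)
# weights the negative term, so both classes can be rebalanced.
import torch

def balanced_ce_loss(pred, target, beta=0.7, eps=1e-7):
    pred = pred.clamp(eps, 1 - eps)   # pred: per-pixel probabilities in (0, 1)
    target = target.float()
    pos = -beta * target * torch.log(pred)
    neg = -(1 - beta) * (1 - target) * torch.log(1 - pred)
    return (pos + neg).mean()
```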

2.4 Focal Loss

Focal loss (FL) can also be viewed as a variant of binary cross-entropy. It reduces the contribution of easy examples and lets the model focus on learning hard examples, which suits highly imbalanced class scenarios.
Focal loss down-weights easy examples with a modulating factor so that training concentrates on hard examples. When a sample is misclassified, the modulating factor is close to 1, so the loss is almost unchanged from the original; when a sample is classified correctly and easily, the factor approaches 0 and contributes little to the total loss.
Application scenario: For highly imbalanced data sets, it most effectively reduces the contribution of simple examples, allowing the model to learn difficult examples.
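A sketch of binary focal loss in the same style (the alpha and gamma defaults follow the commonly cited values 0.25 and 2, which are assumptions rather than requirements):

```python
# Sketch: binary focal loss. (1 - p_t)^gamma shrinks the loss of well-classified
# (easy) pixels so that training concentrates on hard ones.
import torch

def focal_loss(pred, target, alpha=0.25, gamma=2.0, eps=1e-7):
    pred = pred.clamp(eps, 1 - eps)                    # predicted probabilities
    p_t = torch.where(target == 1, pred, 1 - pred)     # probability of the true class
    a_t = torch.where(target == 1,
                      torch.full_like(pred, alpha),
                      torch.full_like(pred, 1 - alpha))
    return (-a_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()
```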

2.5 Distance map derived loss penalty term

A distance map can be defined as the distance (Euclidean, absolute, etc.) between the ground truth and the prediction map. There are two ways to incorporate distance maps: either build a network architecture with an extra reconstruction head alongside the segmentation head, or introduce the distance map into the loss function. Following that idea, a distance map derived from the ground-truth mask is used to build a custom penalty-based loss. With this approach the network can easily be guided toward boundary regions that are hard to segment. The loss function is defined as:
$$ L(y, \hat{p}) = \frac{1}{N} \sum_{i=1}^{N} (1 + \phi) \odot CE(y, \hat{p}) $$
where φ is the distance penalty map generated from the ground-truth mask.
Application scenario: a variant of cross-entropy for hard-to-segment boundaries.
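A rough sketch of the loss-function route described above, using SciPy's `distance_transform_edt` to build a boundary-distance map from the ground-truth mask; single-image (H, W) tensors and the particular normalization are simplifying assumptions:

```python
# Sketch: distance-map-penalized cross-entropy. The distance map is inverted so
# that pixels near the ground-truth boundary receive the largest penalty weight.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def distance_penalty_ce(logits, target):
    """logits, target: single-image tensors of shape (H, W); target holds 0/1 labels."""
    gt = target.cpu().numpy().astype(np.uint8)
    # Distance of every pixel to the ground-truth boundary.
    dist = distance_transform_edt(gt == 0) + distance_transform_edt(gt == 1)
    # Invert and normalize so boundary pixels get the highest weight.
    phi = 1.0 - dist / (dist.max() + 1e-7)
    phi = torch.from_numpy(phi).to(logits.device).float()
    ce = F.binary_cross_entropy_with_logits(logits, target.float(), reduction="none")
    return ((1.0 + phi) * ce).mean()
```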

3. Based on region

3.1 Dice Loss

The Dice coefficient is a widely used metric in the computer vision community to calculate the similarity between two images . In 2016, it was also adapted into a loss function called Dice loss.
Dice coefficient: a measure of the similarity between two sets, commonly used to calculate the pixel-wise similarity between two samples. The formula is as follows:
$$ DSC(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} $$
The factor of 2 in the numerator is there because the denominator double-counts the common elements of X and Y; the value range is [0, 1]. For segmentation tasks, X represents the ground-truth segmentation and Y represents the predicted segmentation.
Dice Loss:
$$ DL(X, Y) = 1 - \frac{2\,|X \cap Y| + 1}{|X| + |Y| + 1} $$
Here, 1 is added to both the numerator and the denominator to keep the function well defined in extreme cases, such as when both sets are empty. Dice loss is intended for extremely imbalanced samples; using it under ordinary circumstances can adversely affect backpropagation and make training unstable.
Application scenario: inspired by the Dice coefficient, a metric for evaluating segmentation results. Since the Dice coefficient is non-convex, it has been modified here to make it tractable.
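A compact sketch of the soft Dice loss with the +1 smoothing term mentioned above (probability inputs are assumed):

```python
# Sketch: soft Dice loss. The +1 (smooth) term keeps the ratio defined even when
# both prediction and ground truth are empty.
import torch

def dice_loss(pred, target, smooth=1.0):
    pred, target = pred.reshape(-1), target.reshape(-1).float()   # pred: probabilities
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice
```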

3.2 Tversky Loss

Formula:
$$ TI(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha\,|X \setminus Y| + \beta\,|Y \setminus X|} $$
The Tversky coefficient is a generalization of the Dice and Jaccard coefficients. When α = β = 0.5, the Tversky coefficient reduces to the Dice coefficient; when α = β = 1, it becomes the Jaccard coefficient. α and β control false negatives and false positives respectively, so adjusting them balances the two.
The Tversky index (TI) can thus be seen as a generalization of the Dice coefficient that adds weights to FP (false positives) and FN (false negatives) through these coefficients.
Use case: a variant of the Dice coefficient that adds weights to false positives and false negatives.
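A sketch of Tversky loss following the convention above (α on false negatives, β on false positives; the defaults α = 0.7, β = 0.3 are just illustrative):

```python
# Sketch: Tversky loss. alpha = beta = 0.5 recovers Dice loss; alpha = beta = 1
# recovers the Jaccard-based loss.
import torch

def tversky_loss(pred, target, alpha=0.7, beta=0.3, smooth=1.0):
    pred, target = pred.reshape(-1), target.reshape(-1).float()
    tp = (pred * target).sum()          # soft true positives
    fn = ((1 - pred) * target).sum()    # soft false negatives
    fp = (pred * (1 - target)).sum()    # soft false positives
    ti = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return 1.0 - ti
```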

3.3 Focal Tversky Loss

Similar to focal loss, which concentrates on hard examples by down-weighting easy/common ones, Focal Tversky loss also tries to learn hard examples, such as small ROIs (regions of interest), with the help of a γ coefficient, as shown below:
$$ FTL = \sum_{c} (1 - TI_c)^{1/\gamma} $$
Application scenario: a variant of Tversky loss that, like focal loss, focuses on difficult examples.
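A sketch of focal Tversky loss; note that published formulations differ on whether the exponent is γ or 1/γ, so the exponent below is an assumption following the original focal Tversky paper:

```python
# Sketch: focal Tversky loss. Raising (1 - TI) to a power focuses training on
# hard examples such as small ROIs.
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=4/3, smooth=1.0):
    pred, target = pred.reshape(-1), target.reshape(-1).float()
    tp = (pred * target).sum()
    fn = ((1 - pred) * target).sum()
    fp = (pred * (1 - target)).sum()
    ti = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return (1.0 - ti) ** (1.0 / gamma)
```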

3.4 Sensitivity Specificity Loss

First, sensitivity is the same as recall: the ability to detect the pixels where disease is actually present:

$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$
Specificity is the ability to detect the pixels where disease is absent:
$$ \text{Specificity} = \frac{TN}{TN + FP} $$
Sensitivity Specificity Loss is:
$$ SSL = \lambda \cdot \frac{\sum_n (r_n - p_n)^2\, r_n}{\sum_n r_n} + (1 - \lambda) \cdot \frac{\sum_n (r_n - p_n)^2\,(1 - r_n)}{\sum_n (1 - r_n)} $$
The left-hand term measures the error rate on diseased pixels (i.e., 1 − sensitivity rather than the correct rate), so λ is set to 0.05. The squared term (r_n − p_n)² is used to obtain smooth gradients.
Like the Dice coefficient, sensitivity and specificity are commonly used metrics for evaluating segmentation predictions. In this loss function, the λ parameter can be used to address the class imbalance problem.
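A sketch of the sensitivity-specificity loss as described above, with λ = 0.05 and the squared error term (r is the ground truth and p the prediction):

```python
# Sketch: sensitivity-specificity loss. The first term is the squared error on
# lesion pixels (the 1 - sensitivity side), the second on background pixels.
import torch

def sensitivity_specificity_loss(pred, target, lam=0.05, eps=1e-7):
    target = target.float()
    sq_err = (target - pred) ** 2
    sens_term = (sq_err * target).sum() / (target.sum() + eps)
    spec_term = (sq_err * (1 - target)).sum() / ((1 - target).sum() + eps)
    return lam * sens_term + (1 - lam) * spec_term
```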

3.5 Log-Cosh Dice Loss (the loss function proposed in this paper)

The Dice coefficient is a metric for evaluating segmentation output. It has also been turned into a loss function, since it gives a direct mathematical representation of the segmentation objective, but because of its non-convexity it often fails to reach the optimum. The Lovász-softmax loss tackles the non-convexity problem by smoothing the loss with the Lovász extension. Meanwhile, the log-cosh transform has been widely used in regression problems to smooth the curve.
It is effectively an improvement on the Dice loss: because Dice is non-convex, the optimal result may otherwise not be reached.
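A sketch of log-cosh Dice loss, which simply passes the ordinary Dice loss through the smooth log(cosh(·)) transform:

```python
# Sketch: log-cosh Dice loss. log(cosh(x)) behaves like x^2/2 near zero and like
# |x| for large x, giving a smooth, well-behaved objective.
import torch

def log_cosh_dice_loss(pred, target, smooth=1.0):
    pred, target = pred.reshape(-1), target.reshape(-1).float()
    intersection = (pred * target).sum()
    dl = 1.0 - (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return torch.log(torch.cosh(dl))
```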

4. Based on boundaries

4.1 Shape-aware Loss

As the name suggests, shape-aware loss takes shape into account. In general, all loss functions operate at the pixel level; shape-aware loss, however, computes the average point-to-curve Euclidean distance between points around the predicted segmentation curve and the ground-truth curve, and uses it as a coefficient in the cross-entropy loss.
In cases where boundaries are difficult to segment, the cross-entropy loss is augmented with a shape-based coefficient.
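A much simplified sketch of the idea (not the exact formulation from the paper): estimate a shape coefficient from the distance between the mispredicted region and the ground-truth boundary, then use it to scale the cross-entropy; single-image (H, W) tensors are assumed:

```python
# Simplified sketch of a shape-aware term: the average distance of mispredicted
# pixels to the ground-truth boundary scales the cross-entropy loss.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def shape_aware_ce(logits, target):
    """logits, target: single-image tensors of shape (H, W); target holds 0/1 labels."""
    gt = target.cpu().numpy().astype(np.uint8)
    pred_mask = (torch.sigmoid(logits) > 0.5).cpu().numpy().astype(np.uint8)
    # Distance of every pixel to the ground-truth boundary.
    dist_to_gt = distance_transform_edt(gt == 0) + distance_transform_edt(gt == 1)
    disagree = pred_mask != gt
    shape_coef = float(dist_to_gt[disagree].mean()) if disagree.any() else 0.0
    ce = F.binary_cross_entropy_with_logits(logits, target.float(), reduction="none")
    return ((1.0 + shape_coef) * ce).mean()
```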

4.2 Hausdorff Distance Loss

Hausdorff Distance Loss (HD) is a metric used by segmentation methods to track model performance. It is defined as:
$$ d_H(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y),\; \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\} $$
The goal of any segmentation model is to minimize the Hausdorff distance, but due to its non-convexity it is not widely used directly as a loss function. Researchers have proposed three variants of a Hausdorff-distance-based loss, all of which incorporate the metric while keeping the loss function tractable.
Inspired by the Hausdorff distance metric used to evaluate segmentation, the non-convexity of the distance metric is handled by introducing these variants.
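Since the Hausdorff distance itself is what the loss variants approximate, a small sketch of computing the symmetric Hausdorff distance between two binary masks with distance transforms may help (both masks are assumed to be non-empty):

```python
# Sketch: symmetric Hausdorff distance between two binary masks, computed from
# Euclidean distance transforms.
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff_distance(mask_a, mask_b):
    """mask_a, mask_b: non-empty binary numpy arrays of the same shape."""
    dist_to_b = distance_transform_edt(mask_b == 0)  # distance to B's foreground
    dist_to_a = distance_transform_edt(mask_a == 0)  # distance to A's foreground
    d_ab = dist_to_b[mask_a.astype(bool)].max()      # farthest point of A from B
    d_ba = dist_to_a[mask_b.astype(bool)].max()      # farthest point of B from A
    return max(d_ab, d_ba)
```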

5. Combination loss

5.1 Combo Loss

The combo loss is defined as a weighted sum of Dice loss and a modified cross-entropy. It attempts to use the flexibility of Dice loss for the class-imbalance problem while using cross-entropy for smooth training curves. It is defined as (DL refers to Dice loss):
image.png
Application scenario: a weighted sum of Dice loss and improved cross-entropy, exploiting Dice loss's tolerance of class imbalance while leveraging cross-entropy for curve smoothing.
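A sketch of a combo-style loss as a weighted sum of β-weighted cross-entropy and Dice loss (the α and β defaults are placeholders, and the exact published formulation may differ in its sign conventions):

```python
# Sketch: combo loss = alpha * (beta-weighted cross-entropy) + (1 - alpha) * Dice loss.
import torch

def combo_loss(pred, target, alpha=0.5, beta=0.5, smooth=1.0, eps=1e-7):
    pred = pred.reshape(-1).clamp(eps, 1 - eps)   # predicted probabilities
    target = target.reshape(-1).float()
    # Modified (beta-weighted) cross-entropy term.
    ce = -(beta * target * torch.log(pred)
           + (1 - beta) * (1 - target) * torch.log(1 - pred)).mean()
    # Dice loss term (DL).
    intersection = (pred * target).sum()
    dl = 1.0 - (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return alpha * ce + (1 - alpha) * dl
```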

5.2 Exponential Logarithmic Loss

The exponential logarithmic loss focuses on structures that are predicted less accurately, using a combination of Dice loss and cross-entropy loss. Exponential and logarithmic transforms are applied to the Dice and cross-entropy terms in order to combine the benefits of finer segmentation boundaries and accurate data distributions. It is defined as:
image.png
Application scenario: The combined function of dice loss and binary cross-entropy focuses on situations where prediction accuracy is low.
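One possible sketch of an exponential-logarithmic combination of Dice and cross-entropy (the weights w_dice, w_ce and the exponent γ are illustrative assumptions, not the paper's exact settings):

```python
# Sketch: exponential logarithmic loss. Both the Dice and cross-entropy terms are
# log-transformed and raised to gamma before being combined.
import torch

def exp_log_loss(pred, target, w_dice=0.8, w_ce=0.2, gamma=0.3, smooth=1.0, eps=1e-7):
    pred = pred.reshape(-1).clamp(eps, 1 - eps)   # predicted probabilities
    target = target.reshape(-1).float()
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    l_dice = torch.pow(-torch.log(dice.clamp(min=eps)), gamma)
    p_t = torch.where(target == 1, pred, 1 - pred)  # probability of the true class
    l_ce = torch.pow(-torch.log(p_t), gamma).mean()
    return w_dice * l_dice + w_ce * l_ce
```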

Reference:
[Loss function for medical image segmentation - blackened pig article - Zhihu]
https://zhuanlan.zhihu.com/p/267128903
[Image segmentation model tuning skills, loss function inventory]
https://zhuanlan.zhihu.com/p/393496742

Origin blog.csdn.net/Alexa_/article/details/131819586