Various Dice Loss variants
Yuque document: https://www.yuque.com/lart/idh721/gpix1i
Dice Loss is also a very common loss function in image segmentation tasks. This article is organized based on the paper *Generalized Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks*.
hard dice score for binary segmentation
The dice score is a widely used overlap measure for pairwise comparison of binary segmentation maps $S$ and $G$.
It can be expressed either as a set operation or as a statistical measure:
$$D_{hard}=\frac{2|S \cap G|}{|S|+|G|}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{FP}+\Theta_{FN}}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{AE}}$$
The terms involved have the following meanings:
- $S$, $G$: the segmentation to be evaluated and the reference segmentation.
- $\Theta_{TP}$: the number of true positives, i.e., positions where both $S$ and $G$ are true.
- $\Theta_{FP}$: the number of positions that are true in $S$ but false in $G$.
- $\Theta_{FN}$: the number of positions that are false in $S$ but true in $G$.
- $\Theta_{AE} = \Theta_{FP} + \Theta_{FN}$: the number of positions where $S$ and $G$ disagree.
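As a concrete illustration, the hard dice score can be computed directly from the two counts above. A minimal numpy sketch (the function name and example arrays are this sketch's own, not from the paper):

```python
import numpy as np

def hard_dice(s: np.ndarray, g: np.ndarray) -> float:
    """Hard dice score between two binary segmentation maps."""
    s, g = s.astype(bool), g.astype(bool)
    tp = np.sum(s & g)   # Theta_TP: positions true in both S and G
    ae = np.sum(s != g)  # Theta_AE = Theta_FP + Theta_FN: disagreements
    return 2 * tp / (2 * tp + ae)

s = np.array([1, 1, 0, 0])
g = np.array([1, 0, 1, 0])
print(hard_dice(s, g))  # 2*1 / (2*1 + 2) = 0.5
```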
soft dice score for binary segmentation
An extension to soft binary segmentation relies on a probabilistic notion of disagreement between pairs of label probability vectors.
For $S$ and $G$, the classes $S_i$ and $G_i$ at each position $i \in \mathbf{X}$ can be regarded as random variables over the label space $\mathbf{L}=\{0,1\}$.
A probabilistic segmentation can then be represented as a label probability map, where $P(\mathbf{L})$ denotes the set of label probability vectors:
- $p=\{p^i := P(S_i=1)\}_{i \in \mathbf{X}}$
- $g=\{g^i := P(G_i=1)\}_{i \in \mathbf{X}}$
From this, the statistics $\Theta_{TP}$ and $\Theta_{AE}$ above can be extended to the soft segmentation case:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}} |p^i-g^i|$
- $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^i(1-|p^i-g^i|)$
In the common case where $g$ is crisp, i.e., $\forall i \in \mathbf{X}, g^i \in \{0, 1\}$, these become:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}} g^i(1-p^i)+(1-g^i)p^i=\sum_{i \in \mathbf{X}} g^i+p^i-2g^ip^i$
- $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^ip^i$
The corresponding soft dice score can be expressed as:
$$D_{soft}(p,g)=\frac{2\sum_i g^ip^i}{\sum_i(g^i+p^i)}$$
There are also variants that use squared terms in the denominator.
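Both the soft score and the squared-denominator variant are straightforward to implement. A hedged numpy sketch; the names and the small `eps` stabilizer are choices of this sketch, not the paper's:

```python
import numpy as np

def soft_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Soft dice score: 2 * sum_i(g^i p^i) / sum_i(g^i + p^i)."""
    return 2 * np.sum(g * p) / (np.sum(g + p) + eps)

def soft_dice_squared(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Variant with squared terms in the denominator."""
    return 2 * np.sum(g * p) / (np.sum(g * g + p * p) + eps)

p = np.array([0.9, 0.8, 0.1, 0.0])  # predicted foreground probabilities
g = np.array([1.0, 1.0, 0.0, 0.0])  # crisp ground truth
print(soft_dice(p, g))  # 3.4 / 3.8 ≈ 0.8947
```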
soft multi-class dice score
The discussion above addresses binary segmentation; for the multi-class case, the per-class computations need to be aggregated.
The simplest way is to directly average over all classes.
The result can be called the mean dice score, where $|\mathbf{L}|$ is the number of classes:
$$D_{mean}(p,g)=\frac{1}{|\mathbf{L}|}\sum_{l \in \mathbf{L}}\frac{2\sum_{i}g^i_lp^i_l}{\sum_{i}(g^i_l+p^i_l)}$$
A generalized form of the above can be obtained by introducing class weights $w_l = \frac{1}{(\sum_{i}g^i_l)^2}, l\in \mathbf{L}$, i.e., by turning the plain average into a weighted one. This is called the generalized soft multi-class dice score.
Finally it can be expressed as:
$$D_{generalised}(p,g)=\frac{2\sum_l w_l \sum_i g^i_lp^i_l}{\sum_l w_l \sum_i (g^i_l+p^i_l)}$$
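The generalised score can be sketched in numpy; here `p` and `g` are laid out as `(num_classes, num_positions)` arrays, and the `eps` terms are this sketch's own stabilizers:

```python
import numpy as np

def generalised_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Generalised dice score with class weights w_l = 1 / (sum_i g_l^i)^2.

    p, g: arrays of shape (num_classes, num_positions).
    """
    w = 1.0 / (np.sum(g, axis=1) ** 2 + eps)       # per-class weights
    num = 2 * np.sum(w * np.sum(g * p, axis=1))    # weighted intersections
    den = np.sum(w * np.sum(g + p, axis=1)) + eps  # weighted cardinalities
    return num / den

# two classes (background, foreground) over four positions
p = np.array([[0.1, 0.2, 0.9, 1.0],
              [0.9, 0.8, 0.1, 0.0]])
g = np.array([[0.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
print(generalised_dice(p, g))  # ≈ 0.9
```

The inverse-squared-volume weights upweight rare classes, which is the point of the generalised form for imbalanced segmentation.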
soft multi-class wasserstein dice score
In the dice scores above, the measure of disagreement between $p^i$ and $g^i$ can be regarded as an L1 distance. Here, the wasserstein distance is introduced instead, so that two label probability vectors can be compared in a semantically meaningful way.
We first introduce the wasserstein distance itself.
wasserstein distance
This is also known as the earth mover's distance. It represents the minimum cost required to transform a probability vector $p$ into another probability vector $q$.
For all $l,l' \in \mathbf{L}$, the cost of moving from $l$ to $l'$ is given by the entry $M_{l,l'}$ of a distance matrix $M$ (usually called the ground distance matrix), which is fixed and assumed known.
In this way, a distance matrix $M$ on $\mathbf{L}$ is mapped to a distance on $P(\mathbf{L})$, exploiting prior knowledge about $\mathbf{L}$.
When $\mathbf{L}$ is a finite set, for $p,q \in P(\mathbf{L})$, the wasserstein distance with respect to $M$ can be defined as the solution of a linear programming problem:
$$\begin{aligned} W^{M}(p,q)=&\min_{T_{l,l'}}\sum_{l,l' \in \mathbf{L}}T_{l,l'}M_{l,l'} \\ \text{subject to } & \forall l \in \mathbf{L}, \sum_{l' \in \mathbf{L}}T_{l,l'}=p_l, \\ \text{and } & \forall l' \in \mathbf{L}, \sum_{l \in \mathbf{L}}T_{l,l'}=q_{l'} \end{aligned}$$
Here, $T=(T_{l,l'})_{l,l' \in \mathbf{L}}$ is a joint probability distribution for $(p,q)$ with marginal distributions $p$ and $q$.
The minimizer $\hat{T}$ of the above problem is called the optimal transport between $p$ and $q$ for the distance matrix $M$.
For further explanations of the wasserstein distance, see:
- Wasserstein GAN and the Kantorovich-Rubinstein Duality
- https://chih-sheng-huang821.medium.com/%E9%82%84%E7%9C%8B%E4%B8%8D%E6%87%82wasserstein-distance%E5%97%8E-%E7%9C%8B%E7%9C%8B%E9%80%99%E7%AF%87-b3c33d4b942
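To make the linear program concrete: for a binary label space $\mathbf{L}=\{0,1\}$, the transport plan has a single free variable, so the LP can be solved by checking the two endpoints of its feasible interval. A stdlib-only sketch (the function name and the endpoint argument are this sketch's own):

```python
def wasserstein_2label(p, q, M):
    """Wasserstein distance between two 2-class probability vectors.

    The 2x2 transport plan T has row sums p and column sums q, leaving one
    free variable t = T[0][0] with max(0, p0+q0-1) <= t <= min(p0, q0).
    The objective is linear in t, so the minimum lies at an endpoint.
    """
    def cost(t):
        T = [[t, p[0] - t],
             [q[0] - t, 1.0 - p[0] - q[0] + t]]
        return sum(T[l][k] * M[l][k] for l in range(2) for k in range(2))

    lo = max(0.0, p[0] + q[0] - 1.0)
    hi = min(p[0], q[0])
    return min(cost(lo), cost(hi))

# with the 0/1 ground distance, this reduces to |p0 - q0|
M01 = [[0.0, 1.0], [1.0, 0.0]]
print(wasserstein_2label([0.3, 0.7], [0.8, 0.2], M01))  # ≈ 0.5
```

For larger label spaces the LP no longer has this one-variable structure; a generic LP or optimal-transport solver would be needed instead.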
soft multi-class wasserstein dice score
Here, the wasserstein distance is used to extend the measure of disagreement between label probability vector pairs, giving the following extended statistics:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}}W^{M}(p^i,g^i)$
- $\Theta^l_{TP}=\sum_{i \in \mathbf{X}}g^i_l(M_{l,b}-W^M(p^i,g^i)), \forall l \in \mathbf{L} \setminus \{b\}$
$M$ is chosen such that the background class $b$ is always the farthest from all other classes.
$$\Theta_{TP}=\sum_{l \in \mathbf{L} \setminus \{b\}}\alpha_l \Theta^l_{TP}$$
The per-class statistics are then combined as a weighted sum.
Choosing $\alpha_l = W^{M}(l, b) = M_{l,b}$ ensures that background positions do not contribute to $\Theta_{TP}$.
Finally, the wasserstein dice score with respect to $M$ can be defined as:
$$D^M(p,g)=\frac{2\sum_lM_{l,b}\sum_ig^i_l(M_{l,b}-W^M(p^i,g^i))}{2\sum_lM_{l,b}\sum_ig^i_l(M_{l,b}-W^M(p^i,g^i))+\sum_iW^M(p^i,g^i)}$$
For the binary case, you can set:
$$M = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
From this, $W^M(p^i,g^i)=|p^i-g^i|$, and $M_{l,b}=1$ for all $l \ne b$.
At this point, the wasserstein dice score degenerates into a soft binary dice score:
$$\begin{aligned} D^M(p,g) & =\frac{2\sum_ig^i(1-|p^i-g^i|)}{2\sum_ig^i(1-|p^i-g^i|)+\sum_i|p^i-g^i|} \\ & =\frac{2\sum_ip^ig^i}{2\sum_ip^ig^i+\sum_i[p^i(1-g^i)+(1-p^i)g^i]} \\ & =\frac{2\sum_ig^ip^i}{\sum_i(g^i+p^i)} \end{aligned}$$
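This reduction can be sanity-checked numerically. A quick sketch assuming a crisp $g$ and the 0/1 ground distance, where $W^M(p^i,g^i)=|p^i-g^i|$ (the example values are arbitrary):

```python
import numpy as np

p = np.array([0.9, 0.3, 0.6, 0.1])  # predicted foreground probabilities
g = np.array([1.0, 0.0, 1.0, 0.0])  # crisp ground truth

# wasserstein dice with M = [[0, 1], [1, 0]], using W^M(p^i, g^i) = |p^i - g^i|
w = np.abs(p - g)
tp = np.sum(g * (1 - w))
d_wasserstein = 2 * tp / (2 * tp + np.sum(w))

# soft binary dice computed directly
d_soft = 2 * np.sum(g * p) / np.sum(g + p)

print(d_wasserstein, d_soft)  # the two scores agree
```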
Previous wasserstein distance-based losses were limited by their computational cost; however, for the segmentation case considered here, a closed-form solution of the optimization problem exists.
When the ground truth $g^i$ is crisp, the optimal transport for all $l,l' \in \mathbf{L}$ is $T_{l,l'}=p^i_lg^i_{l'}$, and the wasserstein distance simplifies to the closed form:
$$W^M(p^i,g^i)=\sum_{l,l' \in \mathbf{L}} M_{l,l'}p^i_lg^i_{l'}$$
Wasserstein dice loss
The wasserstein dice loss based on $M$ can be defined as:
$$L_{D^M} := 1-D^M$$
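Putting the pieces together, the loss for a crisp ground truth can be sketched with the closed-form $W^M$ above. The function name, the 3-class distance matrix, and the `eps` stabilizer below are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def wasserstein_dice_loss(p, g, M, b=0, eps=1e-8):
    """Wasserstein dice loss 1 - D^M, assuming a crisp ground truth g.

    p, g: arrays of shape (num_classes, num_positions).
    M:    (num_classes, num_classes) ground distance matrix, b = background index.
    """
    # closed form: W^M(p^i, g^i) = sum_{l,l'} M[l,l'] p_l^i g_{l'}^i, per position
    w = np.einsum('ln,lk,kn->n', p, M, g)
    fg = [l for l in range(M.shape[0]) if l != b]
    # weighted true positives with alpha_l = M[l, b]
    tp = sum(M[l, b] * np.sum(g[l] * (M[l, b] - w)) for l in fg)
    return 1.0 - 2 * tp / (2 * tp + np.sum(w) + eps)

# background 0; classes 1 and 2 close to each other, both far from background
M = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.5],
              [1.0, 0.5, 0.0]])
g = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])  # crisp labels: position 0 -> bg, position 1 -> class 1
p = np.array([[0.8, 0.1],
              [0.1, 0.2],
              [0.1, 0.7]])  # confuses class 2 with class 1, which M penalizes mildly
print(wasserstein_dice_loss(p, g, M))
```

Because $M_{1,2}=0.5 < M_{1,0}=1$, confusing the two foreground classes costs less than confusing either of them with the background, which is exactly the semantic behavior the wasserstein formulation is meant to provide.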