Various Dice Loss variants
Yuque document: https://www.yuque.com/lart/idh721/gpix1i
Dice Loss is also a very common loss function in image segmentation tasks. This article is organized based on the paper *Generalized Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks*.
hard dice score for binary segmentation
The dice score is a widely used overlap measure for pairwise comparison of binary segmentation maps $S$ and $G$.
It can be expressed either as a set operation or as a statistical measure:
$$D_{hard}=\frac{2|S \cap G|}{|S|+|G|}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{FP}+\Theta_{FN}}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{AE}}$$
The terms involved have the following meanings:
- $S$, $G$: the segmentation to be evaluated and the reference segmentation.
- $\Theta_{TP}$: the number of true positives, i.e., positions where both $S$ and $G$ are true.
- $\Theta_{FP}$: the number of positions that are true in $S$ but false in $G$.
- $\Theta_{FN}$: the number of positions that are false in $S$ but true in $G$.
- $\Theta_{AE} = \Theta_{FP} + \Theta_{FN}$: the number of positions where $S$ and $G$ disagree.
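As a concrete illustration, the hard dice score can be computed directly from the two counts above. A minimal numpy sketch (the function name and example arrays are this sketch's own, not from the paper):

```python
import numpy as np

def hard_dice(s: np.ndarray, g: np.ndarray) -> float:
    """Hard dice score between two binary segmentation maps."""
    s, g = s.astype(bool), g.astype(bool)
    tp = np.sum(s & g)   # Theta_TP: positions true in both S and G
    ae = np.sum(s != g)  # Theta_AE = Theta_FP + Theta_FN: disagreements
    return 2 * tp / (2 * tp + ae)

s = np.array([1, 1, 0, 0])
g = np.array([1, 0, 1, 0])
print(hard_dice(s, g))  # 2*1 / (2*1 + 2) = 0.5
```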
soft dice score for binary segmentation
An extension to soft binary segmentation relies on a probabilistic notion of disagreement between pairs of label probability vectors.
For $S$ and $G$, the classes $S_i$ and $G_i$ at each position $i \in \mathbf{X}$ can be regarded as random variables over the label space $\mathbf{L}=\{0,1\}$.
A probabilistic segmentation can then be represented as a label probability map, where $P(\mathbf{L})$ denotes the set of label probability vectors:
- $p=\{p^i := P(S_i=1)\}_{i \in \mathbf{X}}$
- $g=\{g^i := P(G_i=1)\}_{i \in \mathbf{X}}$
From this, the statistics $\Theta_{TP}$ and $\Theta_{AE}$ above can be extended to the soft segmentation case:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}} |p^i-g^i|$
- $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^i(1-|p^i-g^i|)$
In the common case where $g$ is crisp, i.e., $\forall i \in \mathbf{X}, g^i \in \{0, 1\}$, these become:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}} g^i(1-p^i)+(1-g^i)p^i=\sum_{i \in \mathbf{X}} g^i+p^i-2g^ip^i$
- $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^ip^i$
The corresponding soft dice score can be expressed as:
$$D_{soft}(p,g)=\frac{2\sum_i g^ip^i}{\sum_i(g^i+p^i)}$$
There are also variants that use squared terms in the denominator.
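Both the soft score and the squared-denominator variant are straightforward to implement. A hedged numpy sketch; the names and the small `eps` stabilizer are choices of this sketch, not the paper's:

```python
import numpy as np

def soft_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Soft dice score: 2 * sum_i(g^i p^i) / sum_i(g^i + p^i)."""
    return 2 * np.sum(g * p) / (np.sum(g + p) + eps)

def soft_dice_squared(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Variant with squared terms in the denominator."""
    return 2 * np.sum(g * p) / (np.sum(g * g + p * p) + eps)

p = np.array([0.9, 0.8, 0.1, 0.0])  # predicted foreground probabilities
g = np.array([1.0, 1.0, 0.0, 0.0])  # crisp ground truth
print(soft_dice(p, g))  # 3.4 / 3.8 ≈ 0.8947
```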
soft multi-class dice score
The discussion above addresses binary segmentation; for the multi-class case, the per-class computations need to be aggregated.
The simplest way is to directly average over all classes.
The result can be called the mean dice score, where $|\mathbf{L}|$ is the number of classes:
$$D_{mean}(p,g)=\frac{1}{|\mathbf{L}|}\sum_{l \in \mathbf{L}}\frac{2\sum_{i}g^i_lp^i_l}{\sum_{i}(g^i_l+p^i_l)}$$
A generalized form of the above can be obtained by introducing class weights $w_l = \frac{1}{(\sum_{i}g^i_l)^2}, l\in \mathbf{L}$, i.e., by turning the plain average into a weighted one. This is called the generalized soft multi-class dice score.
Finally it can be expressed as:
$$D_{generalised}(p,g)=\frac{2\sum_l w_l \sum_i g^i_lp^i_l}{\sum_l w_l \sum_i (g^i_l+p^i_l)}$$
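The generalised score can be sketched in numpy; here `p` and `g` are laid out as `(num_classes, num_positions)` arrays, and the `eps` terms are this sketch's own stabilizers:

```python
import numpy as np

def generalised_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Generalised dice score with class weights w_l = 1 / (sum_i g_l^i)^2.

    p, g: arrays of shape (num_classes, num_positions).
    """
    w = 1.0 / (np.sum(g, axis=1) ** 2 + eps)       # per-class weights
    num = 2 * np.sum(w * np.sum(g * p, axis=1))    # weighted intersections
    den = np.sum(w * np.sum(g + p, axis=1)) + eps  # weighted cardinalities
    return num / den

# two classes (background, foreground) over four positions
p = np.array([[0.1, 0.2, 0.9, 1.0],
              [0.9, 0.8, 0.1, 0.0]])
g = np.array([[0.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
print(generalised_dice(p, g))  # ≈ 0.9
```

The inverse-squared-volume weights upweight rare classes, which is the point of the generalised form for imbalanced segmentation.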
soft multi-class wasserstein dice score
In the dice scores above, the measure of disagreement between $p^i$ and $g^i$ can be regarded as an L1 distance. Here, the wasserstein distance is introduced instead, so that two label probability vectors can be compared in a semantically meaningful way.
We first introduce the wasserstein distance itself.
wasserstein distance
This is also known as the earth mover's distance. It represents the minimum cost required to transform a probability vector $p$ into another probability vector $q$.
For all $l,l' \in \mathbf{L}$, the cost of moving from $l$ to $l'$ is given by the entry $M_{l,l'}$ of a distance matrix $M$ (usually called the ground distance matrix), which is fixed and assumed known.
In this way, a distance matrix $M$ on $\mathbf{L}$ is mapped to a distance on $P(\mathbf{L})$, exploiting prior knowledge about $\mathbf{L}$.
When $\mathbf{L}$ is a finite set, for $p,q \in P(\mathbf{L})$, the wasserstein distance with respect to $M$ can be defined as the solution of a linear programming problem:
$$\begin{aligned} W^{M}(p,q)=&\min_{T_{l,l'}}\sum_{l,l' \in \mathbf{L}}T_{l,l'}M_{l,l'} \\ \text{subject to } & \forall l \in \mathbf{L}, \sum_{l' \in \mathbf{L}}T_{l,l'}=p_l, \\ \text{and } & \forall l' \in \mathbf{L}, \sum_{l \in \mathbf{L}}T_{l,l'}=q_{l'} \end{aligned}$$
Here, $T=(T_{l,l'})_{l,l' \in \mathbf{L}}$ is a joint probability distribution for $(p,q)$ with marginal distributions $p$ and $q$.
The minimizer $\hat{T}$ of the above problem is called the optimal transport between $p$ and $q$ for the distance matrix $M$.
For further explanations of the wasserstein distance, see:
- Wasserstein GAN and the Kantorovich-Rubinstein Duality
- https://chih-sheng-huang821.medium.com/%E9%82%84%E7%9C%8B%E4%B8%8D%E6%87%82wasserstein-distance%E5%97%8E-%E7%9C%8B%E7%9C%8B%E9%80%99%E7%AF%87-b3c33d4b942
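To make the linear program concrete: for a binary label space $\mathbf{L}=\{0,1\}$, the transport plan has a single free variable, so the LP can be solved by checking the two endpoints of its feasible interval. A stdlib-only sketch (the function name and the endpoint argument are this sketch's own):

```python
def wasserstein_2label(p, q, M):
    """Wasserstein distance between two 2-class probability vectors.

    The 2x2 transport plan T has row sums p and column sums q, leaving one
    free variable t = T[0][0] with max(0, p0+q0-1) <= t <= min(p0, q0).
    The objective is linear in t, so the minimum lies at an endpoint.
    """
    def cost(t):
        T = [[t, p[0] - t],
             [q[0] - t, 1.0 - p[0] - q[0] + t]]
        return sum(T[l][k] * M[l][k] for l in range(2) for k in range(2))

    lo = max(0.0, p[0] + q[0] - 1.0)
    hi = min(p[0], q[0])
    return min(cost(lo), cost(hi))

# with the 0/1 ground distance, this reduces to |p0 - q0|
M01 = [[0.0, 1.0], [1.0, 0.0]]
print(wasserstein_2label([0.3, 0.7], [0.8, 0.2], M01))  # ≈ 0.5
```

For larger label spaces the LP no longer has this one-variable structure; a generic LP or optimal-transport solver would be needed instead.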
soft multi-class wasserstein dice score
Here, the wasserstein distance is used to extend the measure of disagreement between label probability vector pairs, giving the following extended statistics:
- $\Theta_{AE}=\sum_{i \in \mathbf{X}}W^{M}(p^i,g^i)$
- $\Theta^l_{TP}=\sum_{i \in \mathbf{X}}g^i_l(M_{l,b}-W^M(p^i,g^i)), \forall l \in \mathbf{L} \setminus \{b\}$
$M$ is chosen such that the background class $b$ is always the farthest from all other classes.
$$\Theta_{TP}=\sum_{l \in \mathbf{L} \setminus \{b\}}\alpha_l \Theta^l_{TP}$$
The per-class statistics are then combined as a weighted sum.
Choosing $\alpha_l = W^{M}(l, b) = M_{l,b}$ ensures that background positions do not contribute to $\Theta_{TP}$.
Finally, the wasserstein dice score with respect to $M$ can be defined as:
$$D^M(p,g)=\frac{2\sum_lM_{l,b}\sum_ig^i_l(M_{l,b}-W^M(p^i,g^i))}{2\sum_lM_{l,b}\sum_ig^i_l(M_{l,b}-W^M(p^i,g^i))+\sum_iW^M(p^i,g^i)}$$
For the binary case, you can set:
$$M = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
From this, $W^M(p^i,g^i)=|p^i-g^i|$, and $M_{l,b}=1$ for all $l \ne b$.
At this point, the wasserstein dice score degenerates into a soft binary dice score:
$$\begin{aligned} D^M(p,g) & =\frac{2\sum_ig^i(1-|p^i-g^i|)}{2\sum_ig^i(1-|p^i-g^i|)+\sum_i|p^i-g^i|} \\ & =\frac{2\sum_ip^ig^i}{2\sum_ip^ig^i+\sum_i[p^i(1-g^i)+(1-p^i)g^i]} \\ & =\frac{2\sum_ig^ip^i}{\sum_i(g^i+p^i)} \end{aligned}$$
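This reduction can be sanity-checked numerically. A quick sketch assuming a crisp $g$ and the 0/1 ground distance, where $W^M(p^i,g^i)=|p^i-g^i|$ (the example values are arbitrary):

```python
import numpy as np

p = np.array([0.9, 0.3, 0.6, 0.1])  # predicted foreground probabilities
g = np.array([1.0, 0.0, 1.0, 0.0])  # crisp ground truth

# wasserstein dice with M = [[0, 1], [1, 0]], using W^M(p^i, g^i) = |p^i - g^i|
w = np.abs(p - g)
tp = np.sum(g * (1 - w))
d_wasserstein = 2 * tp / (2 * tp + np.sum(w))

# soft binary dice computed directly
d_soft = 2 * np.sum(g * p) / np.sum(g + p)

print(d_wasserstein, d_soft)  # the two scores agree
```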
Previous wasserstein distance-based losses were limited by their computational cost; however, for the segmentation case considered here, a closed-form solution of the optimization problem exists.
When the ground truth $g^i$ is crisp, the optimal transport for all $l,l' \in \mathbf{L}$ is $T_{l,l'}=p^i_lg^i_{l'}$, and the wasserstein distance simplifies to the closed form:
$$W^M(p^i,g^i)=\sum_{l,l' \in \mathbf{L}} M_{l,l'}p^i_lg^i_{l'}$$
Wasserstein dice loss
The wasserstein dice loss based on $M$ can be defined as:
$$L_{D^M} := 1-D^M$$
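Putting the pieces together, the loss for a crisp ground truth can be sketched with the closed-form $W^M$ above. The function name, the 3-class distance matrix, and the `eps` stabilizer below are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def wasserstein_dice_loss(p, g, M, b=0, eps=1e-8):
    """Wasserstein dice loss 1 - D^M, assuming a crisp ground truth g.

    p, g: arrays of shape (num_classes, num_positions).
    M:    (num_classes, num_classes) ground distance matrix, b = background index.
    """
    # closed form: W^M(p^i, g^i) = sum_{l,l'} M[l,l'] p_l^i g_{l'}^i, per position
    w = np.einsum('ln,lk,kn->n', p, M, g)
    fg = [l for l in range(M.shape[0]) if l != b]
    # weighted true positives with alpha_l = M[l, b]
    tp = sum(M[l, b] * np.sum(g[l] * (M[l, b] - w)) for l in fg)
    return 1.0 - 2 * tp / (2 * tp + np.sum(w) + eps)

# background 0; classes 1 and 2 close to each other, both far from background
M = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.5],
              [1.0, 0.5, 0.0]])
g = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])  # crisp labels: position 0 -> bg, position 1 -> class 1
p = np.array([[0.8, 0.1],
              [0.1, 0.2],
              [0.1, 0.7]])  # confuses class 2 with class 1, which M penalizes mildly
print(wasserstein_dice_loss(p, g, M))
```

Because $M_{1,2}=0.5 < M_{1,0}=1$, confusing the two foreground classes costs less than confusing either of them with the background, which is exactly the semantic behavior the wasserstein formulation is meant to provide.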