Various Dice Loss variants


Yuque document: https://www.yuque.com/lart/idh721/gpix1i

Dice Loss is also a very common loss function in image segmentation tasks. This article is organized based on the paper Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks.

hard dice score for binary segmentation

The Dice score is a widely used overlap measure for pairwise comparison of binary segmentation maps $S$ and $G$.

It can be expressed as a set operation or as a statistical measure:

$$D_{hard}=\frac{2|S \cap G|}{|S|+|G|}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{FP}+\Theta_{FN}}=\frac{2\Theta_{TP}}{2\Theta_{TP}+\Theta_{AE}}$$

The terms involved here have the following meanings (a small NumPy sketch of the hard score follows the list):

  • $S$ and $G$: the segmentation map to be evaluated and the reference (ground-truth) map.
  • $\Theta_{TP}$: the number of true positives, i.e. positions where both $S$ and $G$ are true.
  • $\Theta_{FP}$: the number of false positives, i.e. positions where $S$ is true and $G$ is false.
  • $\Theta_{FN}$: the number of false negatives, i.e. positions where $S$ is false and $G$ is true.
  • $\Theta_{AE} = \Theta_{FP} + \Theta_{FN}$: the number of positions where $S$ and $G$ disagree.
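
Below is a minimal NumPy sketch of the hard Dice score for two binary masks; the array names, shapes, and the small epsilon are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def hard_dice(S: np.ndarray, G: np.ndarray, eps: float = 1e-8) -> float:
    """Hard Dice score between two binary segmentation masks of the same shape."""
    S = S.astype(bool)
    G = G.astype(bool)
    tp = np.logical_and(S, G).sum()          # |S ∩ G| = Theta_TP
    return 2.0 * tp / (S.sum() + G.sum() + eps)

# toy example: two 4x4 masks
S = np.array([[0, 1, 1, 0]] * 4)
G = np.array([[0, 1, 0, 0]] * 4)
print(hard_dice(S, G))                       # 2*4 / (8 + 4) ≈ 0.667
```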

soft dice score for binary segmentation

The extension to soft binary segmentation relies on a notion of disagreement between pairs of probabilistic labels.

For each position $i \in \mathbf{X}$ of $S$ and $G$, the corresponding labels $S_i$ and $G_i$ can be modelled as random variables over the label space $\mathbf{L}=\{0,1\}$.

A probabilistic segmentation can then be represented as a label probability map, where $P(\mathbf{L})$ denotes the set of label probability vectors:

  • $p=\{p^i:=P(S_i=1)\}_{i \in \mathbf{X}}$
  • $g=\{g^i:=P(G_i=1)\}_{i \in \mathbf{X}}$

From this, the statistics $\Theta_{TP}$ and $\Theta_{AE}$ can be extended to the soft-segmentation case:

  • $\Theta_{AE}=\sum_{i \in \mathbf{X}} |p^i-g^i|$
  • $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^i(1-|p^i-g^i|)$

For the common case of a crisp (binary) ground truth $g$, i.e. $\forall i \in \mathbf{X},\ g^i \in \{0, 1\}$, these reduce to:

  • $\Theta_{AE}=\sum_{i \in \mathbf{X}} [g^i(1-p^i)+(1-g^i)p^i]=\sum_{i \in \mathbf{X}} (g^i+p^i-2g^ip^i)$
  • $\Theta_{TP}=\sum_{i \in \mathbf{X}} g^ip^i$

The corresponding soft dice score can be expressed as:

$$D_{soft}(p,g)=\frac{2\sum_i g^ip^i}{\sum_i(g^i+p^i)}$$

There are also variants that use squared terms, e.g. replacing the denominator with $\sum_i \left((g^i)^2+(p^i)^2\right)$.
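
A NumPy sketch of the soft binary Dice score, with the squared-term variant as an option; the function signature and the epsilon are my own choices.

```python
import numpy as np

def soft_dice(p: np.ndarray, g: np.ndarray, squared: bool = False, eps: float = 1e-8) -> float:
    """Soft Dice score between a probability map p and a (possibly soft) target g."""
    tp = (p * g).sum()
    if squared:                              # squared-term variant of the denominator
        denom = (p ** 2).sum() + (g ** 2).sum()
    else:
        denom = p.sum() + g.sum()
    return 2.0 * tp / (denom + eps)

p = np.array([0.9, 0.8, 0.1, 0.2])
g = np.array([1.0, 1.0, 0.0, 0.0])
print(soft_dice(p, g))                       # 2*1.7 / (2.0 + 2.0) = 0.85
```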

soft multi-class dice score

The discussion above covers binary segmentation; for multi-class segmentation, we need a way to combine the per-class scores.

The simplest approach is to average the Dice score over all classes.

This is called the mean Dice score over the $|\mathbf{L}|$ classes:

$$D_{mean}(p,g)=\frac{1}{|\mathbf{L}|}\sum_{l \in \mathbf{L}}\frac{2\sum_{i}g^i_lp^i_l}{\sum_{i}(g^i_l+p^i_l)}$$

A generalised form is obtained by introducing class weights $w_l = \frac{1}{(\sum_{i}g^i_l)^2},\ l\in \mathbf{L}$, i.e. by turning the formula above into a weighted combination of the per-class statistics. This is called the generalised soft multi-class Dice score.

Finally it can be expressed as:

$$D_{generalised}(p,g)=\frac{2\sum_l w_l \sum_i g^i_lp^i_l}{\sum_l w_l \sum_i (g^i_l+p^i_l)}$$
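
A NumPy sketch of the mean and generalised multi-class Dice scores; here p and g are per-class maps of shape (num_classes, num_pixels), and the weights follow $w_l = 1/(\sum_i g^i_l)^2$. The layout and names are assumptions made for illustration.

```python
import numpy as np

def mean_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Average of per-class soft Dice scores; p, g have shape (L, N)."""
    per_class = 2.0 * (p * g).sum(axis=1) / ((p + g).sum(axis=1) + eps)
    return per_class.mean()

def generalised_dice(p: np.ndarray, g: np.ndarray, eps: float = 1e-8) -> float:
    """Generalised Dice score with class weights w_l = 1 / (sum_i g_l^i)^2."""
    w = 1.0 / (g.sum(axis=1) ** 2 + eps)
    num = 2.0 * (w * (p * g).sum(axis=1)).sum()
    den = (w * (p + g).sum(axis=1)).sum()
    return num / (den + eps)

# toy example: 3 classes, 5 pixels, one-hot ground truth along the class axis
g = np.eye(3)[:, [0, 0, 1, 2, 2]]            # shape (3, 5)
p = np.clip(g + 0.1 * np.random.rand(3, 5), 0.0, 1.0)
print(mean_dice(p, g), generalised_dice(p, g))
```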

soft multi-class wasserstein dice score

In the Dice scores above, the measure of disagreement between $p^i$ and $g^i$ is essentially an L1 distance; here the Wasserstein distance is introduced instead, so that two label probability vectors can be compared in a semantically meaningful way.

We first introduce the Wasserstein distance.

wasserstein distance

This is also known as the earth mover's distance. It is the minimum cost required to transform one probability vector $p$ into another probability vector $q$.

For all $l, l' \in \mathbf{L}$, the cost of moving mass from $l$ to $l'$ is given by the distance matrix $M_{l,l'}$; this matrix is fixed and assumed to be known.

This provides a way of mapping a distance matrix $M$ on $\mathbf{L}$ (usually called the ground distance matrix) to a distance on $P(\mathbf{L})$, and it lets us exploit prior knowledge about $\mathbf{L}$.

When $\mathbf{L}$ is a finite set, for $p, q \in P(\mathbf{L})$ the Wasserstein distance between them with respect to $M$ can be defined as the solution of a linear programming problem:

$$\begin{aligned} W^{M}(p,q)&=\min_{T_{l,l'}}\sum_{l,l' \in \mathbf{L}}T_{l,l'}M_{l,l'} \\ \text{subject to } & \forall l \in \mathbf{L},\ \sum_{l' \in \mathbf{L}}T_{l,l'}=p_l, \\ \text{and } & \forall l' \in \mathbf{L},\ \sum_{l \in \mathbf{L}}T_{l,l'}=q_{l'} \end{aligned}$$

Here $T=(T_{l,l'})_{l,l' \in \mathbf{L}}$ is a joint probability distribution over $(p,q)$ with marginal distributions $p$ and $q$.

The minimiser $\hat{T}$ of the above problem is called the optimal transport between $p$ and $q$ for the distance matrix $M$.
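
A small SciPy sketch that solves this linear program directly for two label probability vectors; the flattening of T, the helper name, and the toy matrix are all my own choices for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(p: np.ndarray, q: np.ndarray, M: np.ndarray) -> float:
    """W^M(p, q): minimise sum_{l,l'} T[l,l'] * M[l,l'] subject to the marginal constraints."""
    L = len(p)
    c = M.flatten()                                   # objective coefficients (row-major T)
    A_eq = np.zeros((2 * L, L * L))
    for l in range(L):
        A_eq[l, l * L:(l + 1) * L] = 1.0              # sum_{l'} T[l, l'] = p_l
        A_eq[L + l, l::L] = 1.0                       # sum_{l}  T[l, l'] = q_{l'}
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

M = np.array([[0.0, 1.0], [1.0, 0.0]])
print(wasserstein(np.array([0.3, 0.7]), np.array([1.0, 0.0]), M))   # 0.7
```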


soft multi-class wasserstein dice score

Here, the Wasserstein distance replaces the measure of disagreement between pairs of label probability vectors, giving the following extended statistics:

  • $\Theta_{AE}=\sum_{i \in \mathbf{X}}W^{M}(p^i,g^i)$
  • $\Theta^l_{TP}=\sum_{i \in \mathbf{X}}g^i_l(M_{l,b}-W^M(p^i,g^i)),\ \forall l \in \mathbf{L} \setminus \{b\}$

$M$ is chosen so that the background class $b$ is always the class furthest from all the others.

$$\Theta_{TP}=\sum_{l \in \mathbf{L}}\alpha_l \Theta^l_{TP}$$

That is, the per-class statistics are again combined in a weighted manner.

By choosing $\alpha_l = W^{M}(l, b) = M_{l,b}$, the background class does not contribute to $\Theta_{TP}$ (since $\alpha_b = M_{b,b} = 0$).

Finally, the Wasserstein Dice score with respect to $M$ can be defined as:

$$D^M(p,g)=\frac{2\sum_l M_{l,b}\sum_i g^i_l(M_{l,b}-W^M(p^i,g^i))}{2\sum_l M_{l,b}\sum_i g^i_l(M_{l,b}-W^M(p^i,g^i))+\sum_i W^M(p^i,g^i)}$$

For the binary case, you can set:

$$M = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

From this we obtain

$$W^M(p^i,g^i)=|p^i-g^i|, \qquad M_{l,b}=1 \ \text{ for } l \ne b$$

At this point, the wasserstein dice score degenerates into a soft binary dice score:

$$\begin{aligned} D^M(p,g) & =\frac{2\sum_i g^i(1-|p^i-g^i|)}{2\sum_i g^i(1-|p^i-g^i|)+\sum_i|p^i-g^i|} \\ & =\frac{2\sum_i p^ig^i}{2\sum_i p^ig^i+\sum_i[p^i(1-g^i)+(1-p^i)g^i]} \\ & =\frac{2\sum_i g^ip^i}{\sum_i(g^i+p^i)} \end{aligned}$$
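
A quick numeric sanity check of this degeneration: with the 2x2 matrix above, the per-pixel Wasserstein distance is simply $|p^i-g^i|$ (the LP sketch earlier gives the same values), and the Wasserstein Dice score coincides with the soft binary Dice score. The probability values below are arbitrary.

```python
import numpy as np

p_fg = np.array([0.9, 0.2, 0.7])                     # foreground probabilities at 3 pixels
g_fg = np.array([1.0, 0.0, 1.0])                     # crisp ground truth
M_fb = 1.0                                           # M[foreground, background]

W = np.abs(p_fg - g_fg)                              # per-pixel W^M for M = [[0, 1], [1, 0]]
tp = (M_fb * g_fg * (M_fb - W)).sum()                # only the foreground class contributes
D_wass = 2 * tp / (2 * tp + W.sum())
D_soft = 2 * (p_fg * g_fg).sum() / (p_fg + g_fg).sum()
print(D_wass, D_soft)                                # both ≈ 0.842
```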

Previous Wasserstein-distance-based losses were limited by their computational cost; however, for the segmentation setting considered here, a closed-form solution of the optimization problem exists.

When the ground truth $g^i$ is crisp (one-hot), the optimal transport for all $l, l' \in \mathbf{L}$ is $T_{l,l'}=p^i_l g^i_{l'}$, and the Wasserstein distance simplifies to:

$$W^M(p^i,g^i)=\sum_{l,l' \in \mathbf{L}} M_{l,l'}\, p^i_l\, g^i_{l'}$$
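
When each $g^i$ is a one-hot vector, this closed form can be evaluated for all pixels at once; a NumPy sketch, where the (num_classes, num_pixels) layout is an assumption.

```python
import numpy as np

def wasserstein_closed_form(p: np.ndarray, g: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Per-pixel W^M(p^i, g^i) = sum_{l,l'} M[l,l'] * p_l^i * g_{l'}^i.

    p, g: per-class probability / one-hot maps of shape (L, N).
    M:    ground distance matrix of shape (L, L).
    Returns an array of shape (N,).
    """
    return np.einsum("ln,lk,kn->n", p, M, g)

M = np.array([[0.0, 1.0], [1.0, 0.0]])
p = np.array([[0.1, 0.8], [0.9, 0.2]])               # 2 classes, 2 pixels
g = np.array([[0.0, 1.0], [1.0, 0.0]])               # one-hot ground truth
print(wasserstein_closed_form(p, g, M))              # [0.1, 0.2] = |p_fg - g_fg|
```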

Wasserstein dice loss

The Wasserstein Dice loss based on $M$ can then be defined as:

$$L_{D^M} := 1-D^M$$
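
Putting the pieces together, here is a PyTorch sketch of the Wasserstein Dice loss for a single image with crisp one-hot ground truth, using the closed-form distance above. It follows the formulas in this post rather than the authors' reference implementation, and all names, shapes, and the toy M are assumptions.

```python
import torch

def wasserstein_dice_loss(probs: torch.Tensor,
                          onehot: torch.Tensor,
                          M: torch.Tensor,
                          background: int = 0,
                          eps: float = 1e-8) -> torch.Tensor:
    """L_{D^M} = 1 - D^M.

    probs:  softmax probabilities, shape (L, N)  (L classes, N pixels).
    onehot: one-hot ground truth,  shape (L, N).
    M:      ground distance matrix, shape (L, L).
    """
    # closed-form per-pixel Wasserstein distance: W_i = sum_{l,l'} M[l,l'] p_l^i g_{l'}^i
    W = torch.einsum("ln,lk,kn->n", probs, M, onehot)

    # weighted true positives: Theta_TP = sum_l M[l,b] * sum_i g_l^i * (M[l,b] - W_i)
    alpha = M[:, background]                                    # alpha_l = M_{l,b}; alpha_b = 0
    tp = (alpha[:, None] * onehot * (alpha[:, None] - W[None, :])).sum()

    dice = 2.0 * tp / (2.0 * tp + W.sum() + eps)
    return 1.0 - dice

# toy usage: 3 classes, 4 pixels, background class is index 0
M = torch.tensor([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 0.5],
                  [1.0, 0.5, 0.0]])
logits = torch.randn(3, 4, requires_grad=True)
probs = torch.softmax(logits, dim=0)
onehot = torch.eye(3)[:, torch.tensor([0, 1, 2, 1])]
loss = wasserstein_dice_loss(probs, onehot, M)
loss.backward()
```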

reference

  • Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks


Reposted from: blog.csdn.net/P_LarT/article/details/127585095