https://github.com/dfdazac/wassdistance/tree/master
Prerequisite knowledge
Computational Optimal Transport (Peyré and Cuturi).
In particular, see the dual of the entropy-regularized problem and its solution by dual coordinate ascent. The entropy-regularized optimal transport problem is
$$
\mathrm{L}_{\mathbf{C}}^{\varepsilon}(\mathbf{a},\mathbf{b}) \stackrel{\text{def.}}{=} \min_{\mathbf{P}\in\mathbf{U}(\mathbf{a},\mathbf{b})} \langle\mathbf{P},\mathbf{C}\rangle - \varepsilon\mathbf{H}(\mathbf{P})
$$
where
$$
\mathbf{U}(\mathbf{a},\mathbf{b}) \stackrel{\text{def.}}{=} \left\{\mathbf{P}\in\mathbb{R}_{+}^{n\times m} : \mathbf{P}\mathbf{1}_m = \mathbf{a} \quad\text{and}\quad \mathbf{P}^{\mathrm{T}}\mathbf{1}_n = \mathbf{b}\right\}
$$
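To make the primal problem concrete, the NumPy sketch below evaluates $\langle\mathbf{P},\mathbf{C}\rangle-\varepsilon\mathbf{H}(\mathbf{P})$ for a given coupling and checks membership in $\mathbf{U}(\mathbf{a},\mathbf{b})$. It assumes the entropy convention $\mathbf{H}(\mathbf{P})=-\sum_{i,j}P_{i,j}(\log P_{i,j}-1)$ used in Computational Optimal Transport; the helper names and the toy $\mathbf{a}$, $\mathbf{b}$, $\mathbf{C}$ are illustrative choices, not taken from the repository. The independent coupling $\mathbf{P}=\mathbf{a}\mathbf{b}^{\mathrm T}$ is always feasible, which makes it a convenient test point.

```python
import numpy as np

def entropy(P):
    """H(P) = -sum_ij P_ij (log P_ij - 1), with the convention 0 log 0 = 0."""
    P = np.asarray(P, dtype=float)
    mask = P > 0
    return -np.sum(P[mask] * (np.log(P[mask]) - 1.0))

def in_coupling_set(P, a, b, tol=1e-9):
    """Check P >= 0, P 1_m = a and P^T 1_n = b, i.e. membership in U(a, b)."""
    return bool(np.all(P >= 0)
                and np.allclose(P.sum(axis=1), a, atol=tol)
                and np.allclose(P.sum(axis=0), b, atol=tol))

def primal_objective(P, C, eps):
    """<P, C> - eps * H(P), the quantity minimized over U(a, b)."""
    return np.sum(P * C) - eps * entropy(P)

a = np.array([0.5, 0.5])
b = np.array([0.25, 0.25, 0.5])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
P = np.outer(a, b)  # independent coupling a b^T, always in U(a, b)

print(in_coupling_set(P, a, b))  # True
print(primal_objective(P, C, eps=0.1))
```

Any feasible $\mathbf{P}$ only gives an upper bound on $\mathrm{L}_{\mathbf{C}}^{\varepsilon}(\mathbf{a},\mathbf{b})$; the minimizer is the coupling produced by the Sinkhorn iterations.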
Its dual problem is
$$
\mathrm{L}_{\mathbf{C}}^{\varepsilon}(\mathbf{a},\mathbf{b}) = \max_{\mathbf{f}\in\mathbb{R}^n,\,\mathbf{g}\in\mathbb{R}^m} \langle\mathbf{f},\mathbf{a}\rangle + \langle\mathbf{g},\mathbf{b}\rangle - \varepsilon\left\langle e^{\mathbf{f}/\varepsilon}, \mathbf{K}e^{\mathbf{g}/\varepsilon}\right\rangle
$$
With the change of variables
$$
(\mathbf{u},\mathbf{v}) = \left(e^{\mathbf{f}/\varepsilon}, e^{\mathbf{g}/\varepsilon}\right),
$$
the optimal coupling is
$$
\mathbf{P} = \operatorname{diag}(\mathbf{u})\,\mathbf{K}\,\operatorname{diag}(\mathbf{v}), \qquad \mathbf{K} = \exp\left(-\frac{\mathbf{C}}{\varepsilon}\right),
$$
where the exponential is taken elementwise.
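In terms of $\mathbf{u}$ and $\mathbf{v}$, maximizing the dual alternately in $\mathbf{f}$ and $\mathbf{g}$ amounts to the classic Sinkhorn scaling iterations $\mathbf{u}\leftarrow\mathbf{a}/(\mathbf{K}\mathbf{v})$, $\mathbf{v}\leftarrow\mathbf{b}/(\mathbf{K}^{\mathrm T}\mathbf{u})$ (elementwise division). The NumPy sketch below is an illustration under simple assumptions (hypothetical function name `sinkhorn_scaling`, fixed iteration count with no convergence check, toy data). It builds $\mathbf{P}=\operatorname{diag}(\mathbf{u})\mathbf{K}\operatorname{diag}(\mathbf{v})$ and checks that the primal and dual objectives coincide, using the entropy convention $\mathbf{H}(\mathbf{P})=-\sum_{i,j}P_{i,j}(\log P_{i,j}-1)$ from Computational Optimal Transport.

```python
import numpy as np

def sinkhorn_scaling(a, b, C, eps, n_iter=1000):
    """Classic Sinkhorn iterations on the scalings u = exp(f/eps), v = exp(g/eps)."""
    K = np.exp(-C / eps)              # Gibbs kernel, elementwise exponential
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # enforce row marginals: P 1_m = a
        v = b / (K.T @ u)             # enforce column marginals: P^T 1_n = b
    P = u[:, None] * K * v[None, :]   # diag(u) K diag(v) without forming diagonals
    return u, v, P

def entropy(P):
    """H(P) = -sum_ij P_ij (log P_ij - 1); every entry of P is positive here."""
    return -np.sum(P * (np.log(P) - 1.0))

a = np.array([0.5, 0.5])
b = np.array([0.25, 0.25, 0.5])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
eps = 0.5

u, v, P = sinkhorn_scaling(a, b, C, eps)
f, g = eps * np.log(u), eps * np.log(v)   # back to the dual potentials

primal = np.sum(P * C) - eps * entropy(P)
dual = f @ a + g @ b - eps * (np.exp(f / eps) @ np.exp(-C / eps) @ np.exp(g / eps))
print(np.allclose(P.sum(axis=1), a), np.allclose(P.sum(axis=0), b))  # True True
print(np.isclose(primal, dual))                                      # True
```

Because $\mathbf{P}$ is a diagonal scaling of $\mathbf{K}$, the sketch computes it with broadcasting (`u[:, None] * K * v[None, :]`) instead of materializing the diagonal matrices.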
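Maximizing the dual alternately over $\mathbf{f}$ (with $\mathbf{g}$ fixed) and over $\mathbf{g}$ (with $\mathbf{f}$ fixed) gives the Sinkhorn iterations in the log domain: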
$$
\begin{aligned}
\mathbf{f}^{(\ell+1)} &= \varepsilon\log\mathbf{a} - \varepsilon\log\left(\mathbf{K}e^{\mathbf{g}^{(\ell)}/\varepsilon}\right),\\
\mathbf{g}^{(\ell+1)} &= \varepsilon\log\mathbf{b} - \varepsilon\log\left(\mathbf{K}^{\mathrm{T}}e^{\mathbf{f}^{(\ell+1)}/\varepsilon}\right)
\end{aligned}
$$
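Evaluated literally, these updates still pass through $\mathbf{K}=e^{-\mathbf{C}/\varepsilon}$. The NumPy sketch below (the name `naive_f_update` and the toy cost are hypothetical) applies the $\mathbf{f}$-update exactly as written. With $\varepsilon=10^{-3}$ and every cost at least 1, all entries of $\mathbf{K}$ underflow to 0 in float64, so $\log(\mathbf{K}e^{\mathbf{g}/\varepsilon})$ is $-\infty$ and $\mathbf{f}$ becomes $+\infty$, even though the true update is finite.

```python
import numpy as np

def naive_f_update(a, C, g, eps):
    """f = eps*log(a) - eps*log(K exp(g/eps)), evaluated literally with K = exp(-C/eps)."""
    K = np.exp(-C / eps)
    return eps * np.log(a) - eps * np.log(K @ np.exp(g / eps))

a = np.array([0.5, 0.5])
C = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0]])
g = np.zeros(3)

print(naive_f_update(a, C, g, eps=1.0))   # finite values
with np.errstate(divide="ignore"):        # silence the log(0) warning
    f_bad = naive_f_update(a, C, g, eps=1e-3)
print(np.exp(-C / 1e-3).max())            # 0.0: every kernel entry underflowed
print(f_bad)                              # [inf inf]
```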
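The code in the repository above modifies these updates: instead of forming $\mathbf{K}=e^{-\mathbf{C}/\varepsilon}$, whose entries underflow to zero when $\varepsilon$ is small, it evaluates the log terms with logsumexp. The rewriting works as follows.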
Let $\mathbf{C}\in\mathbb{R}^{n\times m}$, $\mathbf{f}\in\mathbb{R}^n$, $\mathbf{g}\in\mathbb{R}^m$. Then
$$
\begin{aligned}
\log\left(\mathbf{K}e^{\mathbf{g}/\varepsilon}\right)
&= \log\left(\left[\sum_j e^{-\frac{C_{i,j}-g_j}{\varepsilon}}\right]_i\right)\\
&= \log\left(\left[\sum_j e^{-\frac{C_{i,j}-g_j}{\varepsilon}}\,e^{\frac{f_i}{\varepsilon}}\,e^{-\frac{f_i}{\varepsilon}}\right]_i\right)\\
&= \log\left(\left[\sum_j e^{-\frac{C_{i,j}-f_i-g_j}{\varepsilon}}\right]_i \odot e^{-\frac{\mathbf{f}}{\varepsilon}}\right)\\
&= \log\left(\left[\sum_j e^{-\frac{C_{i,j}-f_i-g_j}{\varepsilon}}\right]_i\right) - \frac{\mathbf{f}}{\varepsilon}\\
&= \operatorname{logsumexp}\left(-\frac{\mathbf{C}-\mathbf{f}^{\mathrm{T}}-\mathbf{g}}{\varepsilon},\ \mathrm{dim}=-1\right) - \frac{\mathbf{f}}{\varepsilon}
\end{aligned}
$$
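In the last step, subtracting the vectors from the matrix relies on broadcasting: treating $\mathbf{f}$ and $\mathbf{g}$ as row vectors (which is how 1-D arrays broadcast in NumPy and PyTorch), $\mathbf{f}^{\mathrm{T}}$ is an $n\times 1$ column and $\mathbf{C}-\mathbf{f}^{\mathrm{T}}-\mathbf{g}$ is the $n\times m$ matrix with entries $C_{i,j}-f_i-g_j$. The update of $\mathbf{g}$ is handled the same way, multiplying and dividing by $e^{g_j/\varepsilon}$: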
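A quick numerical check of the $\mathbf{f}$-identity and of the broadcasting it relies on, written as a NumPy sketch with random toy data. `f[:, None]` has shape $(n, 1)$ and plays the role of $\mathbf{f}^{\mathrm T}$, `g[None, :]` has shape $(1, m)$, so `C - f[:, None] - g[None, :]` has entries $C_{i,j}-f_i-g_j$. The hand-written `logsumexp` helper subtracts the maximum before exponentiating; `scipy.special.logsumexp` would serve the same purpose. Since the identity holds for any shift $\mathbf{f}$, $\mathbf{f}$ is drawn at random here.

```python
import numpy as np

def logsumexp(x, axis=-1):
    """Stable log(sum(exp(x))) along `axis`: shift by the max before exponentiating."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

rng = np.random.default_rng(0)
n, m, eps = 4, 3, 0.5
C = rng.random((n, m))
f = rng.standard_normal(n)
g = rng.standard_normal(m)

# Left-hand side, computed directly from K = exp(-C/eps).
lhs = np.log(np.exp(-C / eps) @ np.exp(g / eps))

# Right-hand side: (n, m) - (n, 1) - (1, m) broadcasts to [C_ij - f_i - g_j].
M = -(C - f[:, None] - g[None, :]) / eps
rhs = logsumexp(M, axis=-1) - f / eps

print(M.shape)                # (4, 3)
print(np.allclose(lhs, rhs))  # True
```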
$$
\begin{aligned}
\log\left(\mathbf{K}^{\mathrm{T}}e^{\mathbf{f}/\varepsilon}\right)
&= \log\left(\left[\sum_i e^{-\frac{C_{i,j}-f_i}{\varepsilon}}\right]_j\right)\\
&= \log\left(\left[\sum_i e^{-\frac{C_{i,j}-f_i}{\varepsilon}}\,e^{\frac{g_j}{\varepsilon}}\,e^{-\frac{g_j}{\varepsilon}}\right]_j\right)\\
&= \log\left(\left[\sum_i e^{-\frac{C_{i,j}-f_i-g_j}{\varepsilon}}\right]_j \odot e^{-\frac{\mathbf{g}}{\varepsilon}}\right)\\
&= \log\left(\left[\sum_i e^{-\frac{C_{i,j}-f_i-g_j}{\varepsilon}}\right]_j\right) - \frac{\mathbf{g}}{\varepsilon}\\
&= \operatorname{logsumexp}\left(-\frac{\mathbf{C}-\mathbf{f}^{\mathrm{T}}-\mathbf{g}}{\varepsilon},\ \mathrm{dim}=-2\right) - \frac{\mathbf{g}}{\varepsilon}\\
&= \operatorname{logsumexp}\left(-\frac{\left(\mathbf{C}-\mathbf{f}^{\mathrm{T}}-\mathbf{g}\right)^{\mathrm{T}}}{\varepsilon},\ \mathrm{dim}=-1\right) - \frac{\mathbf{g}}{\varepsilon}
\end{aligned}
$$
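Substituting both identities into the updates, with the shift taken at the current iterates, gives $\mathbf{f}^{(\ell+1)}=\mathbf{f}^{(\ell)}+\varepsilon\left(\log\mathbf{a}-\operatorname{logsumexp}(\mathbf{M},\mathrm{dim}=-1)\right)$ and $\mathbf{g}^{(\ell+1)}=\mathbf{g}^{(\ell)}+\varepsilon\left(\log\mathbf{b}-\operatorname{logsumexp}(\mathbf{M}^{\mathrm T},\mathrm{dim}=-1)\right)$, where $\mathbf{M}=-(\mathbf{C}-\mathbf{f}^{\mathrm T}-\mathbf{g})/\varepsilon$ is recomputed from the latest $\mathbf{f}$ and $\mathbf{g}$ before each update. The NumPy sketch below implements these updates; it is an illustration, not the repository's PyTorch code, and the function names, toy data and fixed iteration count (no convergence check, no batching) are assumptions. It compares the result with the classic scaling iterations at $\varepsilon=0.5$, then runs at $\varepsilon=10^{-3}$, where the off-diagonal entries of $e^{-\mathbf{C}/\varepsilon}$ underflow to zero.

```python
import numpy as np

def logsumexp(x, axis=-1):
    """Stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def modified_cost(C, f, g, eps):
    """M_ij = -(C_ij - f_i - g_j) / eps, built with broadcasting."""
    return -(C - f[:, None] - g[None, :]) / eps

def sinkhorn_log(a, b, C, eps, n_iter=1000):
    """Log-domain Sinkhorn: K = exp(-C/eps) is never formed."""
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iter):
        # f <- f + eps * (log a - logsumexp_j M_ij)
        f = f + eps * (np.log(a) - logsumexp(modified_cost(C, f, g, eps), axis=-1))
        # g <- g + eps * (log b - logsumexp_i M_ij), i.e. over the rows of M^T
        g = g + eps * (np.log(b) - logsumexp(modified_cost(C, f, g, eps).T, axis=-1))
    P = np.exp(modified_cost(C, f, g, eps))  # P_ij = exp((f_i + g_j - C_ij) / eps)
    return f, g, P

a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

# Moderate eps: compare with the classic scaling iterations.
eps = 0.5
_, _, P_log = sinkhorn_log(a, b, C, eps)
K = np.exp(-C / eps)
u, v = np.ones(3), np.ones(3)
for _ in range(1000):
    u = a / (K @ v)
    v = b / (K.T @ u)
print(np.allclose(P_log, u[:, None] * K * v[None, :]))  # True

# Small eps: logsumexp subtracts the max before exponentiating, so updates stay finite.
_, _, P_small = sinkhorn_log(a, b, C, eps=1e-3)
print(np.all(np.isfinite(P_small)))
print(np.allclose(P_small.sum(axis=1), a), np.allclose(P_small.sum(axis=0), b))
```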