Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe reading notes

$$m(\alpha) \triangleq \langle\alpha, 1\rangle=\sum_i \alpha_i$$

OT

Probability vectors $(\alpha, \beta) \in \mathbb{R}_{+}^N \times \mathbb{R}_{+}^M$ satisfy $\sum_i \alpha_i=\sum_j \beta_j=1$.

Cost matrix $\mathrm{C} \in \mathbb{R}^{N \times M}$.
$$\mathrm{OT}(\alpha, \beta) \triangleq \inf_{\pi \geqslant 0,\ \pi_1=\alpha,\ \pi_2=\beta}\langle\pi, \mathrm{C}\rangle=\sum_{i, j} \pi_{i, j} \mathrm{C}_{i, j}$$
where the marginals are $\left(\pi_1, \pi_2\right) \triangleq\left(\pi \mathbb{1}, \pi^{\top} \mathbb{1}\right)$.

Csiszár divergence

Entropy function $\varphi: \mathbb{R}_{+} \rightarrow \mathbb{R}_{+}$

It is nonnegative, convex, and lower semi-continuous, with $\varphi(1)=0$.

$$\varphi_{\infty}^{\prime} \triangleq \lim_{x \rightarrow \infty} \frac{\varphi(x)}{x}$$

The associated Csiszár divergence is
$$\mathrm{D}_{\varphi}(\mu \mid \nu) \triangleq \sum_{\nu_i>0} \varphi\left(\frac{\mu_i}{\nu_i}\right) \nu_i+\varphi_{\infty}^{\prime} \sum_{\nu_i=0} \mu_i$$
An example is the KL divergence

$$\varphi(x)=x \log x-x+1$$
$$\mathrm{KL}(\mu \mid \nu) \triangleq \sum_i\left[\log \left(\frac{\mu_i}{\nu_i}\right) \mu_i-\mu_i+\nu_i\right]$$
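As a quick sanity check of this definition, the generalized KL divergence can be evaluated directly. The sketch below is plain Python; the function name `kl` and the test vectors are illustrative choices, and it assumes every $\nu_i > 0$, so the $\varphi_\infty'$ term of $\mathrm{D}_\varphi$ never appears.

```python
import math

def kl(mu, nu):
    """Generalized KL divergence KL(mu | nu) for nonnegative vectors, nu_i > 0."""
    total = 0.0
    for m, n in zip(mu, nu):
        # Convention 0 * log 0 = 0: a zero entry of mu contributes only nu_i.
        total += (m * math.log(m / n) if m > 0 else 0.0) - m + n
    return total

nu = [0.3, 0.3, 0.4]
print(kl(nu, nu))                    # identical vectors
print(kl([2 * n for n in nu], nu))   # same shape, doubled mass
print(kl([0.2, 0.5, 0.0], nu))       # a zero entry in mu
```

Unlike the KL divergence between probability vectors, this version also penalizes a mass mismatch: for $\mu = 2\nu$ it equals $(2\log 2 - 1)\, m(\nu)$.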

Legendre transform

Let $I \subset \mathbb{R}$ be an interval and $f: I\to \mathbb{R}$ a convex function. The Legendre transform of $f$ is $f^*: I^*\to \mathbb{R}$,
$$f^*\left(x^*\right)=\sup_{x\in I}\left(x^* x-f\left(x\right)\right),\quad x^*\in I^*$$
where $I^* = \left\{x^*\in \mathbb{R}: f^*\left(x^*\right)<\infty\right\}$.

Similarly, for a convex function $f: X\to \mathbb{R}$ defined on a convex set $X\subset \mathbb{R}^n$, the transform $f^*: X^*\to \mathbb{R}$ is
$$f^*\left(\mathbf{x}^*\right)=\sup_{\mathbf{x}\in X}\left(\langle \mathbf{x}^*,\mathbf{x}\rangle - f\left(\mathbf{x}\right)\right), \quad \mathbf{x}^*\in X^*$$
where $X^*=\left\{\mathbf{x}^* \in \mathbb{R}^n: \sup_{\mathbf{x} \in X}\left(\left\langle \mathbf{x}^*, \mathbf{x}\right\rangle-f(\mathbf{x})\right)<\infty\right\}$.

Properties

Separable sum

$$f\left(x_1, x_2\right)=g\left(x_1\right)+h\left(x_2\right) \qquad f^*\left(y_1, y_2\right)=g^*\left(y_1\right)+h^*\left(y_2\right)$$

Scaling ($\alpha > 0$)

$$\begin{array}{cc} f(x)=\alpha g(x) & f^*(y)=\alpha g^*(y / \alpha) \\ f(x)=\alpha g(x / \alpha) & f^*(y)=\alpha g^*(y)\end{array}$$

Translation

$$f(x)=g(x-b) \qquad f^*(y)=b^T y+g^*(y)$$

Adding an affine function

$$f(x)=g(x)+a^T x+b \qquad f^*(y)=g^*(y-a)-b$$

Invertible affine transformation

$A$ is a nonsingular square matrix.
$$f(x)=g(A x) \qquad f^*(y)=g^*\left(A^{-T} y\right)$$

Example

indicator function

$$\iota_{\{1\}}(x)=\begin{cases}0, & x=1\\ +\infty, & \text{otherwise}\end{cases}$$
$$\iota_{\{1\}}^*(y)=y$$

KL divergence

$$f\left(\mu\right)=\mathrm{KL}(\mu \mid \nu) = \sum_i\left[\log \left(\frac{\mu_i}{\nu_i}\right) \mu_i-\mu_i+\nu_i\right]$$

Proof: for fixed $y$, maximize over $\mu$ the function
$$g\left(\mu\right)= y^T\mu-\sum_i\left[\log \left(\frac{\mu_i}{\nu_i}\right) \mu_i-\mu_i+\nu_i\right]$$

$$\frac{\partial g}{\partial \mu_i}=y_i-\log\frac{\mu_i}{\nu_i}= 0\Rightarrow \mu_i=\nu_i e^{y_i}$$

Therefore
$$f^*\left(y\right)=\sum_i\left(\nu_i e^{y_i}-\nu_i\right)=\langle\nu, e^{y}-1\rangle$$

Example 3

$$\varphi(x)=x \log x-x+1$$
$$\varphi^*\left(y\right) = e^y -1$$
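The closed form $\varphi^*(y)=e^y-1$ can be checked numerically straight from the definition of the Legendre transform. The sketch below approximates the supremum by a grid search; the helper name `conjugate_on_grid`, the search interval $[0, 10]$ and the grid size are arbitrary choices made for this illustration, so the agreement is only up to the grid resolution.

```python
import math

def phi(x):
    """KL entropy phi(x) = x log x - x + 1, with phi(0) = 1 by continuity."""
    return 1.0 if x == 0 else x * math.log(x) - x + 1

def conjugate_on_grid(y, x_max=10.0, steps=10_000):
    """Approximate sup_{x >= 0} (x * y - phi(x)) by a grid search on [0, x_max]."""
    best = -math.inf
    for k in range(steps + 1):
        x = k * x_max / steps
        best = max(best, x * y - phi(x))
    return best

for y in (-1.0, 0.0, 0.5, 1.5):
    # Left: grid approximation of the supremum. Right: the closed form e^y - 1.
    print(y, conjugate_on_grid(y), math.exp(y) - 1)
```

The supremum is attained at $x = e^y$, which lies inside the search interval for the values of $y$ used here.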

Unbalanced optimal transport

$$\begin{aligned} \mathrm{UOT}(\alpha, \beta) \triangleq \inf_{\pi \geqslant 0}\ & \langle\pi, \mathrm{C}\rangle+\varepsilon \mathrm{KL}(\pi \mid \alpha \otimes \beta) \\ &+\mathrm{D}_{\varphi_1}\left(\pi_1 \mid \alpha\right)+\mathrm{D}_{\varphi_2}\left(\pi_2 \mid \beta\right) \end{aligned}$$

Here $m\left(\alpha\right)$ is not necessarily equal to $m\left(\beta\right)$.

Duality

$$\operatorname{UOT}(\alpha, \beta)=\sup_{(f, g)} \mathcal{F}_{\varepsilon}(f, g)$$

where
$$\begin{aligned} \mathcal{F}_{\varepsilon}(f, g) \triangleq & \left\langle\alpha,-\varphi_1^*(-f)\right\rangle+\left\langle\beta,-\varphi_2^*(-g)\right\rangle \\ & -\varepsilon\left\langle\alpha \otimes \beta, e^{\frac{f \oplus g-\mathrm{C}}{\varepsilon}}-1\right\rangle \end{aligned}$$
Here $\varphi_i^*$ is the Legendre transform of $\varphi_i$, applied elementwise, and $(f \oplus g)_{i,j} = f_i + g_j$:

$$\varphi_1^*(-f) \triangleq \left(\varphi_1^*\left(-f_i\right)\right)_i \in \mathbb{R}^N$$

If $\varepsilon=0$, the last term becomes the constraint $f\oplus g \le \mathrm{C}$.

Sinkhorn algorithm

One way to solve the dual problem of OT/UOT is the Sinkhorn algorithm

When $\mathrm{D}_\varphi= \rho\,\mathrm{KL}$, the convergence rate (contraction factor) is $\left(1+\frac{\varepsilon}{\rho}\right)^{-1}$, which tends to $1$ when $\varepsilon \ll \rho$.

Translation invariant formulation

For balanced OT, $\varphi_1(x) = \varphi_2(x)=\iota_{\{1\}}(x)$, so
$$\mathcal{F}_{\varepsilon}(f, g)=\langle\alpha, f\rangle+\langle\beta, g\rangle-\varepsilon\left\langle\alpha \otimes \beta, e^{\frac{f \oplus g-\mathrm{C}}{\varepsilon}}-1\right\rangle$$
For any $\lambda \in \mathbb{R}$,
$$\mathcal{F}_{\varepsilon}(f+\lambda, g-\lambda)=\mathcal{F}_{\varepsilon}(f, g) +\lambda m\left(\alpha\right)-\lambda m\left(\beta\right)=\mathcal{F}_{\varepsilon}(f, g)$$
since $m(\alpha)=m(\beta)=1$.

This invariance does not hold for general UOT.
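The difference between the two cases is easy to see numerically. The sketch below evaluates the dual functional for the balanced case and for $\mathrm{D}_\varphi = \rho\,\mathrm{KL}$, using the conjugate $\rho\,(e^{y/\rho}-1)$ that follows from the scaling rule applied to $\varphi^*(y) = e^y - 1$. All function names, the data and the shift $\lambda = 0.5$ are made up for the illustration.

```python
import math

def coupling_term(f, g, alpha, beta, C, eps):
    """eps * <alpha (x) beta, exp((f (+) g - C) / eps) - 1>."""
    return eps * sum(
        a * b * (math.exp((fi + gj - C[i][j]) / eps) - 1)
        for i, (a, fi) in enumerate(zip(alpha, f))
        for j, (b, gj) in enumerate(zip(beta, g))
    )

def dual_balanced(f, g, alpha, beta, C, eps):
    """F_eps for phi_1 = phi_2 = indicator of {1} (balanced OT)."""
    linear = sum(a * fi for a, fi in zip(alpha, f)) + sum(b * gj for b, gj in zip(beta, g))
    return linear - coupling_term(f, g, alpha, beta, C, eps)

def dual_kl(f, g, alpha, beta, C, eps, rho):
    """F_eps for D_phi = rho * KL, with phi*(y) = rho * (exp(y / rho) - 1)."""
    marginal = sum(-a * rho * (math.exp(-fi / rho) - 1) for a, fi in zip(alpha, f))
    marginal += sum(-b * rho * (math.exp(-gj / rho) - 1) for b, gj in zip(beta, g))
    return marginal - coupling_term(f, g, alpha, beta, C, eps)

alpha, beta = [0.3, 0.7], [0.6, 0.4]   # both vectors have mass 1
C = [[0.0, 1.0], [1.0, 0.0]]
f, g, lam, eps = [0.1, -0.2], [0.3, 0.0], 0.5, 0.5
f_shift = [v + lam for v in f]
g_shift = [v - lam for v in g]

gap_balanced = (dual_balanced(f_shift, g_shift, alpha, beta, C, eps)
                - dual_balanced(f, g, alpha, beta, C, eps))
gap_kl = (dual_kl(f_shift, g_shift, alpha, beta, C, eps, 1.0)
          - dual_kl(f, g, alpha, beta, C, eps, 1.0))
print(gap_balanced)  # zero up to rounding: the balanced dual is invariant
print(gap_kl)        # clearly nonzero: the KL dual is not
```

The coupling term only sees $f \oplus g$, so it is unchanged by the shift in both cases; the whole difference comes from the marginal terms.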

Let $\left(f^\star,g^\star\right)$ be the maximizer of $\mathcal{F}_{\varepsilon}$, and suppose the Sinkhorn algorithm is initialized at $f_0=f^{\star}+\tau$ with $\tau \in\mathbb{R}$.

For $\mathrm{D}_\varphi=\rho\,\mathrm{KL}$, the iterates are $f_t = f^\star + \left(\frac{\rho}{\varepsilon + \rho}\right)^{2t}\tau$. The iteration is therefore sensitive to translation, and when $\varepsilon \ll \rho$ the error decreases very slowly.
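This decay can be reproduced on a toy problem. The sketch below assumes the closed-form Sinkhorn update for $\rho\,\mathrm{KL}$, in which each half-step is a softmin multiplied by $\kappa = \rho/(\varepsilon+\rho)$; the data, the helper names and the choice $\tau = 1$ are illustrative, and $f^\star$ is only approximated by running the iteration for a long time.

```python
import math

def smin(weights, values, eps):
    """Softmin -eps * log <weights, exp(-values / eps)>, shifted for stability."""
    lo = min(values)
    total = sum(w * math.exp(-(v - lo) / eps) for w, v in zip(weights, values))
    return lo - eps * math.log(total)

def sinkhorn_step(f, alpha, beta, C, eps, rho):
    """One unbalanced Sinkhorn iteration for D_phi = rho * KL on both marginals."""
    kappa = rho / (eps + rho)  # for rho * KL the aprox operator is x -> kappa * x
    n, m = len(alpha), len(beta)
    g = [kappa * smin(alpha, [C[i][j] - f[i] for i in range(n)], eps) for j in range(m)]
    f_new = [kappa * smin(beta, [C[i][j] - g[j] for j in range(m)], eps) for i in range(n)]
    return f_new, g

alpha, beta = [0.5, 1.0, 0.7], [0.4, 0.9]
C = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.3]]
eps, rho, tau = 0.5, 2.0, 1.0
kappa = rho / (eps + rho)

# Approximate the fixed point f* by iterating long enough.
f_star = [0.0] * len(alpha)
for _ in range(300):
    f_star, g_star = sinkhorn_step(f_star, alpha, beta, C, eps, rho)

# Restart from the shifted potential f* + tau and record the remaining shift.
f = [v + tau for v in f_star]
shifts = []
for t in range(1, 6):
    f, g = sinkhorn_step(f, alpha, beta, C, eps, rho)
    shifts.append(f[0] - f_star[0])
    print(t, shifts[-1], kappa ** (2 * t) * tau)  # observed vs predicted shift
```

With $\varepsilon = 0.5$ and $\rho = 2$ the factor per iteration is $\kappa^2 = 0.64$; taking $\varepsilon$ much smaller than $\rho$ pushes it toward $1$, which is the slow regime described above.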

To address this problem, the authors introduce
$$\mathcal{H}_{\varepsilon}(\bar{f}, \bar{g}) \triangleq \sup_{\lambda \in \mathbb{R}} \mathcal{F}_{\varepsilon}(\bar{f}+\lambda, \bar{g}-\lambda)$$
This function is translation invariant: $\mathcal{H}_{\varepsilon}(\bar{f}+\lambda, \bar{g}-\lambda)=\mathcal{H}_{\varepsilon}(\bar{f}, \bar{g})$.

Maximizing $\mathcal{F}_{\varepsilon}(f, g)$ and maximizing $\mathcal{H}_{\varepsilon}(\bar{f}, \bar{g})$ both give $\mathrm{UOT}\left(\alpha,\beta\right)$, and the maximizers are related by
$$(f,g)=\left(\bar{f}+\lambda^{\star}(\bar{f}, \bar{g}), \bar{g}-\lambda^{\star}(\bar{f}, \bar{g})\right) \quad \text{where} \quad \lambda^{\star}(\bar{f}, \bar{g}) \triangleq \underset{\lambda\in \mathbb{R}}{\operatorname{argmax}}\ \mathcal{F}_{\varepsilon}(\bar{f}+\lambda, \bar{g}-\lambda).$$
The authors assume that $\varphi^*$ is strictly convex, so $\lambda^\star$ is unique.

Properties of $\mathcal{H}_\varepsilon$

Recall
$$m(\alpha) \triangleq\langle\alpha, 1\rangle=\sum_i \alpha_i$$

$$\lambda^{\star}(\bar{f}, \bar{g}) \triangleq \underset{\lambda \in \mathbb{R}}{\operatorname{argmax}}\ \mathcal{F}_{\varepsilon}(\bar{f}+\lambda, \bar{g}-\lambda)$$

Property 1

Assume $\varphi_1^*,\varphi_2^*$ are smooth and strictly convex.

Then there is a unique maximizer $\lambda^{\star}(\bar{f}, \bar{g})$.

Furthermore, $(\tilde{\alpha}, \tilde{\beta})=\nabla \mathcal{H}_0(\bar{f}, \bar{g})$ satisfies
$$\tilde{\alpha}=\nabla \varphi_1^*\left(-\bar{f}-\lambda^\star(\bar{f}, \bar{g})\right) \alpha, \quad \tilde{\beta} = \nabla \varphi_2^*\left(-\bar{g}+\lambda^{\star}(\bar{f}, \bar{g})\right) \beta$$

and $m(\tilde{\alpha})=m(\tilde{\beta})$.

(Here $\varphi^*$ is a scalar function applied elementwise, so $\nabla \varphi^*$ should be read as a diagonal matrix, that is, $\nabla\varphi^*\left(f\right) = \operatorname{diag}\left(\left(\varphi^*\right)^{\prime}\left(f_1\right), \left(\varphi^*\right)^{\prime}\left(f_2\right), \ldots, \left(\varphi^*\right)^{\prime}\left(f_n\right)\right)$.)

Proof:

For any $(\bar{f}, \bar{g})$, define $\mathcal{G}_{\varepsilon}(\lambda) \triangleq \mathcal{F}_{\varepsilon}(\bar{f}+\lambda, \bar{g}-\lambda)$.

According to [Liero et al., 2015], $\lim\limits_{x\to\infty}\varphi\left(x\right)=+\infty$, so $\mathcal{G}_{\varepsilon}\to -\infty$ as $\lambda \to \pm \infty$; that is, $-\mathcal{G}_{\varepsilon}$ is coercive.

Therefore $\mathcal{G}_\varepsilon$ attains its global maximum on $\mathbb{R}$.

Because $\varphi^*$ is strictly convex, the maximizer is unique.

Differentiating $\mathcal{G}_\varepsilon$ with respect to $\lambda$ (the coupling term does not depend on $\lambda$) gives the first-order condition
$$\begin{aligned} &\frac{\mathrm{d}\mathcal{G}_\varepsilon}{\mathrm{d}\lambda}=\left\langle\alpha, \nabla\varphi_1^*\left(-\bar{f}-\lambda\right)\right\rangle-\left\langle\beta, \nabla \varphi_2^*(-\bar{g}+\lambda)\right\rangle=0\\ &\Rightarrow \left\langle\alpha, \nabla\varphi_1^*\left(-\bar{f}-\lambda\right)\right\rangle = \left\langle\beta, \nabla \varphi_2^*(-\bar{g}+\lambda)\right\rangle\\ &\Rightarrow\langle \tilde{\alpha}, 1\rangle = \langle\tilde{\beta}, 1\rangle\\ &\Rightarrow m(\tilde{\alpha})=m(\tilde{\beta}) \end{aligned}$$

Property 2

For $\mathrm{D}_{\varphi_i} = \rho_i\,\mathrm{KL}$, that is $\varphi_i(x) = \rho_i\left(x\log x - x + 1\right)$,

$$\lambda^{\star}(\bar{f}, \bar{g})=\frac{\rho_1 \rho_2}{\rho_1+\rho_2} \log \left[\frac{\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle}{\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle}\right]$$
Proof: with $\nabla\varphi_i^*(x) = e^{x/\rho_i}$, the first-order condition of Property 1 reads
$$\begin{aligned} & \left\langle\alpha, e^{-\frac{\bar{f}+\lambda^{\star}}{\rho_1}}\right\rangle=\left\langle\beta, e^{-\frac{\bar{g}-\lambda^{\star}}{\rho_2}}\right\rangle \\ & \Leftrightarrow e^{-\frac{\lambda^{\star}}{\rho_1}}\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle=e^{+\frac{\lambda^{\star}}{\rho_2}}\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle \\ & \Leftrightarrow-\frac{\lambda^{\star}}{\rho_1}+\log \left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle=\frac{\lambda^{\star}}{\rho_2}+\log \left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle \\ & \Leftrightarrow \lambda^{\star}\left(\frac{1}{\rho_1}+\frac{1}{\rho_2}\right)=\log \left[\frac{\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle}{\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle}\right] \\ & \Leftrightarrow \lambda^{\star}(\bar{f}, \bar{g})=\frac{\rho_1 \rho_2}{\rho_1+\rho_2} \log \left[\frac{\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle}{\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle}\right]. \end{aligned}$$
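A small numerical check of this closed form, together with the mass equality $m(\tilde\alpha)=m(\tilde\beta)$ from Property 1: the sketch evaluates $\lambda^\star$ for arbitrary potentials and compares the two shifted masses. The potentials, weights and the values of $\rho_1,\rho_2$ are invented for the example, and the function names are not from the paper.

```python
import math

def lambda_star(f_bar, g_bar, alpha, beta, rho1, rho2):
    """Closed-form optimal shift for D_phi_i = rho_i * KL."""
    a = sum(w * math.exp(-v / rho1) for w, v in zip(alpha, f_bar))
    b = sum(w * math.exp(-v / rho2) for w, v in zip(beta, g_bar))
    return rho1 * rho2 / (rho1 + rho2) * math.log(a / b)

def shifted_masses(f_bar, g_bar, alpha, beta, rho1, rho2, lam):
    """Masses of alpha~ = exp(-(f_bar + lam) / rho1) * alpha and of the analogous beta~."""
    m_a = sum(w * math.exp(-(v + lam) / rho1) for w, v in zip(alpha, f_bar))
    m_b = sum(w * math.exp(-(v - lam) / rho2) for w, v in zip(beta, g_bar))
    return m_a, m_b

alpha, beta = [0.5, 1.0, 0.7], [0.4, 0.9]
f_bar, g_bar = [0.2, -0.1, 0.4], [0.0, 0.3]
rho1, rho2 = 1.0, 3.0

lam = lambda_star(f_bar, g_bar, alpha, beta, rho1, rho2)
m_a, m_b = shifted_masses(f_bar, g_bar, alpha, beta, rho1, rho2, lam)
m_a0, m_b0 = shifted_masses(f_bar, g_bar, alpha, beta, rho1, rho2, 0.0)
print(lam)
print(m_a, m_b)    # equal masses at the optimal shift
print(m_a0, m_b0)  # generally different masses without the shift
```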

Property 3

Let $\tau_1 = \frac{\rho_1}{\rho_1 + \rho_2}$ and $\tau_2 = \frac{\rho_2}{\rho_1 + \rho_2}$.

Then

$$\begin{aligned} \mathcal{H}_{\varepsilon}(\bar{f}, \bar{g})= & \ \rho_1 m(\alpha)+\rho_2 m(\beta)-\varepsilon\left\langle\alpha \otimes \beta, e^{\frac{\bar{f}\oplus\bar{g}-\mathrm{C}}{\varepsilon}}-1\right\rangle\\ & -\left(\rho_1+\rho_2\right)\left(\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle\right)^{\tau_1}\left(\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle\right)^{\tau_2} \end{aligned}$$
In particular, when $\rho_1=\rho_2=\rho$ and $\varepsilon=0$,
$$\mathcal{H}_0(\bar{f}, \bar{g})=\rho\left[m(\alpha)+m(\beta)-2 \sqrt{\left\langle\alpha, e^{-\frac{\bar{f}}{\rho}}\right\rangle\left\langle\beta, e^{-\frac{\bar{g}}{\rho}}\right\rangle}\right]$$
Proof:

$$\varphi_i(x)=\rho_i\left(x \log x-x+1\right)$$
$$\varphi_i^*\left(x\right)=\rho_i\left(e^{\frac{x}{\rho_i}}-1\right)$$

$$\begin{aligned} \mathcal{F}_{0}(\bar{f}, \bar{g}) & =\left\langle\alpha,-\rho_1\left(e^{-\frac{\bar{f}}{\rho_1}}-1\right)\right\rangle+\left\langle\beta,-\rho_2\left(e^{-\frac{\bar{g}}{\rho_2}}-1\right)\right\rangle \\ & =\rho_1 m(\alpha)+\rho_2 m(\beta)-\rho_1\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle-\rho_2\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle \end{aligned}$$

By Property 2,
$$\begin{aligned} \left\langle\alpha, e^{-\frac{\bar{f}+\lambda^{\star}(\bar{f}, \bar{g})}{\rho_1}}\right\rangle & =\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle \cdot \exp \left(-\frac{\rho_2}{\rho_1+\rho_2} \log \left[\frac{\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle}{\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle}\right]\right) \\ & =\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle \cdot\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle^{-\frac{\rho_2}{\rho_1+\rho_2}} \cdot\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle^{\frac{\rho_2}{\rho_1+\rho_2}} \\ & =\left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle^{\frac{\rho_1}{\rho_1+\rho_2}} \cdot\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle^{\frac{\rho_2}{\rho_1+\rho_2}} \end{aligned}$$
Similarly,
$$\left\langle\beta, e^{-\frac{\bar{g}-\lambda^{\star}(\bar{f}, \bar{g})}{\rho_2}}\right\rangle = \left\langle\alpha, e^{-\frac{\bar{f}}{\rho_1}}\right\rangle^{\frac{\rho_1}{\rho_1+\rho_2}} \cdot\left\langle\beta, e^{-\frac{\bar{g}}{\rho_2}}\right\rangle^{\frac{\rho_2}{\rho_1+\rho_2}}$$
Substituting both into $\mathcal{F}_0\left(\bar{f}+\lambda^\star, \bar{g}-\lambda^\star\right)$ gives the result. For $\varepsilon > 0$ the coupling term is unchanged by the shift, because it depends only on $\bar{f}\oplus\bar{g}$.
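The identity can also be verified numerically. The sketch below compares the closed form of $\mathcal{H}_\varepsilon$ with $\mathcal{F}_\varepsilon$ evaluated at the optimally shifted potentials, and checks that nearby shifts give a smaller value. The data, parameter values and helper names are illustrative assumptions.

```python
import math

def coupling_term(f, g, alpha, beta, C, eps):
    """eps * <alpha (x) beta, exp((f (+) g - C) / eps) - 1>."""
    return eps * sum(
        a * b * (math.exp((fi + gj - C[i][j]) / eps) - 1)
        for i, (a, fi) in enumerate(zip(alpha, f))
        for j, (b, gj) in enumerate(zip(beta, g))
    )

def dual_kl(f, g, alpha, beta, C, eps, rho1, rho2):
    """F_eps(f, g) for D_phi_i = rho_i * KL."""
    marginal = sum(-a * rho1 * (math.exp(-fi / rho1) - 1) for a, fi in zip(alpha, f))
    marginal += sum(-b * rho2 * (math.exp(-gj / rho2) - 1) for b, gj in zip(beta, g))
    return marginal - coupling_term(f, g, alpha, beta, C, eps)

alpha, beta = [0.5, 1.0, 0.7], [0.4, 0.9]
C = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.3]]
f_bar, g_bar = [0.2, -0.1, 0.4], [0.0, 0.3]
eps, rho1, rho2 = 0.5, 1.0, 3.0

a = sum(w * math.exp(-v / rho1) for w, v in zip(alpha, f_bar))
b = sum(w * math.exp(-v / rho2) for w, v in zip(beta, g_bar))
tau1, tau2 = rho1 / (rho1 + rho2), rho2 / (rho1 + rho2)

# Closed form of H_eps from Property 3.
h_closed = (rho1 * sum(alpha) + rho2 * sum(beta)
            - coupling_term(f_bar, g_bar, alpha, beta, C, eps)
            - (rho1 + rho2) * a ** tau1 * b ** tau2)

def shifted_dual(lam):
    """G_eps(lam) = F_eps(f_bar + lam, g_bar - lam)."""
    return dual_kl([v + lam for v in f_bar], [v - lam for v in g_bar],
                   alpha, beta, C, eps, rho1, rho2)

lam_star = rho1 * rho2 / (rho1 + rho2) * math.log(a / b)
print(h_closed, shifted_dual(lam_star))                        # should agree
print(shifted_dual(lam_star - 0.1), shifted_dual(lam_star + 0.1))  # both smaller
```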

Translation invariant Sinkhorn

Unbalanced Sinkhorn

For any initial value $f_0$, the iteration is
$$\begin{aligned} & g_{t+1}(y)=-\operatorname{aprox}_{\varphi_2^*}\left(-\operatorname{Smin}_{\varepsilon}^\alpha\left(\mathrm{C}(\cdot, y)-f_t\right)\right) \\ & f_{t+1}(x)=-\operatorname{aprox}_{\varphi_1^*}\left(-\operatorname{Smin}_{\varepsilon}^\beta\left(\mathrm{C}(x, \cdot)-g_{t+1}\right)\right) \end{aligned}$$
Here $\varphi_2^*$ goes with $g$ and $\varphi_1^*$ with $f$, matching the pairing in $\mathcal{F}_\varepsilon$. The softmin is defined as
$$\operatorname{Smin}_{\varepsilon}^\alpha(f) \triangleq-\varepsilon \log \left\langle\alpha, e^{-f / \varepsilon}\right\rangle$$
and the anisotropic prox is
$$\operatorname{aprox}_{\varphi^*}(x) \triangleq \arg \min_{y \in \mathbb{R}}\ \varepsilon e^{\frac{x-y}{\varepsilon}}+\varphi^*(y)$$
If $\mathrm{D}_\varphi = \rho\,\mathrm{KL}$, then $\operatorname{aprox}_{\varphi^*}(x)=\frac{\rho}{\varepsilon+\rho} x$.
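The closed form of the anisotropic prox for $\rho\,\mathrm{KL}$ follows from setting the derivative of $\varepsilon e^{(x-y)/\varepsilon} + \rho\,(e^{y/\rho}-1)$ to zero, which gives $(x-y)/\varepsilon = y/\rho$. The sketch below confirms this by solving the same stationarity condition with bisection; the bracket $[-20, 20]$, the iteration count and the function name are arbitrary choices for this illustration.

```python
import math

def aprox_kl_numeric(x, eps, rho, lo=-20.0, hi=20.0, iters=100):
    """Minimize eps * exp((x - y) / eps) + rho * (exp(y / rho) - 1) over y.

    The objective is strictly convex, so its derivative
    exp(y / rho) - exp((x - y) / eps) is increasing and has a single root,
    which is located here by bisection on [lo, hi].
    """
    def derivative(y):
        return math.exp(y / rho) - math.exp((x - y) / eps)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if derivative(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eps, rho = 0.5, 2.0
for x in (-1.0, 0.3, 2.0):
    # Left: numerical minimizer. Right: closed form rho / (eps + rho) * x.
    print(x, aprox_kl_numeric(x, eps, rho), rho / (eps + rho) * x)
```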

Under $\|\cdot\|_\infty$, the softmin and $\operatorname{aprox}_{\varphi^*}$ (for $\rho\,\mathrm{KL}$) are respectively $1$-contractive and $\left(1+\frac{\varepsilon}{\rho}\right)^{-1}$-contractive.

TI-Sinkhorn

Define the partial maximizers of $\mathcal{H}_\varepsilon$:
$$\Psi_1\left(\bar{g}\right)\triangleq\arg\max \mathcal{H}_{\varepsilon}\left(\cdot,\bar{g}\right), \qquad \Psi_2\left(\bar{f}\right)\triangleq\arg\max \mathcal{H}_{\varepsilon}\left(\bar{f}, \cdot\right)$$
The TI-Sinkhorn algorithm is
$$\bar{g}_{t+1}=\Psi_2\left(\bar{f}_t\right), \quad \bar{f}_{t+1}=\Psi_1\left(\bar{g}_{t+1}\right)$$

Finally,
$$\left(f_t,g_t\right)\triangleq \left(\bar{f}_t + \lambda^\star\left(\bar{f}_t,\bar{g}_t\right), \bar{g}_t - \lambda^\star\left(\bar{f}_t,\bar{g}_t\right)\right)$$

This algorithm inherits the invariance of $\mathcal{H}_\varepsilon$:
$$\Psi_2\left(\bar{f}_t+\mu\right)=\Psi_2\left(\bar{f}_t\right)-\mu$$
That is, if $\bar{f}_t\to \bar{f}_{t}+\mu$, then $\bar{g}_{t+1}\to \bar{g}_{t+1}-\mu$.
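For $\rho_1=\rho_2=\rho$ with KL penalties, the maps $\Psi_1,\Psi_2$ can be written in closed form, which makes TI-Sinkhorn easy to try out. The update used below, a scaled softmin plus a constant shift with coefficient $\varepsilon/(\varepsilon+2\rho)$, is my own derivation from the stationarity condition of $\mathcal{H}_\varepsilon$ combined with the KL formulas for $\operatorname{aprox}$ and $\lambda^\star$; treat it as an assumption of this sketch rather than a quotation from the paper. The data and function names are also illustrative. The script checks the result against the fixed-point equations of ordinary unbalanced Sinkhorn.

```python
import math

def smin(weights, values, eps):
    """Softmin -eps * log <weights, exp(-values / eps)>, shifted for stability."""
    lo = min(values)
    total = sum(w * math.exp(-(v - lo) / eps) for w, v in zip(weights, values))
    return lo - eps * math.log(total)

def lambda_star(f_bar, g_bar, alpha, beta, rho):
    """Optimal shift for rho1 = rho2 = rho (Property 2), written with softmins."""
    return 0.5 * (smin(beta, g_bar, rho) - smin(alpha, f_bar, rho))

def psi2(f_bar, alpha, beta, C, eps, rho):
    """g_bar maximizing H_eps(f_bar, .) under the closed form assumed above."""
    kappa = rho / (eps + rho)
    n, m = len(alpha), len(beta)
    base = [kappa * smin(alpha, [C[i][j] - f_bar[i] for i in range(n)], eps) for j in range(m)]
    shift = eps / (eps + 2 * rho) * (smin(beta, base, rho) - smin(alpha, f_bar, rho))
    return [v + shift for v in base]

def psi1(g_bar, alpha, beta, C, eps, rho):
    """f_bar maximizing H_eps(., g_bar), the mirror image of psi2."""
    kappa = rho / (eps + rho)
    n, m = len(alpha), len(beta)
    base = [kappa * smin(beta, [C[i][j] - g_bar[j] for j in range(m)], eps) for i in range(n)]
    shift = eps / (eps + 2 * rho) * (smin(alpha, base, rho) - smin(beta, g_bar, rho))
    return [v + shift for v in base]

def sinkhorn_residual(f, g, alpha, beta, C, eps, rho):
    """Largest violation of the standard Sinkhorn fixed-point equations."""
    kappa = rho / (eps + rho)
    n, m = len(alpha), len(beta)
    res_g = max(abs(g[j] - kappa * smin(alpha, [C[i][j] - f[i] for i in range(n)], eps))
                for j in range(m))
    res_f = max(abs(f[i] - kappa * smin(beta, [C[i][j] - g[j] for j in range(m)], eps))
                for i in range(n))
    return max(res_f, res_g)

alpha, beta = [0.5, 1.0, 0.7], [0.4, 0.9]
C = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.3]]
eps, rho = 0.5, 2.0

f_bar = [0.0] * len(alpha)
for _ in range(500):
    g_bar = psi2(f_bar, alpha, beta, C, eps, rho)
    f_bar = psi1(g_bar, alpha, beta, C, eps, rho)

# Map the translation invariant iterates back to dual potentials (f, g).
lam = lambda_star(f_bar, g_bar, alpha, beta, rho)
f = [v + lam for v in f_bar]
g = [v - lam for v in g_bar]
residual = sinkhorn_residual(f, g, alpha, beta, C, eps, rho)
print(residual)  # close to zero if (f, g) solves the unbalanced dual

# Translation invariance: shifting the input by mu shifts the output by -mu.
mu = 0.7
shifted = psi2([v + mu for v in f_bar], alpha, beta, C, eps, rho)
invariance_gap = max(abs(s - (v - mu)) for s, v in zip(shifted, psi2(f_bar, alpha, beta, C, eps, rho)))
print(invariance_gap)
```

A constant offset in $\bar{f}_t$ is passed on to $\bar{g}_{t+1}$ with the opposite sign and then cancels in $(f_t, g_t)$, so the slowly decaying translation error of plain Sinkhorn does not appear here.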

Property 4: For fixed $\left(\bar{f},\bar{g}\right)$, define
$$\hat{f}\triangleq \operatorname{Smin}_\varepsilon^{\alpha}\left(\mathrm{C}\left(\cdot,y\right)-\bar{f}\right), \qquad \hat{g}\triangleq \operatorname{Smin}_\varepsilon^{\beta}\left(\mathrm{C}\left(x,\cdot\right)-\bar{g}\right)$$
and write $\psi_1 \triangleq \Psi_1(\bar{g})$, $\psi_2 \triangleq \Psi_2(\bar{f})$. With the conventions used in these notes ($\varphi_1^*$ paired with $f$, shift $(\bar{f}+\lambda, \bar{g}-\lambda)$),

$$\begin{aligned} & \psi_1=-\operatorname{aprox}_{\varphi_1^*}\left(-\hat{g}-\lambda^{\star}\left(\psi_1, \bar{g}\right)\right)-\lambda^{\star}\left(\psi_1, \bar{g}\right), \\ & \psi_2=-\operatorname{aprox}_{\varphi_2^*}\left(-\hat{f}+\lambda^{\star}\left(\bar{f}, \psi_2\right)\right)+\lambda^{\star}\left(\bar{f}, \psi_2\right) . \end{aligned}$$

Proof: by definition,
$$\mathcal{H}_{\varepsilon}(\bar{f}, \bar{g})=\mathcal{F}_{\varepsilon}\left(\bar{f}+\lambda^{\star}(\bar{f}, \bar{g}), \bar{g}-\lambda^{\star}(\bar{f}, \bar{g})\right)$$

Setting the derivative with respect to $\bar{g}$ to zero (the contribution of $\lambda^\star$ vanishes because $\lambda^\star$ maximizes over $\lambda$) gives
$$\begin{aligned} \beta \odot e^{\bar{g} / \varepsilon}\left\langle\alpha, e^{(\bar{f}-\mathrm{C}) / \varepsilon}\right\rangle &=\beta \odot\nabla \varphi_2^*\left(-\bar{g}+\lambda^{\star}(\bar{f}, \bar{g})\right)\\ e^{\bar{g} / \varepsilon}\left\langle\alpha, e^{(\bar{f}-\mathrm{C}) / \varepsilon}\right\rangle &=\nabla \varphi_2^*\left(-\bar{g}+\lambda^{\star}(\bar{f}, \bar{g})\right) \end{aligned}$$
Similarly, for $\bar{f}$,
$$e^{\bar{f} / \varepsilon}\left\langle\beta, e^{(\bar{g}-\mathrm{C}) / \varepsilon}\right\rangle=\nabla \varphi_1^*\left(-\bar{f}-\lambda^{\star}(\bar{f}, \bar{g})\right)$$
Writing $g \triangleq \bar{g}-\lambda^{\star}(\bar{f}, \bar{g})$, the condition for $\bar{g}$ becomes
$$e^{g/\varepsilon}\left\langle\alpha, e^{\left(\bar{f}+\lambda^{\star}(\bar{f}, \bar{g})-\mathrm{C}\right) / \varepsilon}\right\rangle=\nabla \varphi_2^*(-g)$$
which is the optimality condition of the Sinkhorn update, so
$$g=\bar{g}-\lambda^{\star}(\bar{f}, \bar{g})=-\operatorname{aprox}_{\varphi_2^*}\left(-\operatorname{Smin}_{\varepsilon}^\alpha\left(\mathrm{C}-\bar{f}-\lambda^{\star}(\bar{f}, \bar{g})\right)\right)=-\operatorname{aprox}_{\varphi_2^*}\left(-\hat{f}+\lambda^{\star}(\bar{f}, \bar{g})\right)$$

Here $\bar{g}=\Psi_2(\bar{f})$, so this is the formula for $\psi_2$. The formula for $\psi_1$ follows in the same way from the condition on $\bar{f}$.


I am still reading the rest of the paper.


Origin blog.csdn.net/qq_39942341/article/details/131628570