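Machine Learning Whiteboard Derivation Series Notes (28): Boltzmann Machine (BM)

These notes mainly follow shuhuai008's whiteboard derivation video on Bilibili: Boltzmann Machine (147 min).

Index of all notes: Machine Learning Whiteboard Derivation Series Notes

Reference: the Deep Learning book (the "flower book"), Section 20.1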

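I. Introduction

Every node in a Boltzmann machine is a discrete binary random variable, and the nodes are fully connected. The Boltzmann machine was proposed to address the problem of local minima.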


$$
\begin{aligned}
&v\in\{0,1\}^D,\qquad h\in\{0,1\}^P\\
&L=\Big[L_{ij}\Big]_{D\times D},\qquad J=\Big[J_{ij}\Big]_{P\times P},\qquad W=\Big[W_{ij}\Big]_{D\times P}
\end{aligned}
$$

$$
\begin{cases}
p(v,h)=\dfrac1Z\exp\{-E(v,h)\}\\[4pt]
E(v,h)=-\Big(v^TWh+\frac12v^TLv+\frac12h^TJh\Big)
\end{cases}
\qquad\theta=\{W,L,J\}
$$
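For a very small model the definition above can be evaluated by brute force. The following is only a sketch under stated assumptions: the function names `energy` and `joint_table` and the toy parameter values are hypothetical, and enumerating all $2^{D+P}$ states is feasible only for tiny $D$ and $P$.

```python
import itertools
import math

def energy(v, h, W, L, J):
    """E(v,h) = -(v^T W h + 1/2 v^T L v + 1/2 h^T J h) for binary tuples v, h."""
    D, P = len(v), len(h)
    vWh = sum(v[i] * W[i][j] * h[j] for i in range(D) for j in range(P))
    vLv = sum(v[i] * L[i][k] * v[k] for i in range(D) for k in range(D))
    hJh = sum(h[j] * J[j][m] * h[m] for j in range(P) for m in range(P))
    return -(vWh + 0.5 * vLv + 0.5 * hJh)

def joint_table(W, L, J):
    """Return {(v, h): p(v, h)} by enumerating all 2^(D+P) joint states."""
    D, P = len(W), len(W[0])
    states = [(v, h)
              for v in itertools.product((0, 1), repeat=D)
              for h in itertools.product((0, 1), repeat=P)]
    weights = [math.exp(-energy(v, h, W, L, J)) for v, h in states]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(states, weights)}

# Hypothetical toy parameters (assumption): D = 2 visible, P = 2 hidden units,
# with L and J symmetric and zero on the diagonal.
W = [[0.5, -0.3], [0.2, 0.8]]
L = [[0.0, 0.4], [0.4, 0.0]]
J = [[0.0, -0.6], [-0.6, 0.0]]
p = joint_table(W, L, J)
```

The partition function $Z$ is what makes this approach impractical at realistic sizes, which is why the later sections turn to sampling and variational approximations.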

II. Gradient of the Log-Likelihood

Sample set: $V$, with $|V|=N$

$$
P(v)=\sum_h p(v,h)
$$

Log-likelihood:

$$
\frac1N\sum_{v\in V}\log P(v)
$$

Gradient of the log-likelihood:

$$
\frac{\partial}{\partial\theta}\frac1N\sum_{v\in V}\log P(v)=\frac1N\sum_{v\in V}{\color{blue}\frac{\partial\log P(v)}{\partial\theta}}
$$

$$
\frac{\partial\log P(v)}{\partial\theta}=\sum_v\sum_h p(v,h)\cdot\frac{\partial E(v,h)}{\partial\theta}-\sum_h p(h|v)\cdot\frac{\partial E(v,h)}{\partial\theta}
$$

$$
\begin{aligned}
\frac{\partial\log P(v)}{\partial W}&=\sum_v\sum_h p(v,h)\cdot(-vh^T)-\sum_h p(h|v)\cdot(-vh^T)\\
&=\sum_h p(h|v)\cdot vh^T-\sum_v\sum_h p(v,h)\cdot vh^T
\end{aligned}
$$
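The general gradient formula is stated without its intermediate steps. One way to fill them in, writing the dummy summation variables as $v',h'$ to keep them apart from the fixed sample $v$ (a notational choice made here, not in the original notes):

$$
\begin{aligned}
\log P(v)&=\log\sum_h\exp\{-E(v,h)\}-\log Z,\qquad Z=\sum_{v'}\sum_{h'}\exp\{-E(v',h')\}\\
\frac{\partial}{\partial\theta}\log\sum_h\exp\{-E(v,h)\}&=-\sum_h\frac{\exp\{-E(v,h)\}}{\sum_{h'}\exp\{-E(v,h')\}}\cdot\frac{\partial E(v,h)}{\partial\theta}=-\sum_h p(h|v)\cdot\frac{\partial E(v,h)}{\partial\theta}\\
\frac{\partial}{\partial\theta}\log Z&=-\sum_{v'}\sum_{h'}\frac{\exp\{-E(v',h')\}}{Z}\cdot\frac{\partial E(v',h')}{\partial\theta}=-\sum_{v'}\sum_{h'}p(v',h')\cdot\frac{\partial E(v',h')}{\partial\theta}
\end{aligned}
$$

Subtracting the third line from the second gives the expression for $\partial\log P(v)/\partial\theta$.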

Therefore,

$$
\begin{aligned}
\frac1N\sum_{v\in V}\frac{\partial\log P(v)}{\partial W}&=\frac1N\sum_{v\in V}\sum_h p(h|v)\cdot vh^T-\frac1N\sum_{v\in V}\sum_v\sum_h p(v,h)\cdot vh^T\\
&=\frac1N\sum_{v\in V}\sum_h p(h|v)\cdot vh^T-\sum_v\sum_h p(v,h)\cdot vh^T\\
&=E_{P_{Data}}\Big[vh^T\Big]-E_{P_{model}}\Big[vh^T\Big]
\end{aligned}
$$

where

$$
\begin{aligned}
P_{Data}&=P_{Data}(v)P_{model}(h|v)\\
P_{model}&=P_{model}(h,v)=P_{model}(v)P_{model}(h|v)
\end{aligned}
$$
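On a tiny model the identity "gradient = data expectation minus model expectation" can be checked numerically against a finite difference of the average log-likelihood. This is a sketch only: the helper names, the toy parameters, and the four-sample data set are hypothetical, and exact enumeration replaces the intractable sums.

```python
import itertools
import math

def neg_energy(v, h, W, L, J):
    """-E(v,h) = v^T W h + 1/2 v^T L v + 1/2 h^T J h."""
    D, P = len(v), len(h)
    return (sum(v[i] * W[i][j] * h[j] for i in range(D) for j in range(P))
            + 0.5 * sum(v[i] * L[i][k] * v[k] for i in range(D) for k in range(D))
            + 0.5 * sum(h[j] * J[j][m] * h[m] for j in range(P) for m in range(P)))

def unnormalized(W, L, J):
    """Map every joint state (v, h) to exp{-E(v,h)}."""
    D, P = len(W), len(W[0])
    return {(v, h): math.exp(neg_energy(v, h, W, L, J))
            for v in itertools.product((0, 1), repeat=D)
            for h in itertools.product((0, 1), repeat=P)}

def avg_log_likelihood(data, W, L, J):
    u = unnormalized(W, L, J)
    Z = sum(u.values())
    def log_p(v):
        return math.log(sum(w for (vv, _), w in u.items() if vv == v) / Z)
    return sum(log_p(v) for v in data) / len(data)

def grad_W(data, W, L, J):
    """E_{P_Data}[v h^T] - E_{P_model}[v h^T], computed exactly."""
    D, P = len(W), len(W[0])
    u = unnormalized(W, L, J)
    Z = sum(u.values())
    grad = [[0.0] * P for _ in range(D)]
    for (v, h), w in u.items():
        marg = sum(x for (vv, _), x in u.items() if vv == v)  # unnormalized p(v)
        p_data_v = sum(1 for d in data if d == v) / len(data)  # empirical P_Data(v)
        pos = p_data_v * w / marg  # P_Data(v) * p(h|v)
        neg = w / Z                # p(v, h)
        for i in range(D):
            for j in range(P):
                grad[i][j] += (pos - neg) * v[i] * h[j]
    return grad

# Hypothetical toy setup (assumption).
W = [[0.5, -0.3], [0.2, 0.8]]
L = [[0.0, 0.4], [0.4, 0.0]]
J = [[0.0, -0.6], [-0.6, 0.0]]
data = [(1, 0), (1, 1), (0, 1), (1, 1)]

g = grad_W(data, W, L, J)

# Central finite difference with respect to W[0][1].
eps = 1e-6
W_plus = [row[:] for row in W]
W_minus = [row[:] for row in W]
W_plus[0][1] += eps
W_minus[0][1] -= eps
fd = (avg_log_likelihood(data, W_plus, L, J)
      - avg_log_likelihood(data, W_minus, L, J)) / (2 * eps)
```

Here `fd` and `g[0][1]` should agree up to finite-difference error.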

III. Stochastic Gradient Ascent Based on MCMC

By the same derivation as above, we obtain:

$$
\begin{aligned}
\Delta W&=\alpha\Bigg(E_{P_{Data}}\Big[vh^T\Big]-E_{P_{model}}\Big[vh^T\Big]\Bigg)\\
\Delta L&=\alpha\Bigg(E_{P_{Data}}\Big[vv^T\Big]-E_{P_{model}}\Big[vv^T\Big]\Bigg)\\
\Delta J&=\alpha\Bigg(E_{P_{Data}}\Big[hh^T\Big]-E_{P_{model}}\Big[hh^T\Big]\Bigg)
\end{aligned}
$$

where $\alpha$ is the learning rate.

$$
\begin{aligned}
P_{Data}&=P_{Data}(v)P_{model}(h|v)\\
P_{model}&=P_{model}(h,v)=P_{model}(v)P_{model}(h|v)
\end{aligned}
$$

$$
W^{(t+1)}=W^{(t)}+\Delta W
$$

Element-wise, with learning rate $\alpha$:

$$
\Delta w_{ij}=\alpha\Bigg(\underset{\text{positive phase}}{\underbrace{E_{P_{Data}}\Big[v_ih_j\Big]}}-\underset{\text{negative phase}}{\underbrace{E_{P_{model}}\Big[v_ih_j\Big]}}\Bigg)
$$

However, both the positive phase and the negative phase are intractable, so they are approximated by MCMC (Gibbs sampling), which uses the following conditional distributions:

$$
\begin{aligned}
p(v_i=1|h,v_{-i})&=\sigma\Big(\sum_{j=1}^P w_{ij}h_j+\sum_{k=1,\,k\neq i}^D L_{ik}v_k\Big)\\
p(h_j=1|v,h_{-j})&=\sigma\Big(\sum_{i=1}^D w_{ij}v_i+\sum_{m=1,\,m\neq j}^P J_{jm}h_m\Big)
\end{aligned}
$$
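These two conditionals are all a Gibbs sampler needs. The following sketch estimates the negative-phase statistic $E_{P_{model}}[vh^T]$ from a single chain. The function names, the sweep order (all visible units, then all hidden units), the chain length, and the toy parameters are assumptions made for illustration, not choices taken from the lecture.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_sweep(v, h, W, L, J, rng):
    """Resample every visible unit, then every hidden unit, in place."""
    D, P = len(v), len(h)
    for i in range(D):
        a = (sum(W[i][j] * h[j] for j in range(P))
             + sum(L[i][k] * v[k] for k in range(D) if k != i))
        v[i] = 1 if rng.random() < sigmoid(a) else 0
    for j in range(P):
        a = (sum(W[i][j] * v[i] for i in range(D))
             + sum(J[j][m] * h[m] for m in range(P) if m != j))
        h[j] = 1 if rng.random() < sigmoid(a) else 0
    return v, h

def estimate_vh(W, L, J, n_sweeps=5000, burn_in=500, seed=0):
    """Monte Carlo estimate of E_{P_model}[v h^T] from one Gibbs chain."""
    rng = random.Random(seed)
    D, P = len(W), len(W[0])
    v = [rng.randint(0, 1) for _ in range(D)]
    h = [rng.randint(0, 1) for _ in range(P)]
    acc = [[0.0] * P for _ in range(D)]
    for t in range(burn_in + n_sweeps):
        gibbs_sweep(v, h, W, L, J, rng)
        if t >= burn_in:  # discard the burn-in sweeps
            for i in range(D):
                for j in range(P):
                    acc[i][j] += v[i] * h[j]
    return [[a / n_sweeps for a in row] for row in acc]

# Hypothetical toy parameters (assumption).
W = [[0.5, -0.3], [0.2, 0.8]]
L = [[0.0, 0.4], [0.4, 0.0]]
J = [[0.0, -0.6], [-0.6, 0.0]]
est = estimate_vh(W, L, J)
```

For the positive phase, the same sweep can be run with `v` clamped to a training sample so that only the hidden units are resampled.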

RBM (figure omitted): in a restricted Boltzmann machine there are no visible-visible or hidden-hidden connections ($L=0$, $J=0$), so the posterior over the hidden units factorizes:

$$
\begin{aligned}
p(h|v)&=\prod_{j=1}^P p(h_j|v)\\
p(h_j=1|v)&=p(h_j=1|v,h_{-j})=\sigma\Big(\sum_{i=1}^D W_{ij}v_i\Big)
\end{aligned}
$$

IV. Derivation of the Conditional Probabilities

The goal is to derive:

$$
\begin{aligned}
p(v_i=1|h,v_{-i})&=\sigma\Big(\sum_{j=1}^P w_{ij}h_j+\sum_{k=1,\,k\neq i}^D L_{ik}v_k\Big)\\
p(h_j=1|v,h_{-j})&=\sigma\Big(\sum_{i=1}^D w_{ij}v_i+\sum_{m=1,\,m\neq j}^P J_{jm}h_m\Big)
\end{aligned}
$$

$$
\begin{aligned}
p(v_i|h,v_{-i})&=\frac{p(v,h)}{p(h,v_{-i})}\\
&=\frac{\frac1Z\exp\{-E(v,h)\}}{\sum_{v_i}\frac1Z\exp\{-E(v,h)\}}\\
&=\frac{\exp\{v^TWh+\frac12v^TLv+\frac12h^TJh\}}{\sum_{v_i}\exp\{v^TWh+\frac12v^TLv+\frac12h^TJh\}}\\
&=\frac{\exp\{v^TWh+\frac12v^TLv\}}{\sum_{v_i}\exp\{v^TWh+\frac12v^TLv\}}\\
&=\frac{\exp\{v^TWh+\frac12v^TLv\}}{\exp\{v^TWh+\frac12v^TLv\}\Big|_{v_i=0}+\exp\{v^TWh+\frac12v^TLv\}\Big|_{v_i=1}}
\end{aligned}
$$

Therefore,

$$
p(v_i=1|h,v_{-i})=\frac{\exp\{v^TWh+\frac12v^TLv\}\Big|_{v_i=1}}{\exp\{v^TWh+\frac12v^TLv\}\Big|_{v_i=0}+\exp\{v^TWh+\frac12v^TLv\}\Big|_{v_i=1}}
$$

Let

$$
\Delta_{v_i}=\exp\{v^TWh+\frac12v^TLv\}
$$

so that

$$
p(v_i=1|h,v_{-i})=\frac{\Delta_{v_i=1}}{\Delta_{v_i=0}+\Delta_{v_i=1}}
$$

$$
\begin{aligned}
\Delta_{v_i}&=\exp\Big\{v^TWh+\frac12v^TLv\Big\}\\
&=\exp\Big\{\sum_{\hat i=1}^D\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i=1}^D\sum_{k=1}^D v_{\hat i}l_{\hat ik}v_k\Big\}\\
&=\exp\Big\{\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\sum_{j=1}^P v_iw_{ij}h_j+\frac12\Big(\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k+\sum_{\hat i\neq i}v_{\hat i}l_{\hat ii}v_i+\sum_{k\neq i}v_il_{ik}v_k\Big)\Big\}\\
&=\exp\Big\{\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\sum_{j=1}^P v_iw_{ij}h_j+\frac12\Big(\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k+2\sum_{k\neq i}v_il_{ik}v_k\Big)\Big\}\\
&=\exp\Big\{v_i\Big(\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k\Big)+\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}
\end{aligned}
$$

Here $\hat i\neq i$ and $k\neq i$ denote sums over $1,\dots,D$ excluding $i$. The third line assumes no self-connections ($l_{ii}=0$), and the fourth uses the symmetry $l_{\hat ii}=l_{i\hat i}$.

Only the first term depends on $v_i$, so

$$
\begin{aligned}
\Delta_{v_i=0}&=\exp\Big\{\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}\\
\Delta_{v_i=1}&=\exp\Big\{\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k+\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}
\end{aligned}
$$
Therefore,

$$
\begin{aligned}
p(v_i=1|h,v_{-i})&=\frac{\Delta_{v_i=1}}{\Delta_{v_i=0}+\Delta_{v_i=1}}\\
&=\frac{\exp\Big\{\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k+\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}}{\exp\Big\{\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}+\exp\Big\{\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k+\sum_{\hat i\neq i}\sum_{j=1}^P v_{\hat i}w_{\hat ij}h_j+\frac12\sum_{\hat i\neq i}\sum_{k\neq i}v_{\hat i}l_{\hat ik}v_k\Big\}}\\
&=\frac{\exp\Big\{\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k\Big\}}{1+\exp\Big\{\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k\Big\}}\\
&=\frac1{1+\exp\Big\{-\Big(\sum_{j=1}^P w_{ij}h_j+\sum_{k\neq i}l_{ik}v_k\Big)\Big\}}\\
&=\sigma\Big(\sum_{j=1}^P w_{ij}h_j+\sum_{k=1,\,k\neq i}^D L_{ik}v_k\Big)
\end{aligned}
$$

Similarly,

$$
p(h_j=1|v,h_{-j})=\sigma\Big(\sum_{i=1}^D w_{ij}v_i+\sum_{m=1,\,m\neq j}^P J_{jm}h_m\Big)
$$
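The sigmoid form of the visible conditional can be compared with the defining ratio of unnormalized joint probabilities on a small example. This is a sketch: the function names and toy parameters are hypothetical, and $L$ is assumed symmetric with a zero diagonal, which the derivation relies on.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_energy(v, h, W, L, J):
    """-E(v,h) = v^T W h + 1/2 v^T L v + 1/2 h^T J h."""
    D, P = len(v), len(h)
    return (sum(v[i] * W[i][j] * h[j] for i in range(D) for j in range(P))
            + 0.5 * sum(v[i] * L[i][k] * v[k] for i in range(D) for k in range(D))
            + 0.5 * sum(h[j] * J[j][m] * h[m] for j in range(P) for m in range(P)))

def cond_visible_ratio(i, v, h, W, L, J):
    """p(v_i = 1 | h, v_{-i}) as Delta_{v_i=1} / (Delta_{v_i=0} + Delta_{v_i=1})."""
    v1 = v[:i] + (1,) + v[i + 1:]
    v0 = v[:i] + (0,) + v[i + 1:]
    e1 = math.exp(neg_energy(v1, h, W, L, J))
    e0 = math.exp(neg_energy(v0, h, W, L, J))
    return e1 / (e0 + e1)

def cond_visible_sigmoid(i, v, h, W, L):
    """The closed form sigma(sum_j w_ij h_j + sum_{k != i} L_ik v_k)."""
    D, P = len(v), len(h)
    return sigmoid(sum(W[i][j] * h[j] for j in range(P))
                   + sum(L[i][k] * v[k] for k in range(D) if k != i))

# Hypothetical toy parameters (assumption): D = 3 visible, P = 2 hidden units.
W = [[0.5, -0.3], [0.2, 0.8], [-0.7, 0.1]]
L = [[0.0, 0.4, -0.2], [0.4, 0.0, 0.9], [-0.2, 0.9, 0.0]]
J = [[0.0, -0.6], [-0.6, 0.0]]

max_gap = max(
    abs(cond_visible_ratio(i, v, h, W, L, J) - cond_visible_sigmoid(i, v, h, W, L))
    for v in itertools.product((0, 1), repeat=3)
    for h in itertools.product((0, 1), repeat=2)
    for i in range(3)
)
```

`max_gap` should be zero up to floating-point rounding. The same check applies to the hidden units with the roles of $L$ and $J$ exchanged.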

V. Variational Inference Based on Mean-Field Theory

$$
\mathcal{L}=\mathrm{ELBO}=\log p_\theta(v)-KL(q_\phi||p_\theta)=\sum_h q_\phi(h|v)\log p_\theta(v,h)+H[q]
$$

$$
q_\phi(h|v)=\prod_{j=1}^P q_\phi(h_j|v),\qquad q_\phi(h_j=1|v)=\phi_j,\qquad\phi=\{\phi_j\}_{j=1}^P
$$

$$
\begin{aligned}
\hat\phi_j&=\arg\max_{\phi_j}\mathcal{L}\\
&=\arg\max_{\phi_j}\sum_h q_\phi(h|v)\Big[-\log Z+v^TWh+\frac12v^TLv+\frac12h^TJh\Big]+H[q]\\
&=\arg\max_{\phi_j}\sum_h q_\phi(h|v)\Big[-\log Z+\frac12v^TLv\Big]+\sum_h q_\phi(h|v)\Big[v^TWh+\frac12h^TJh\Big]+H[q]\\
&=\arg\max_{\phi_j}\sum_h q_\phi(h|v)\Big[v^TWh+\frac12h^TJh\Big]+H[q]\\
&=\arg\max_{\phi_j}\underset{①}{\underbrace{\sum_h q_\phi(h|v)\cdot v^TWh}}+\underset{②}{\underbrace{\frac12\sum_h q_\phi(h|v)\cdot h^TJh}}+\underset{③}{\underbrace{H[q]}}
\end{aligned}
$$

$$
\begin{aligned}
①&=\sum_h q_\phi(h|v)\cdot\sum_{i=1}^D\sum_{j=1}^P v_iw_{ij}h_j\\
&=\sum_h\prod_{\hat j=1}^P q_\phi(h_{\hat j}|v)\cdot\sum_{i=1}^D\sum_{j=1}^P v_iw_{ij}h_j
\end{aligned}
$$

Consider a single term of the double sum, for example $v_1w_{12}h_2$:

$$
\begin{aligned}
\sum_h\prod_{\hat j=1}^P q_\phi(h_{\hat j}|v)\cdot v_1w_{12}h_2&=\sum_{h_2}q_\phi(h_2|v)\cdot v_1w_{12}h_2\cdot\sum_{h\setminus h_2}\prod_{\hat j\neq 2}q_\phi(h_{\hat j}|v)\\
&=\sum_{h_2}q_\phi(h_2|v)\cdot v_1w_{12}h_2\\
&=q_\phi(h_2=1|v)\cdot v_1w_{12}\\
&=\phi_2v_1w_{12}
\end{aligned}
$$

Therefore,

$$
①=\sum_{i=1}^D\sum_{\hat j=1}^P\phi_{\hat j}v_iw_{i\hat j}
$$

Similarly, keeping the factor $\frac12$ from the definition of ② and assuming $J_{jj}=0$,

$$
②=\frac12\sum_{\hat j=1}^P\sum_{m=1,\,m\neq\hat j}^P\phi_{\hat j}\phi_mJ_{\hat jm}
$$

$$
③=-\sum_{j=1}^P\Big[\phi_j\log\phi_j+(1-\phi_j)\log(1-\phi_j)\Big]
$$
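The closed forms of ①, ②, and ③ can be compared with direct expectations under the factorized $q_\phi(h|v)$. The sketch below carries the factor $\frac12$ from the definition of ② through to its closed form and assumes $J$ is symmetric with a zero diagonal. All names and toy values are hypothetical.

```python
import itertools
import math

def q_prob(h, phi):
    """Factorized q(h|v) = prod_j phi_j^{h_j} (1 - phi_j)^{1 - h_j}."""
    return math.prod(p if hj == 1 else 1.0 - p for hj, p in zip(h, phi))

def terms_bruteforce(v, phi, W, J):
    """Terms 1, 2, 3 as explicit sums over all 2^P hidden configurations."""
    D, P = len(v), len(phi)
    t1 = t2 = t3 = 0.0
    for h in itertools.product((0, 1), repeat=P):
        q = q_prob(h, phi)
        t1 += q * sum(v[i] * W[i][j] * h[j] for i in range(D) for j in range(P))
        t2 += q * 0.5 * sum(h[j] * J[j][m] * h[m]
                            for j in range(P) for m in range(P))
        t3 -= q * math.log(q)  # entropy H[q]
    return t1, t2, t3

def terms_closed_form(v, phi, W, J):
    """The same three terms written as functions of phi only."""
    D, P = len(v), len(phi)
    t1 = sum(phi[j] * v[i] * W[i][j] for i in range(D) for j in range(P))
    t2 = 0.5 * sum(phi[j] * phi[m] * J[j][m]
                   for j in range(P) for m in range(P) if m != j)
    t3 = -sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in phi)
    return t1, t2, t3

# Hypothetical toy values (assumption): D = 2 visible, P = 3 hidden units.
v = (1, 0)
W = [[0.5, -0.3, 0.7], [0.2, 0.8, -0.4]]
J = [[0.0, -0.6, 0.3], [-0.6, 0.0, 0.5], [0.3, 0.5, 0.0]]
phi = [0.2, 0.7, 0.55]

brute = terms_bruteforce(v, phi, W, J)
closed = terms_closed_form(v, phi, W, J)
```

The two tuples should match term by term up to rounding, which also confirms that the entropy of a factorized distribution is the sum of the per-unit entropies.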

Taking the partial derivatives of ①, ②, and ③ with respect to $\phi_j$:

$$
\begin{aligned}
\frac{\partial①}{\partial\phi_j}&=\sum_{i=1}^D v_iw_{ij}\\
\frac{\partial②}{\partial\phi_j}&=\sum_{m=1,\,m\neq j}^P\phi_mJ_{jm}\\
\frac{\partial③}{\partial\phi_j}&=-\log\frac{\phi_j}{1-\phi_j}
\end{aligned}
$$

Setting

$$
\frac{\partial\Big[①+②+③\Big]}{\partial\phi_j}=0
$$

gives

$$
\phi_j=\sigma\Big(\sum_{i=1}^D v_iw_{ij}+\sum_{m=1,\,m\neq j}^P\phi_mJ_{jm}\Big)
$$
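The algebra between the stationarity condition and the sigmoid form is short. Writing $a_j$ for the argument of $\sigma$ (a shorthand introduced here, not in the original notes):

$$
\begin{aligned}
a_j-\log\frac{\phi_j}{1-\phi_j}&=0,\qquad a_j=\sum_{i=1}^D v_iw_{ij}+\sum_{m=1,\,m\neq j}^P\phi_mJ_{jm}\\
\frac{\phi_j}{1-\phi_j}&=\exp\{a_j\}\\
\phi_j&=\frac{\exp\{a_j\}}{1+\exp\{a_j\}}=\frac1{1+\exp\{-a_j\}}=\sigma(a_j)
\end{aligned}
$$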
This is a fixed-point equation. It can be solved by coordinate ascent, updating each $\phi_j$ in turn until convergence, which yields

$$
\hat\phi=\{\hat\phi_j\}_{j=1}^P
$$
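A minimal sketch of the coordinate-ascent solution follows. The function name `mean_field`, the initialization at $0.5$, the stopping tolerance, and the toy parameters are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field(v, W, J, n_iters=100, tol=1e-10):
    """Iterate phi_j = sigma(sum_i v_i w_ij + sum_{m != j} phi_m J_jm)."""
    D, P = len(W), len(W[0])
    phi = [0.5] * P  # assumed initialization
    for _ in range(n_iters):
        max_change = 0.0
        for j in range(P):
            a = (sum(v[i] * W[i][j] for i in range(D))
                 + sum(phi[m] * J[j][m] for m in range(P) if m != j))
            new = sigmoid(a)
            max_change = max(max_change, abs(new - phi[j]))
            phi[j] = new  # later coordinates in this sweep see the update
        if max_change < tol:
            break
    return phi

# Hypothetical toy parameters (assumption): D = 2 visible, P = 3 hidden units.
v = (1, 0)
W = [[0.5, -0.3, 0.7], [0.2, 0.8, -0.4]]
J = [[0.0, -0.6, 0.3], [-0.6, 0.0, 0.5], [0.3, 0.5, 0.0]]
phi_hat = mean_field(v, W, J)

# With J = 0 (the RBM case) a single sweep already gives the exact posterior.
J_zero = [[0.0] * 3 for _ in range(3)]
phi_rbm = mean_field(v, W, J_zero)
```

With $J=0$ the coupling term vanishes and `phi_rbm[j]` equals $\sigma(\sum_i v_iw_{ij})$, the factorized RBM posterior, which is one way to see how the general Boltzmann machine differs from the restricted case.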

RBM: Whiteboard Derivation Series Notes (21): Restricted Boltzmann Machine

Next chapter: Whiteboard Derivation Series Notes (29): Deep Boltzmann Machine


Reposted from blog.csdn.net/qq_41485273/article/details/112337238