Machine Learning - Whiteboard Derivation Series Notes (27) - DBN

This article mainly follows shuhuai008's whiteboard derivation video on Bilibili: Deep Belief Network (72 min).

Index of all notes: Machine Learning - Whiteboard Derivation Series Notes

Corresponds to Section 20.3 of the Deep Learning book (Goodfellow et al.).

1. Introduction

The introduction of the DBN (Deep Belief Network) kicked off the current revival of deep learning. It is essentially a hybrid model: the top two layers $h^{(2)}$ and $h^{(3)}$ form an undirected RBM, while the layers below form a directed sigmoid belief network.
*(figure: DBN architecture — an RBM over the top two hidden layers, directed sigmoid layers below)*

$$\begin{aligned}P(v,h^{(1)},h^{(2)},h^{(3)})&=P(v\mid h^{(1)},h^{(2)},h^{(3)})\cdot P(h^{(1)},h^{(2)},h^{(3)})\\&=P(v\mid h^{(1)})\cdot P(h^{(1)},h^{(2)},h^{(3)})\\&=P(v\mid h^{(1)})\cdot P(h^{(1)}\mid h^{(2)})\cdot P(h^{(2)},h^{(3)})\\&=\prod_iP(v_i\mid h^{(1)})\cdot\prod_jP(h_j^{(1)}\mid h^{(2)})\cdot P(h^{(2)},h^{(3)})\end{aligned}$$

$$\theta=\{w^{(1)},w^{(2)},w^{(3)},b^{(0)},b^{(1)},b^{(2)},b^{(3)}\},\qquad w_{:,i}:\ \text{the }i\text{-th column vector of }w$$

$$P(v_i\mid h^{(1)})=\mathrm{sigmoid}\big({w_{:,i}^{(1)}}^T h^{(1)}+b_i^{(0)}\big)$$

$$P(h_j^{(1)}\mid h^{(2)})=\mathrm{sigmoid}\big({w_{:,j}^{(2)}}^T h^{(2)}+b_j^{(1)}\big)$$

$$P(h^{(2)},h^{(3)})=\frac1Z\exp\big\{{h^{(3)}}^T w^{(3)} h^{(2)}+{h^{(2)}}^T b^{(2)}+{h^{(3)}}^T b^{(3)}\big\}$$
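This factorization translates directly into an ancestral sampling procedure: run Gibbs sampling in the top RBM $p(h^{(2)},h^{(3)})$, then propagate down through the directed sigmoid layers. A minimal NumPy sketch, with hypothetical layer sizes and randomly initialized parameters (a trained DBN would use learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes and randomly initialized parameters (illustration only).
n_v, n_h1, n_h2, n_h3 = 6, 5, 4, 3
W1 = rng.normal(0, 0.1, (n_h1, n_v))   # w^(1): connects h^(1) and v
W2 = rng.normal(0, 0.1, (n_h2, n_h1))  # w^(2): connects h^(2) and h^(1)
W3 = rng.normal(0, 0.1, (n_h3, n_h2))  # w^(3): top RBM between h^(3) and h^(2)
b0 = np.zeros(n_v); b1 = np.zeros(n_h1); b2 = np.zeros(n_h2); b3 = np.zeros(n_h3)

def sample_dbn(n_gibbs=100):
    # 1) Gibbs sampling in the top RBM p(h^(2), h^(3)).
    h2 = rng.integers(0, 2, n_h2).astype(float)
    for _ in range(n_gibbs):
        h3 = (rng.random(n_h3) < sigmoid(W3 @ h2 + b3)).astype(float)
        h2 = (rng.random(n_h2) < sigmoid(W3.T @ h3 + b2)).astype(float)
    # 2) Ancestral (top-down) passes through the directed sigmoid layers.
    h1 = (rng.random(n_h1) < sigmoid(W2.T @ h2 + b1)).astype(float)
    v = (rng.random(n_v) < sigmoid(W1.T @ h1 + b0)).astype(float)
    return v

v = sample_dbn()
print(v)
```

The weight conventions follow the formulas above: ${w_{:,i}^{(1)}}^T h^{(1)}$ for unit $i$ corresponds to `W1.T @ h1` for the whole layer.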

2. Stacking RBMs

*(figure: the two models being compared — a single RBM over $(v,h^{(1)})$, and the stacked model with a second RBM on top)*

$$P(v)=\sum_{h^{(1)}}P(v,h^{(1)})=\sum_{h^{(1)}}\underbrace{P(h^{(1)})}_{prior}\,\underbrace{P(v\mid h^{(1)})}_{fixed}$$

$$P(h^{(1)})=\sum_{h^{(2)}}P(h^{(1)},h^{(2)})$$

We compare these two models through the ELBO. Consider first the lower model in the figure above:

$$\begin{aligned}\log p(v)&=\log\sum_{h^{(1)}}p(v,h^{(1)})\\&=\log\sum_{h^{(1)}}q(h^{(1)}\mid v)\frac{p(v,h^{(1)})}{q(h^{(1)}\mid v)}\\&=\log E_{q(h^{(1)}\mid v)}\Big[\frac{p(v,h^{(1)})}{q(h^{(1)}\mid v)}\Big]\\&\geq E_{q(h^{(1)}\mid v)}\Big[\log\frac{p(v,h^{(1)})}{q(h^{(1)}\mid v)}\Big]\\&=\sum_{h^{(1)}}q(h^{(1)}\mid v)\Big[\log p(v,h^{(1)})-\log q(h^{(1)}\mid v)\Big]\\&=\sum_{h^{(1)}}q(h^{(1)}\mid v)\Big[{\color{blue}\log p(h^{(1)})}+\log p(v\mid h^{(1)})-\log q(h^{(1)}\mid v)\Big]\end{aligned}$$

Learning the second-layer RBM is equivalent to maximizing the log-likelihood under $p(h^{(1)})$.

Adding a layer serves to maximize the likelihood of $p(h^{(1)})$, i.e., to improve the blue term above, which in turn raises the ELBO.
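The inequality in the derivation above is Jensen's bound, and it can be checked numerically. The sketch below builds a tiny model with one visible and one hidden binary unit (all probabilities chosen arbitrarily for illustration), computes $\log p(v)$ exactly, and verifies that the ELBO under an arbitrary $q$ never exceeds it:

```python
import numpy as np

# Tiny binary model: one visible unit v, one hidden unit h^(1).
# All probabilities are arbitrary illustrative values.
p_h = np.array([0.7, 0.3])          # p(h=0), p(h=1)
p_v_given_h = np.array([0.9, 0.2])  # p(v=1|h=0), p(v=1|h=1)

v = 1
p_joint = p_h * np.where(v == 1, p_v_given_h, 1 - p_v_given_h)  # p(v, h) for h in {0,1}
log_pv = np.log(p_joint.sum())      # exact log-likelihood log p(v)

q = np.array([0.5, 0.5])            # an arbitrary approximate posterior q(h|v)
elbo = np.sum(q * (np.log(p_joint) - np.log(q)))

print(log_pv, elbo)
assert elbo <= log_pv               # Jensen: the ELBO never exceeds log p(v)
```

Setting $q$ to the true posterior $p(h^{(1)}\mid v)=p(v,h^{(1)})/p(v)$ makes the bound tight, which is exactly why improving the prior term $\log p(h^{(1)})$ (the blue term) raises the achievable ELBO.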

3. Greedy Layer-wise Pretraining

$$\log p(v)\geq \mathrm{ELBO}=\sum_{h^{(1)}}q(h^{(1)}\mid v)\cdot\log p(v,h^{(1)})-\sum_{h^{(1)}}q(h^{(1)}\mid v)\log q(h^{(1)}\mid v)$$

$$q(h^{(1)}\mid v)=\prod_i q(h_i^{(1)}\mid v)=\prod_i \mathrm{sigmoid}\big(w_{i,:}^{(1)}\,v+b_i^{(1)}\big)$$
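The greedy procedure can be sketched as: train the first RBM on the data, freeze it, compute the factorized $q(h^{(1)}\mid v)$ activations, and use them as training data for the next RBM. A minimal NumPy illustration on random binary data; the `train_rbm` helper (CD-1) and all hyperparameters are illustrative assumptions, not the original lecture's code:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Train one RBM with CD-1 (minimal sketch); returns W, visible bias, hidden bias."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.01, (n_hidden, n_vis))
    b_v = np.zeros(n_vis); b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(v0 @ W.T + b_h)                     # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
        pv1 = sigmoid(h0 @ W + b_v)                       # one Gibbs step down
        ph1 = sigmoid(pv1 @ W.T + b_h)                    # and back up
        W += lr * (ph0.T @ v0 - ph1.T @ pv1) / len(data)  # CD-1 gradient estimate
        b_v += lr * (v0 - pv1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h

# Greedy stacking: RBM 1 on the data, then q(h^(1)|v) samples as "data" for RBM 2.
data = rng.integers(0, 2, (100, 6)).astype(float)
W1, b0_, b1_ = train_rbm(data, n_hidden=4)
h1_data = sigmoid(data @ W1.T + b1_)                      # q(h^(1)|v), fixed afterwards
h1_samples = (rng.random(h1_data.shape) < h1_data).astype(float)
W2, _, b2_ = train_rbm(h1_samples, n_hidden=3)
```

After stacking, $q(h^{(1)}\mid v)$ keeps the first layer's recognition weights while the new top RBM replaces the prior $p(h^{(1)})$, matching the ELBO decomposition above.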

For a DBN, this ELBO is relatively loose: the true posterior $p(h^{(1)}\mid v)$ is not factorial, because the layers above induce dependencies among the $h_i^{(1)}$ given $v$ (explaining away), so the factorized $q(h^{(1)}\mid v)$ cannot match it exactly.

Next chapter: Whiteboard Derivation Series Notes (28) - Boltzmann Machine


Reposted from blog.csdn.net/qq_41485273/article/details/112311391