A Detailed Derivation of the Q Function in the EM Algorithm

The Q Function

$$
\begin{aligned}
Q\left(\theta, \theta^{(i)}\right) &= E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\
&= \sum_Z \log P(Y, Z \mid \theta) \cdot P\left(Z \mid Y, \theta^{(i)}\right)
\end{aligned}
$$

The Q function is a central object in the EM algorithm; its full name is the "expected complete-data log-likelihood". In the E-step it gives the expectation of the complete-data log-likelihood under the current posterior over the hidden variables, and in the M-step it is maximized to obtain the next maximum-likelihood estimate of the model parameters.

In an earlier article (EM算法求解三硬币模型参数推导), I walked through the parameter derivation for the three-coin model from Professor Li Hang's *Statistical Learning Methods*. There the EM algorithm was applied by expanding a Q function directly, and for reasons of space the article did not show where that Q function comes from. This article supplements the earlier one (and the derivation on page 179, Chapter 9 of *Statistical Learning Methods*) with a detailed derivation of the Q function.
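As a concrete (and purely illustrative) companion, the sketch below computes the Q function of the opening formula for that three-coin model: $\theta = (\pi, p, q)$, where coin A selects coin B with probability $\pi$ and coin C otherwise, and the hidden variable $Z \in \{B, C\}$ records which coin was tossed. The observation sequence is assumed to be the one from the book's example; it is not part of this derivation.

```python
import math

# Hedged sketch of the three-coin model (model spec assumed from the
# referenced article): theta = (pi, p, q); coin A picks coin B with
# probability pi, else coin C; coins B and C land heads with prob p, q.

def joint(y, z, theta):
    """P(y, z | theta) for z in {'B', 'C'} and y in {0, 1}."""
    pi, p, q = theta
    if z == 'B':
        return pi * p**y * (1 - p)**(1 - y)
    return (1 - pi) * q**y * (1 - q)**(1 - y)

def posterior(y, theta):
    """P(z | y, theta) as a dict over z -- the E-step responsibilities."""
    pb, pc = joint(y, 'B', theta), joint(y, 'C', theta)
    return {'B': pb / (pb + pc), 'C': pc / (pb + pc)}

def Q(theta, theta_i, ys):
    """sum_Z P(Z | Y, theta_i) * log P(Y, Z | theta), summed over the data."""
    return sum(posterior(y, theta_i)[z] * math.log(joint(y, z, theta))
               for y in ys for z in ('B', 'C'))

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations assumed from the book's example
theta_i = (0.5, 0.5, 0.5)
print(Q(theta_i, theta_i, ys))
```

With the symmetric initialization $(0.5, 0.5, 0.5)$ every responsibility is $0.5$ and every joint probability is $0.25$, so the printed value is exactly $10 \log 0.25$.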

Derivation of the Q Function

We know the likelihood of the parameter $\theta$:

$$
\begin{aligned}
L(\theta) &= \log P(Y \mid \theta) = \log \frac{P(Y, \theta)}{P(\theta)} = \log \frac{\sum_Z P(Y, Z, \theta)}{P(\theta)} = \log \sum_Z P(Y, Z \mid \theta) \\
&= \log \sum_Z \frac{P(Y, Z, \theta)}{P(Z, \theta)} \cdot \frac{P(Z, \theta)}{P(\theta)}
\end{aligned}
$$

that is,

$$
L(\theta) = \log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta)
$$
Suppose the parameter at iteration $i$ is $\theta^{(i)}$. We want the update to increase the likelihood, i.e. $L(\theta) > L\left(\theta^{(i)}\right)$, so we take the difference:


$$
L(\theta) - L\left(\theta^{(i)}\right) = \log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta) - \log P\left(Y \mid \theta^{(i)}\right)
$$
In the first term, we can multiply and divide by $P\left(Z \mid Y, \theta^{(i)}\right)$ to form a fraction:
$$
L(\theta) - L\left(\theta^{(i)}\right) = \log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right) - \log P\left(Y \mid \theta^{(i)}\right)
$$
Using the fact that $\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) = 1$, we can multiply the second term by this sum without changing its value:
$$
L(\theta) - L\left(\theta^{(i)}\right) = \log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right) - \log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right)
$$
Now apply Jensen's inequality:
$$
\log \sum_j \lambda_j \cdot y_j \geqslant \sum_j \lambda_j \cdot \log y_j, \qquad \text{where } \lambda_j \geqslant 0,\ \sum_j \lambda_j = 1
$$

Applying it to the first term, with the weights $\lambda_Z = P\left(Z \mid Y, \theta^{(i)}\right)$, we obtain

$$
\begin{aligned}
&\geqslant \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)} - \log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \\
&= \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \left[\log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)} - \log P\left(Y \mid \theta^{(i)}\right)\right] \\
&= \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}
\end{aligned}
$$
That is,
$$
L(\theta) - L\left(\theta^{(i)}\right) \geqslant \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}
$$
or equivalently
$$
L(\theta) \geqslant L\left(\theta^{(i)}\right) + \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}
$$
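The Jensen step above can be spot-checked numerically. The sketch below draws random weight vectors summing to one and verifies $\log \sum_j \lambda_j y_j \geqslant \sum_j \lambda_j \log y_j$:

```python
import math, random

# Numeric spot-check of the Jensen inequality used above:
# log(sum_j lambda_j * y_j) >= sum_j lambda_j * log(y_j)
# for nonnegative weights lambda_j summing to 1 (log is concave).

random.seed(0)
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in range(5)]
    lam = [r / sum(raw) for r in raw]              # lambda_j >= 0, sum = 1
    ys = [random.uniform(0.1, 10.0) for _ in range(5)]
    lhs = math.log(sum(l * y for l, y in zip(lam, ys)))
    rhs = sum(l * math.log(y) for l, y in zip(lam, ys))
    assert lhs >= rhs - 1e-12
print("Jensen inequality held in 1000 random trials")
```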
Define
$$
B\left(\theta, \theta^{(i)}\right) = L\left(\theta^{(i)}\right) + \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}
$$
Then $B\left(\theta, \theta^{(i)}\right)$ is a lower bound of $L(\theta)$, and one can check that $B\left(\theta^{(i)}, \theta^{(i)}\right) = L\left(\theta^{(i)}\right)$, so any $\theta$ that increases $B\left(\theta, \theta^{(i)}\right)$ also increases $L(\theta)$.
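To make the lower-bound claim tangible, the hedged sketch below evaluates $L(\theta)$ and $B\left(\theta, \theta^{(i)}\right)$ for the three-coin model (model and data assumed from the referenced example, not given in this derivation) and checks both properties: equality at $\theta = \theta^{(i)}$, and $B \leqslant L$ everywhere else.

```python
import math, random

# Check that B(theta, theta_i) lower-bounds L(theta) and touches it at
# theta = theta_i, using the three-coin model as a stand-in (pi, p, q
# assumed as in the referenced article).

def joint(y, z, theta):
    """P(y, z | theta) for z in {'B', 'C'} and y in {0, 1}."""
    pi, p, q = theta
    head = p if z == 'B' else q
    prior = pi if z == 'B' else 1 - pi
    return prior * (head if y == 1 else 1 - head)

def L(theta, ys):
    """Observed-data log-likelihood log P(Y | theta)."""
    return sum(math.log(joint(y, 'B', theta) + joint(y, 'C', theta)) for y in ys)

def B(theta, theta_i, ys):
    """The lower bound from the derivation above."""
    total = L(theta_i, ys)
    for y in ys:
        marg = joint(y, 'B', theta_i) + joint(y, 'C', theta_i)
        for z in ('B', 'C'):
            post = joint(y, z, theta_i) / marg     # P(z | y, theta_i)
            total += post * math.log(joint(y, z, theta) / (post * marg))
    return total

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta_i = (0.4, 0.6, 0.7)
assert abs(B(theta_i, theta_i, ys) - L(theta_i, ys)) < 1e-9    # equality at theta_i
random.seed(1)
for _ in range(200):
    theta = tuple(random.uniform(0.05, 0.95) for _ in range(3))
    assert B(theta, theta_i, ys) <= L(theta, ys) + 1e-9        # lower bound everywhere
print("B(theta, theta_i) <= L(theta) verified")
```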

So our goal at each iteration is
$$
\theta^{(i+1)} = \underset{\theta}{\operatorname{argmax}}\, B\left(\theta, \theta^{(i)}\right)
$$
that is,
$$
\theta^{(i+1)} = \underset{\theta}{\operatorname{argmax}} \left[L\left(\theta^{(i)}\right) + \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}\right]
$$

The three quantities $L\left(\theta^{(i)}\right)$, $P\left(Z \mid Y, \theta^{(i)}\right)$, and $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)$ do not depend on $\theta$, so they act as constants in the maximization.
Since $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right) > 0$, writing $\log \frac{A}{c} = \log A - \log c$ shows that dropping this denominator only shifts the objective by a constant, which does not affect the maximizer. Note, however, that the factor $P\left(Z \mid Y, \theta^{(i)}\right)$ in front of the $\log$ cannot be dropped: it is the weight of each term inside $\sum_Z$.
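The constant-dropping step can be illustrated with a toy objective (all numbers below are hypothetical stand-ins, not from the article): keeping or dropping a positive, $\theta$-independent denominator inside the log shifts the objective by a constant and leaves the argmax unchanged.

```python
import math

# Toy check (hypothetical numbers) that dropping the theta-independent
# denominator P(Z|Y,theta_i) * P(Y|theta_i) inside the log only shifts
# the objective by a constant, leaving the argmax over theta unchanged.

weights = [0.3, 0.7]   # stands in for P(Z|Y,theta_i), summing to 1
consts = [0.08, 0.12]  # stands in for P(Z|Y,theta_i)*P(Y|theta_i) > 0

def numerators(theta):
    """Stands in for P(Y|Z,theta)*P(Z|theta) for the two values of Z."""
    return [theta, 1 - theta]

def with_const(theta):      # objective keeping the denominator
    return sum(w * math.log(t / c)
               for w, t, c in zip(weights, numerators(theta), consts))

def without_const(theta):   # objective after dropping the denominator
    return sum(w * math.log(t)
               for w, t in zip(weights, numerators(theta)))

grid = [i / 1000 for i in range(1, 1000)]
assert max(grid, key=with_const) == max(grid, key=without_const)
# the two objectives differ by the same constant everywhere
diffs = [with_const(t) - without_const(t) for t in (0.2, 0.5, 0.9)]
assert max(diffs) - min(diffs) < 1e-12
print("dropping the constant denominator preserves the argmax")
```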

Therefore
$$
\theta^{(i+1)} = \underset{\theta}{\operatorname{argmax}} \left[\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left(P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right)\right]
$$

We define
$$
Q\left(\theta, \theta^{(i)}\right) = \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left(P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right) = \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log P(Y, Z \mid \theta)
$$

$$
\theta^{(i+1)} = \underset{\theta}{\operatorname{argmax}}\, Q\left(\theta, \theta^{(i)}\right)
$$

where $Q\left(\theta, \theta^{(i)}\right)$ is exactly the $Q$ function stated at the beginning.
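To close the loop, here is a minimal EM iteration for the three-coin model (a sketch, not the referenced article's code): the closed-form M-step below comes from maximizing $Q\left(\theta, \theta^{(i)}\right)$ analytically, and the data and initialization are assumed to match the book's example. By the lower-bound argument above, the observed-data log-likelihood never decreases across iterations, which the loop asserts.

```python
import math

# Minimal EM loop for the three-coin model. E-step: responsibilities
# mu_n = P(Z = coin B | y_n, theta). M-step: closed-form argmax of
# Q(theta, theta_i) for pi, p, q (standard weighted-average updates).

def em_step(theta, ys):
    pi, p, q = theta
    mus = []
    for y in ys:   # E-step
        b = pi * p**y * (1 - p)**(1 - y)
        c = (1 - pi) * q**y * (1 - q)**(1 - y)
        mus.append(b / (b + c))
    # M-step
    pi_new = sum(mus) / len(ys)
    p_new = sum(m * y for m, y in zip(mus, ys)) / sum(mus)
    q_new = sum((1 - m) * y for m, y in zip(mus, ys)) / sum(1 - m for m in mus)
    return (pi_new, p_new, q_new)

def loglik(theta, ys):
    """Observed-data log-likelihood L(theta)."""
    pi, p, q = theta
    return sum(math.log(pi * p**y * (1 - p)**(1 - y)
                        + (1 - pi) * q**y * (1 - q)**(1 - y)) for y in ys)

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # assumed from the book's example
theta = (0.4, 0.6, 0.7)              # assumed initialization
for _ in range(10):
    new_theta = em_step(theta, ys)
    assert loglik(new_theta, ys) >= loglik(theta, ys) - 1e-9  # monotone ascent
    theta = new_theta
print(theta)
```

Note that with the symmetric initialization $(0.5, 0.5, 0.5)$ the first M-step gives $(0.5, 0.6, 0.6)$, a fixed point: EM converges to different stationary points from different starts.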

References

[1] EM算法求解三硬币模型参数推导 (EM algorithm: parameter derivation for the three-coin model)

Reprinted from blog.csdn.net/qq_33909788/article/details/135162605