Deriving the EM Algorithm: The Three-Coin Model

This post covers the three-coin model used to illustrate the EM algorithm in Li Hang's *Statistical Learning Methods* (2nd ed.). The book omits the detailed derivation for this model; this post works through it step by step.

1 The Three-Coin Model

  Suppose there are three coins, labeled A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Run the following experiment: first toss coin A and, according to its result, select coin B (heads) or coin C (tails); then toss the selected coin, recording heads as 1 and tails as 0. Repeat the experiment independently $n$ times (here $n=10$), giving the observations $$1,\,1,\,0,\,1,\,0,\,0,\,1,\,0,\,1,\,1.$$ Only the final outcome of each trial is observed, not the process (i.e., not the result of tossing coin A). The problem the EM algorithm solves is estimating $\pi$, $p$, and $q$ when this process information is missing.
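The generative process described above is easy to simulate. A minimal sketch in Python (the function name `simulate_three_coins` and the parameter values are my own illustrative choices, not estimates from the data):

```python
import random

def simulate_three_coins(pi, p, q, n, seed=42):
    """Simulate n trials of the three-coin experiment.

    Each trial: toss coin A (heads with prob. pi); heads selects coin B
    (heads with prob. p), tails selects coin C (heads with prob. q).
    Only the final 0/1 outcome is returned; the latent coin-A result
    is discarded, just as in the model.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        z_is_b = rng.random() < pi          # latent: which coin was picked
        head_prob = p if z_is_b else q
        ys.append(1 if rng.random() < head_prob else 0)
    return ys

print(simulate_three_coins(0.4, 0.6, 0.7, 10))
```

The observed frequency of 1s concentrates around $\pi p + (1-\pi)q$, which is all the observations reveal directly; the individual roles of $\pi$, $p$, and $q$ are entangled, which is why estimation is nontrivial.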

2 Derivation

  Denote the observed data by $Y=(y_{1},y_{2},\dots,y_{n})^{T}$, where $y_{i}\in\{0,1\}$ is the observed outcome of the $i$-th trial, and the latent data by $Z=(z_{1},z_{2},\dots,z_{n})^{T}$, where $z_{i}$ is the unobserved result of tossing coin A in the $i$-th trial. Write $\theta=(\pi,p,q)$ for the model parameters. For the observation $y_{i}$ of any single trial, $$\begin{aligned}P(y_{i}\mid\theta)&=\sum_{z_{i}}P(y_{i},z_{i}\mid\theta)=\sum_{z_{i}}P(z_{i}\mid\theta)P(y_{i}\mid z_{i},\theta)\\&=\pi p^{y_{i}}(1-p)^{1-y_{i}}+(1-\pi)q^{y_{i}}(1-q)^{1-y_{i}}.\end{aligned}$$ Since the trials are independent, the likelihood of the observed data $Y$ is $$P(Y\mid\theta)=\sum_{Z}P(Z\mid\theta)P(Y\mid Z,\theta)=\prod_{i=1}^{n}\bigl[\pi p^{y_{i}}(1-p)^{1-y_{i}}+(1-\pi)q^{y_{i}}(1-q)^{1-y_{i}}\bigr].$$ Ordinary maximum likelihood would estimate the parameters as $$\hat\theta=\arg\max_{\theta}\log P(Y\mid\theta),$$ but this problem has no analytic solution, so the direct approach fails; this is where the EM algorithm comes in. EM iterates until a stopping criterion is met. The steps below show how the parameters from the $i$-th iteration produce those of the $(i+1)$-th.
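The observed-data log-likelihood $\log P(Y\mid\theta)$ can at least be evaluated numerically, even though it cannot be maximized in closed form. A minimal sketch (the function name `log_likelihood` is my own; the parameter values are arbitrary):

```python
import math

def log_likelihood(theta, ys):
    """log P(Y | theta) for the three-coin model, theta = (pi, p, q)."""
    pi, p, q = theta
    total = 0.0
    for y in ys:
        # Each trial is a two-component Bernoulli mixture weighted by pi.
        total += math.log(pi * p**y * (1 - p)**(1 - y)
                          + (1 - pi) * q**y * (1 - q)**(1 - y))
    return total

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(log_likelihood((0.5, 0.5, 0.5), ys))  # -> 10 * log(1/2) ≈ -6.9315
```

Note that whenever $p=q$ the mixture collapses and the likelihood no longer depends on $\pi$, a first hint that different $\theta$ can explain the data equally well.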

  • First assign initial values to the model parameters, denoted $\theta^{(0)}=(\pi^{(0)},p^{(0)},q^{(0)})$.
  • Suppose we already have the parameter values after the $i$-th iteration, $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$. The task is now the update $\theta^{(i)}\rightarrow\theta^{(i+1)}$.
  • Use $\theta^{(i)}$ to estimate the latent data and compute the $Q$ function (E-step):
    What is estimated here is the distribution of the latent data $Z$, i.e. the probability that coin B or coin C was selected in each trial. Given $\theta^{(i)}$, the probability that observation $y_{j}$ came from coin B is $$u_{j}^{(i+1)}=P(z_{j}=B\mid y_{j},\theta^{(i)})=\frac{\pi^{(i)}(p^{(i)})^{y_{j}}(1-p^{(i)})^{1-y_{j}}}{\pi^{(i)}(p^{(i)})^{y_{j}}(1-p^{(i)})^{1-y_{j}}+(1-\pi^{(i)})(q^{(i)})^{y_{j}}(1-q^{(i)})^{1-y_{j}}},$$ and the probability that it came from coin C is $1-u_{j}^{(i+1)}$. Next compute the $Q$ function (its derivation is not repeated here; the book covers it in detail): $$Q(\theta,\theta^{(i)})=E_{Z}\bigl[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}\bigr]=\sum_{Z}P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta).$$ The meaning of the $Q$ function: it is the expectation of the complete-data log-likelihood $\log P(Y,Z\mid\theta)$ under the conditional distribution $P(Z\mid Y,\theta^{(i)})$ of the latent data $Z$ given the observed data $Y$ and the current parameters $\theta^{(i)}$. The updated parameters are then $$\theta^{(i+1)}=\arg\max_{\theta}Q(\theta,\theta^{(i)}).$$
    A brief refresher on conditional expectation: the expectation of a random variable $Y$ given $X=x$ is $$E(Y\mid X=x)=\begin{cases}\sum_{j} y_{j}\,P(Y=y_{j}\mid X=x) & (X,Y)\text{ a discrete random vector}\\ \int_{-\infty}^{\infty} y\,p(y\mid x)\,dy & (X,Y)\text{ a continuous random vector.}\end{cases}$$ Next, in order to display the computation of $Q(\theta,\theta^{(i)})$ in full, first reduce the number of trials in the three-coin model to 2.
    Suppose the two trials yield the observed data $Y=(y_{1},y_{2})$. Then $Q(\theta,\theta^{(i)})$ is the sum, over every possible configuration of the latent data $Z$, of the conditional probability of that configuration times the complete-data log-likelihood $\log P(Y,Z\mid\theta)$.
    Take $Z=(1,0)$, i.e. coin B in the first trial and coin C in the second. Then $P(Z\mid Y,\theta^{(i)})=u_{1}^{(i+1)}(1-u_{2}^{(i+1)})$, and $$\begin{aligned}\log P(Y,Z\mid\theta)&=\log\bigl[\pi p^{y_{1}}(1-p)^{1-y_{1}}(1-\pi)q^{y_{2}}(1-q)^{1-y_{2}}\bigr]\\&=\log\pi+y_{1}\log p+(1-y_{1})\log(1-p)+\log(1-\pi)+y_{2}\log q+(1-y_{2})\log(1-q).\end{aligned}$$ Adding the analogous $P(Z\mid Y,\theta^{(i)})\,\log P(Y,Z\mid\theta)$ terms for $Z=(1,1)$, $Z=(0,1)$, and $Z=(0,0)$ gives the corresponding $Q(\theta,\theta^{(i)})$, here denoted $Q_{2}(\theta,\theta^{(i)})$: $$\begin{aligned}Q_{2}(\theta,\theta^{(i)})&=u_{1}^{(i+1)}(1-u_{2}^{(i+1)})\bigl(\log\pi+\log(1-\pi)+y_{1}\log p+(1-y_{1})\log(1-p)+y_{2}\log q+(1-y_{2})\log(1-q)\bigr)\\&\quad+u_{1}^{(i+1)}u_{2}^{(i+1)}\bigl(\log\pi+\log\pi+y_{1}\log p+(1-y_{1})\log(1-p)+y_{2}\log p+(1-y_{2})\log(1-p)\bigr)\\&\quad+(1-u_{1}^{(i+1)})u_{2}^{(i+1)}\bigl(\log(1-\pi)+\log\pi+y_{1}\log q+(1-y_{1})\log(1-q)+y_{2}\log p+(1-y_{2})\log(1-p)\bigr)\\&\quad+(1-u_{1}^{(i+1)})(1-u_{2}^{(i+1)})\bigl(\log(1-\pi)+\log(1-\pi)+y_{1}\log q+(1-y_{1})\log(1-q)+y_{2}\log q+(1-y_{2})\log(1-q)\bigr)\\&=(u_{1}^{(i+1)}+u_{2}^{(i+1)})\log\pi+(2-u_{1}^{(i+1)}-u_{2}^{(i+1)})\log(1-\pi)\\&\quad+u_{1}^{(i+1)}\bigl[y_{1}\log p+(1-y_{1})\log(1-p)\bigr]+u_{2}^{(i+1)}\bigl[y_{2}\log p+(1-y_{2})\log(1-p)\bigr]\\&\quad+(1-u_{1}^{(i+1)})\bigl[y_{1}\log q+(1-y_{1})\log(1-q)\bigr]+(1-u_{2}^{(i+1)})\bigl[y_{2}\log q+(1-y_{2})\log(1-q)\bigr].\end{aligned}$$ Examining this result, the $Q$ function for $n$ trials can be inferred to be $$Q(\theta,\theta^{(i)})=\sum_{j=1}^{n}\Bigl[u_{j}^{(i+1)}\bigl[\log\pi+y_{j}\log p+(1-y_{j})\log(1-p)\bigr]+(1-u_{j}^{(i+1)})\bigl[\log(1-\pi)+y_{j}\log q+(1-y_{j})\log(1-q)\bigr]\Bigr].$$
  • Next, to obtain the new round of parameters $\theta^{(i+1)}$, take the partial derivatives of $Q(\theta,\theta^{(i)})$ with respect to $\pi$, $p$, and $q$ in turn (M-step). The details:
    π \pi π求偏导得到如下公式: ∂ Q ∂ π = ∑ j = 1 n [ u j ( i + 1 ) ⋅ 1 π − 1 − u j ( i + 1 ) 1 − π ] \frac{\partial{Q}}{\partial{\pi}}=\sum_{j=1}^{n}[u_{j}^{(i+1)} \cdot\frac{1}{\pi}-\frac{1-u_{j}^{(i+1)}}{1-\pi}] πQ=j=1n[uj(i+1)π11π1uj(i+1)]令该计算式为0,则得到: π ( i + 1 ) = 1 n ⋅ ∑ j = 1 n u j ( i + 1 ) \pi^{(i+1)}=\frac{1}{n}\cdot\sum_{j=1}^{n}u_{j}^{(i+1)} π(i+1)=n1j=1nuj(i+1) p p p求偏导得到如下公式: ∂ Q ∂ p = ∑ j = 1 n u j ( i + 1 ) ( y j p + y j − 1 1 − p ) \frac{\partial{Q}}{\partial{p}}=\sum_{j=1}^{n}u_{j}^{(i+1)}(\frac{y_{j}}{p}+\frac{y_{j}-1}{1-p}) pQ=j=1nuj(i+1)(pyj+1pyj1)令该计算式为0,得到: p ( i + 1 ) = ∑ j = 1 n u j ( i + 1 ) ⋅ y j ∑ j = 1 n u j ( i + 1 ) p^{(i+1)}=\frac{\sum_{j=1}^{n}u_{j}^{(i+1)}\cdot y_{j}}{\sum_{j=1}^{n}u_{j}^{(i+1)}} p(i+1)=j=1nuj(i+1)j=1nuj(i+1)yj q q q求偏导得到如下公式: ∂ Q ∂ q = ∑ j = 1 n ( 1 − u j ( i + 1 ) ) ( y j q + y j − 1 1 − q ) \frac{\partial{Q}}{\partial{q}}=\sum_{j=1}^{n}(1-u_{j}^{(i+1)})(\frac{y_{j}}{q}+\frac{y_{j}-1}{1-q}) qQ=j=1n(1uj(i+1))(qyj+1qyj1)同理,可以得到: q ( i + 1 ) = ∑ j = 1 n ( 1 − u j ( i + 1 ) ) y j ∑ j = 1 n ( 1 − u j ( i + 1 ) ) q^{(i+1)}=\frac{\sum_{j=1}^{n}(1-u_{j}^{(i+1)})y_{j}}{\sum_{j=1}^{n}(1-u_{j}^{(i+1)})} q(i+1)=j=1n(1uj(i+1))j=1n(1uj(i+1))yj

This completes the EM derivation for the three-coin model.

References

  1. https://blog.csdn.net/weixin_41566471/article/details/106219019
  2. Li Hang, *Statistical Learning Methods* (2nd ed.)


Reposted from blog.csdn.net/yeshang_lady/article/details/132151771