EM Algorithm Notes (1): The Three-Coin Problem

The EM algorithm and hidden (latent) variables

The EM algorithm is, in essence, a maximum-likelihood estimation method for probabilistic models that contain a hidden variable (also called a latent variable). One key to constructing an EM algorithm is choosing a suitable hidden variable.

An elegant and powerful method for finding maximum likelihood solutions for models with latent variables is called the expectation-maximization algorithm. —— From 《Pattern Recognition and Machine Learning》 § 9.2.2

The three-coin problem

The three-coin problem. Suppose there are three coins $A$, $B$, $C$, whose probabilities of showing heads are $\pi$, $p$, $q$ respectively. Run the following experiment: first toss coin $A$; according to its outcome, select coin $B$ (if $A$ shows heads) or coin $C$ (if $A$ shows tails); then toss the selected coin, recording heads as $1$ and tails as $0$; repeat the trial independently $N$ times. Suppose the observations of the $N$ $(N=10)$ trials are: $1,1,0,1,0,0,1,0,1,1$. (Assume that only the outcome of each final toss can be observed, not the tossing process itself.)

Li Hang, 《统计学习方法》 (Statistical Learning Methods), Chapter 9

(figure)

1. Setting up the model

Consider the probability $P(y|\theta)$ of the coin showing heads (or tails), where $\theta=(\pi,p,q)$ are the model parameters.

Here $y$ is the observed variable of the three-coin problem: $y=1$ means the tossed coin shows heads, and $y=0$ means it shows tails. The observations of the $10$ trials are then $\{ y_{1}, y_{2},\cdots,y_{10} \},\ y_{n}\in \{0,1\}$.

Take the first toss as an example: $y=1$ means we observed the outcome "the coin shows heads". But we only know that heads came up; we do not know whether it was heads of coin B (denoted $z=1$) or heads of coin C (denoted $z=0$).

That is because whether coin B or coin C is tossed to produce the observation $y$ is determined by the outcome $z$ of tossing coin A, and this intermediate step cannot be observed.

  • If we consider tossing coin $B$ alone ($z=1$), the probability of the observed variable $y$ is a conditional probability (row 2 of the figure above):

$$P(y|z=1,\theta)=p^{y}(1-p)^{(1-y)},\qquad y \in\{0,1\}$$

  • If we consider tossing coin $C$ alone ($z=0$), the probability of the observed variable $y$ is also a conditional probability (row 3 of the figure above):

$$P(y|z=0,\theta)=q^{y}(1-q)^{(1-y)},\qquad y \in\{0,1\}$$
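As a small sketch (the function names `coin_b_pmf` and `coin_c_pmf` are my own, not from the text), the two conditional probabilities are ordinary Bernoulli probability mass functions:

```python
def coin_b_pmf(y, p):
    """P(y | z=1, theta): Bernoulli pmf of coin B with heads probability p."""
    return p**y * (1 - p)**(1 - y)

def coin_c_pmf(y, q):
    """P(y | z=0, theta): Bernoulli pmf of coin C with heads probability q."""
    return q**y * (1 - q)**(1 - y)
```

For example, with $p=0.6$ the probability of observing heads from coin B is simply $0.6$.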


2. Introducing the hidden variable

Consider how the $n$-th observation $y_{n}$ is generated:

  • If coin $B$ is tossed (which necessarily coincides with the event $z=1$), the probability of $y_{n}$ is the joint probability $P(y_{n},z=1|\theta)$:

$$P(y_{n},z=1|\theta)=P(z=1|\theta)P(y_{n}|z=1,\theta)=\pi p^{y_{n}}(1-p)^{(1-y_{n})}$$

  • If coin $C$ is tossed (which necessarily coincides with the event $z=0$), the probability of $y_{n}$ is the joint probability $P(y_{n},z=0|\theta)$:

$$P(y_{n},z=0|\theta)=P(z=0|\theta)P(y_{n}|z=0,\theta)=(1-\pi)q^{y_{n}}(1-q)^{(1-y_{n})}$$

The above amounts to introducing a hidden variable $z$ to represent the outcome of tossing coin $A$, a process that cannot be observed:

$\longrightarrow$ $z=1$: coin $A$ shows heads, so coin $B$ is selected to produce the outcome $y$, with probability $P(y,z=1|\theta)$;

$\longrightarrow$ $z=0$: coin $A$ shows tails, so coin $C$ is selected to produce the outcome $y$, with probability $P(y,z=0|\theta)$.

After introducing the hidden variable $z$: although what we observe in the $N$ independent trials is $\{ y_{1}, y_{2},\cdots,y_{N}\},\ y_{n}\in \{0,1\}$, what was actually generated is $\{(y_{1},z_{1}), (y_{2},z_{1}), (y_{3},z_{0}),(y_{4},z_{1}),(y_{5},z_{0}),(y_{6},z_{0}),\cdots,(y_{N},z_{1}) \}$. For convenience here (differently from the notation used below), $z_{0}$ denotes $z=0$ (coin A showed tails, coin C was selected and produced the outcome $y_{n}$), and $z_{1}$ denotes $z=1$ (coin A showed heads, coin B was selected and produced the outcome $y_{n}$).

Describing this process in the 1-of-K representation makes the derivation of the EM algorithm more convenient.


3. The maximum-likelihood solution for complete data

To summarize: each time we observe the result of one coin toss ($y=1$ for heads, $y=0$ for tails), the probability $P(y|\theta)$ of the observed variable $y$ is:

$$\begin{aligned} P(y|\theta) &= \sum_{z\in\{0,1\}}P(y,z|\theta) \\ &=\sum_{z\in\{0,1\}}P(z|\theta)P(y|z,\theta) \\ &= P(z=1)P(y|z=1,\theta)+P(z=0)P(y|z=0,\theta) \\ &= \pi\cdot P(y|z=1,\theta)+(1- \pi) \cdot P(y|z=0,\theta) \\ &= \pi p^{y}(1-p)^{(1-y)}+(1- \pi)q^{y}(1-q)^{(1-y)} \end{aligned}$$
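The marginal probability above is a two-component Bernoulli mixture. A minimal sketch (the helper name `marginal_pmf` is illustrative):

```python
def marginal_pmf(y, pi, p, q):
    """P(y|theta) = pi * p^y (1-p)^(1-y) + (1-pi) * q^y (1-q)^(1-y)."""
    return pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
```

Note that $P(y=1|\theta)+P(y=0|\theta)=1$ for any valid $(\pi,p,q)$, since each component is itself a probability distribution over $y$.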

When $N$ $(N=10)$ independent trials yield the observations $1,1,0,1,0,0,1,0,1,1$, the observed data can be written as a random vector $\boldsymbol{y} =\{ y_{1}, y_{2},\cdots,y_{N} \},\ y_{n}\in \{0,1\}$, and the unobserved data as a random vector $\boldsymbol{z} =\{ z_{1}, z_{2},\cdots,z_{N} \},\ z_{n}\in \{0,1\}$.

Since the trials are independent, the likelihood function of all observed data $\boldsymbol{y} = \{ y_{1}, y_{2},\cdots,y_{N} \}$ can be written as:

$$\begin{aligned} P(\boldsymbol{y}|\theta) &= P( y_{1}, y_{2},\cdots,y_{N}|\theta), \qquad y_{n}\in \{0,1\}\\ &= \prod_{n=1}^{N}P(y_{n}|\theta) \\ &= \prod_{n=1}^{N}\left\{ \sum_{z_{n}\in\{0,1\}}P(y_{n},z_{n}|\theta) \right\} \\ &= \prod_{n=1}^{N}\left\{ \sum_{z_{n}\in\{0,1\}}P(z_{n}|\theta)P( y_{n}|z_{n},\theta)\right\} \\ &= \prod_{n=1}^{N} \left [ \pi p^{y_{n}}(1-p)^{(1-y_{n})}+(1- \pi)q^{y_{n}}(1-q)^{(1-y_{n})} \right] \end{aligned}$$
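A direct transcription of this likelihood, in log form to avoid numerical underflow for larger $N$ (function and variable names are illustrative):

```python
import math

def log_likelihood(ys, pi, p, q):
    """ln P(y|theta) = sum_n ln[ pi p^yn (1-p)^(1-yn) + (1-pi) q^yn (1-q)^(1-yn) ]."""
    total = 0.0
    for y in ys:
        total += math.log(pi * p**y * (1 - p)**(1 - y)
                          + (1 - pi) * q**y * (1 - q)**(1 - y))
    return total

# The observed data of the problem statement:
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
```

Sanity check: with $\pi=p=q=0.5$ every factor equals $0.5$, so the log-likelihood is $10\ln 0.5$.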

The maximum-likelihood solution of the model $P(\boldsymbol{y}|\theta)$ is $\hat{\theta} =\argmax_\theta P(\boldsymbol{y}|\theta)$, or equivalently $\hat{\theta} =\argmax_\theta \left\{ \ln P(\boldsymbol{y}|\theta) \right\}$.

  • Suppose the tossing of coin A can be observed, i.e. $z$ is also observable (no longer a hidden variable, as in the figure below), so that we know every pair $(y_{n},z_{n})$. For example, if the $i$-th observation is $(y_{i}=1,z_{i}=1)$, we also know that this heads outcome was produced by tossing coin B, i.e. the $i$-th toss of coin A showed heads; if the $j$-th observation is $(y_{j}=0,z_{j}=0)$, we also know that this tails outcome was produced by coin C, i.e. the $j$-th toss of coin A showed tails. Consider the figure below:
    (figure)

Revised from Figure 1 of "What is the expectation maximization algorithm?"
Every H/T result in the figure can be written explicitly as a pair $(y_{n},z_{n})$; for example, the data in row 1 can be written as $\{ (y_{1}=1,z_{1}=1),(y_{2}=0,z_{2}=0),\cdots,(y_{10}=1,z_{10}=1) \}$.
Assume $\pi=0.5$ (coin B or C is selected with equal probability), $p=\theta_{B}$, $q=\theta_{C}$; H denotes heads and T denotes tails.
Here $\theta_{B}$ is the heads probability $p$ of coin B in the three-coin problem, and $\theta_{C}$ is the heads probability $q$ of coin C.

The situation depicted above is the "complete data" case, which can be solved by maximum-likelihood estimation: $\hat{\theta} =\argmax_\theta \left\{ \ln P(\boldsymbol{y}|\theta) \right\},\ \theta=(p,q)$.

The experiment in the figure assumes $\pi=\dfrac{1}{2}$, so:

$$\begin{aligned} P(\boldsymbol{y}|\theta) &= \prod_{n=1}^{N} \left [ \pi p^{y_{n}}(1-p)^{(1-y_{n})}+(1- \pi)q^{y_{n}} (1-q)^{(1-y_{n})} \right] \\ &= \prod_{n=1}^{N} \left [ \frac{1}{2} p^{y_{n}}(1-p)^{(1-y_{n})}+\frac{1}{2}q^{y_{n}} (1-q)^{(1-y_{n})} \right]\end{aligned}$$

The log-likelihood function:

$$\begin{aligned}\ln P(\boldsymbol{y}|\theta)&=\sum_{n=1}^{N}\ln \left [ \frac{1}{2} p^{y_{n}}(1-p)^{(1-y_{n})}+\frac{1}{2}q^{y_{n}} (1-q)^{(1-y_{n})} \right] \\ &= \sum_{z_{n}=1}\ln\left [ \frac{1}{2} p^{y_{n}}(1-p)^{(1-y_{n})} \right]+ \sum_{z_{n}=0}\ln\left [ \frac{1}{2}q^{y_{n}} (1-q)^{(1-y_{n})} \right] \\ &=\sum_{z_{n}=1}\left [\ln \frac{1}{2} +{y_{n}}\ln p+(1-y_{n})\ln(1-p) \right] +\sum_{z_{n}=0}\left [\ln \frac{1}{2} +{y_{n}}\ln q+(1-y_{n})\ln(1-q) \right] \\ &=\sum_{z_{n}=1}\ln \frac{1}{2} +\ln p \cdot \sum_{z_{n}=1}{y_{n}}+\ln(1-p)\cdot \sum_{z_{n}=1}(1-y_{n}) \\ &\quad +\sum_{z_{n}=0}\ln \frac{1}{2} +\ln q \cdot \sum_{z_{n}=0}{y_{n}}+\ln(1-q) \cdot \sum_{z_{n}=0}(1-y_{n}) \end{aligned}$$

Taking partial derivatives of the log-likelihood $\ln P(\boldsymbol{y}|\theta)$ with respect to $p$ and $q$ and setting them to zero:

$$\begin{aligned}\frac{\partial\ln P(\boldsymbol{y}|\theta)}{\partial p}&= \frac{\partial \left\{\sum\limits_{z_{n}=1}\ln \frac{1}{2} +\ln p \cdot \sum\limits_{z_{n}=1}y_{n}+\ln(1-p) \cdot\sum\limits_{z_{n}=1}(1-y_{n})\right\}}{\partial p} \\ &=\frac{\sum\limits_{z_{n}=1}y_{n}}{p}-\frac{\sum\limits_{z_{n}=1}(1-y_{n})}{1-p}=0 \end{aligned}$$

and

$$\begin{aligned}\frac{\partial\ln P(\boldsymbol{y}|\theta)}{\partial q}&= \frac{\partial \left\{ \sum\limits_{z_{n}=0}\ln \frac{1}{2} +\ln q \cdot \sum\limits_{z_{n}=0}y_{n}+\ln(1-q)\cdot \sum\limits_{z_{n}=0}(1-y_{n})\right\}}{\partial q} \\ &=\frac{\sum\limits_{z_{n}=0}y_{n}}{q}-\frac{\sum\limits_{z_{n}=0}(1-y_{n})}{1-q}=0 \end{aligned}$$

Solving these yields:

$$\begin{aligned}p&= \frac{\sum\limits_{z_{n}=1}y_{n}}{\sum\limits_{z_{n}=1}y_{n}+\sum\limits_{z_{n}=1}(1-y_{n})} \\ q&= \frac{\sum\limits_{z_{n}=0}y_{n}}{\sum\limits_{z_{n}=0}y_{n}+\sum\limits_{z_{n}=0}(1-y_{n})} \end{aligned}$$

Note that here:

  • $\sum\limits_{z_{n}=1}{y_{n}}$ is the total number of heads ($y_{n}=1$) obtained when coin B was tossed ($z_{n}=1$), which is 9, and $\sum\limits_{z_{n}=1}(1-y_{n})$ is the total number of tails ($y_{n}=0$) when coin B was tossed, which is 11; hence $\hat \theta_{B}=\hat p=\frac{9}{9+11}=0.45$.

  • $\sum\limits_{z_{n}=0}{y_{n}}$ is the total number of heads ($y_{n}=1$) obtained when coin C was tossed ($z_{n}=0$), which is 24, and $\sum\limits_{z_{n}=0}(1-y_{n})$ is the total number of tails ($y_{n}=0$) when coin C was tossed, which is 6; hence $\hat \theta_{C}=\hat q=\frac{24}{24+6}=0.80$.
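The closed-form count-based estimates can be checked numerically. The pairs below encode the figure's counts (9 heads / 11 tails for coin B, 24 heads / 6 tails for coin C); the function name is my own:

```python
def mle_from_complete_data(pairs):
    """Closed-form MLE when (y_n, z_n) are both observed:
    p = fraction of heads among coin-B tosses, q likewise for coin C."""
    b_heads = sum(y for y, z in pairs if z == 1)
    b_total = sum(1 for _, z in pairs if z == 1)
    c_heads = sum(y for y, z in pairs if z == 0)
    c_total = sum(1 for _, z in pairs if z == 0)
    return b_heads / b_total, c_heads / c_total

# Counts from the figure: coin B: 9 heads out of 20; coin C: 24 heads out of 30.
pairs = [(1, 1)] * 9 + [(0, 1)] * 11 + [(1, 0)] * 24 + [(0, 0)] * 6
p_hat, q_hat = mle_from_complete_data(pairs)
```

This reproduces $\hat p = 0.45$ and $\hat q = 0.80$ from the counts above.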

  • If instead the tossing of coin A cannot be observed, i.e. $z$ is a hidden variable (the question marks in the figure below), then the EM algorithm can be used to find the maximum-likelihood solution iteratively.
    (figure)

Revised from Figure 1 of "What is the expectation maximization algorithm?"

4. The EM algorithm: a maximum-likelihood solution for incomplete data

If the tossing of coin A cannot be observed, then $z$ is a hidden variable (the question marks in the figure above indicate that we do not know whether an observation was produced with coin B or coin C); we can only observe whether the toss of "coin B or coin C" came up heads or tails.

The basic idea of the EM algorithm:

  • Since the information $\bold Z=\{ z_{1},\cdots,z_{N}\},\ z_{n} \in \{0,1\}$ cannot be observed, the observed data are incomplete. We therefore first posit some information about $\bold Z$ (i.e. make a guess: pretend we know whether each question mark in the figure is B or C, even though the guess may be wrong). The data $(\bold Y, \bold Z)$ are then complete, and maximum-likelihood estimation can proceed as in Section 3.
  • But which guess (which question marks are B and which are C, i.e. the value of $\bold Z$) is reliable?
    • To compute the likelihood of the data $(\bold Y, \bold Z)$, we first need initial parameter values $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$.
    • The EM algorithm uses the expectation $E_{P(\bold Z|\bold Y,\theta)}[\ln P(\bold Y, \bold Z|\theta)\ |\ \bold Y,\theta^{(i)}]$ as its estimate of the guess $\bold Z$, i.e.:
    1. compute $Q(\theta,\theta^{(i)})=E_{P(\bold Z|\bold Y,\theta)}[\ln P(\bold Y, \bold Z|\theta)\ |\ \bold Y,\theta^{(i)}]$ (the E step);
    2. find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$, i.e. $\theta^{(i+1)}=\argmax_{\theta}\ Q(\theta,\theta^{(i)})$ (the M step).

The EM steps for the three-coin problem are as follows:

(1) E step: with model parameters $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$,

the probability that observation $y_{n}$ came from tossing coin $B$ is:

$$\mu_{n}^{(i+1)}= \dfrac{\pi^{(i)}(p^{(i)})^{y_{n}}(1-p^{(i)})^{(1-y_{n})}} {\pi^{(i)}(p^{(i)})^{y_{n}}(1-p^{(i)})^{(1-y_{n})}+(1-\pi^{(i)})(q^{(i)})^{y_{n}}(1-q^{(i)})^{(1-y_{n})}}$$

and the probability that observation $y_{n}$ came from tossing coin $C$ is:

$$1-\mu_{n}^{(i+1)}= \dfrac{(1-\pi^{(i)})(q^{(i)})^{y_{n}}(1-q^{(i)})^{(1-y_{n})}} {\pi^{(i)}(p^{(i)})^{y_{n}}(1-p^{(i)})^{(1-y_{n})}+(1-\pi^{(i)})(q^{(i)})^{y_{n}}(1-q^{(i)})^{(1-y_{n})}}$$

(2) M step: update the model parameters $\theta^{(i+1)}=(\pi^{(i+1)},p^{(i+1)},q^{(i+1)})$:

$$\pi^{(i+1)} =\frac{1}{N} \sum_{n=1}^{N} \mu_{n}^{(i+1)}\ ,\quad p^{(i+1)} =\frac{\sum\limits_{n=1}^{N} \mu_{n}^{(i+1)}y_{n}}{\sum\limits_{n=1}^{N} \mu_{n}^{(i+1)}} \ ,\quad q^{(i+1)} =\frac{\sum\limits_{n=1}^{N} (1-\mu_{n}^{(i+1)})y_{n}}{\sum\limits_{n=1}^{N} (1-\mu_{n}^{(i+1)})}$$
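The E and M steps above can be sketched as a single update function (illustrative code, not from the text). For the data $1,1,0,1,0,0,1,0,1,1$ and the symmetric initial value $(\pi,p,q)=(0.5,0.5,0.5)$, every responsibility $\mu_{n}$ equals $0.5$, so one iteration gives $\pi=0.5,\ p=q=0.6$, consistent with the worked example in 《统计学习方法》 for this initialization:

```python
def em_step(ys, pi, p, q):
    """One EM iteration for the three-coin model.
    E step: mu_n = posterior probability that y_n came from coin B.
    M step: closed-form updates for pi, p, q."""
    mus = []
    for y in ys:
        b = pi * p**y * (1 - p)**(1 - y)          # pi * P(y | z=1)
        c = (1 - pi) * q**y * (1 - q)**(1 - y)    # (1-pi) * P(y | z=0)
        mus.append(b / (b + c))
    n = len(ys)
    pi_new = sum(mus) / n
    p_new = sum(m * y for m, y in zip(mus, ys)) / sum(mus)
    q_new = sum((1 - m) * y for m, y in zip(mus, ys)) / sum(1 - m for m in mus)
    return pi_new, p_new, q_new

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
pi1, p1, q1 = em_step(ys, 0.5, 0.5, 0.5)
```

In practice the step is repeated until the parameters (or the log-likelihood) stop changing; for this symmetric initialization the iteration is already at a fixed point after one step.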


5. Derivation of the EM formulas for the three-coin problem

As before, let the parameters of the three-coin probability model be $\theta=(\pi,p,q)$.

(1) The 1-of-K representation of the three-coin problem

1) The hidden variable $z_{1}=1$, i.e. the hidden vector $\bold z=[1,0]^{T}$, denotes the event "coin A shows heads";
   the probability of this event is $P(z_{1}=1|\theta)=\pi$.

2) The hidden variable $z_{2}=1$, i.e. the hidden vector $\bold z=[0,1]^{T}$, denotes the event "coin A shows tails";
   the probability of this event is $P(z_{2}=1|\theta)=1-\pi$.

3) Writing $P(z_{1}=1|\theta)=\pi_{1}=\pi$ and $P(z_{2}=1|\theta)=\pi_{2}=1-\pi$, the toss of coin A can be described uniformly as $P(z_{k}=1|\theta)=\pi_{k}\ ,\ k\in\{1,2\}$.
4) Expressed with the hidden vector $\mathbf z$, this is:

$$P(\mathbf z|\theta) = \prod_{k=1}^{2} \pi_{k}^{z_{k}} =\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\ ,\qquad \mathbf z= \left[ \begin{matrix} z_{1}\\z_{2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\}$$

(2) Expressing the probability of the observation $y$ via the hidden vector $\mathbf z$

1) Considering coins B and C separately, the probabilities of the observation $y \in \{0,1\}$ are:

Coin B: $P(y|z_{1}=1,\theta)=p^{y}(1-p)^{(1-y)}$
Coin C: $P(y|z_{2}=1,\theta)=q^{y}(1-q)^{(1-y)}$

2) For notational convenience, write $\alpha_{1}=p$ and $\alpha_{2}=q$; then:

Coin B: $P(y|z_{1}=1,\theta)=p^{y}(1-p)^{(1-y)}=\alpha_{1}^{y}(1-\alpha_{1})^{(1-y)}$
Coin C: $P(y|z_{2}=1,\theta)=q^{y}(1-q)^{(1-y)}=\alpha_{2}^{y}(1-\alpha_{2})^{(1-y)}$
 
The probabilities of coins B and C can thus be written uniformly as:

$$P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{(1-y)},\qquad k \in \{1,2\},\ y \in \{0,1\}$$
3) Expressed with the hidden vector $\mathbf z$, this is:
$$\begin{aligned} P(y|\mathbf z,\theta) &= \prod_{k=1}^{2} \left[ P(y|z_{k}=1,\theta) \right]^{z_{k}} \\ &=\left[P(y|z_{1}=1,\theta)\right]^{z_{1}}\cdot\left[P(y|z_{2}=1,\theta)\right]^{z_{2}} \\ &=\left[ \alpha_{1}^{y}(1-\alpha_{1})^{(1-y)} \right]^{z_{1}}\cdot\left[ \alpha_{2}^{y}(1-\alpha_{2})^{(1-y)} \right]^{z_{2}} \\ &= \prod_{k=1}^{2} \left[ \alpha_{k}^{y}(1-\alpha_{k})^{(1-y)} \right]^{z_{k}},\qquad \mathbf z= \left[ \begin{matrix} z_{1}\\z_{2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\} \end{aligned}$$
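The one-hot exponent in $P(y|\mathbf z,\theta)$ selects exactly one of the two Bernoulli factors, since the factor whose $z_{k}=0$ reduces to 1. A quick sketch (the name `pmf_given_onehot` is illustrative):

```python
def pmf_given_onehot(y, z, alphas):
    """P(y | z, theta) = prod_k [ alpha_k^y (1-alpha_k)^(1-y) ]^(z_k), z one-hot."""
    prob = 1.0
    for z_k, alpha_k in zip(z, alphas):
        # When z_k == 0 the factor is x**0 == 1, so only the selected coin contributes.
        prob *= (alpha_k**y * (1 - alpha_k)**(1 - y))**z_k
    return prob
```

For instance, with $\alpha_{1}=0.6$ and $\mathbf z=[1,0]^{T}$, the probability of $y=1$ is just $\alpha_{1}=0.6$.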

(3) The likelihood function of the incomplete-data sample set

For the full sample set $\mathbf Y=\{y_{1},\cdots,y_{N}\}$ with corresponding hidden vectors $\mathbf Z=\{ \mathbf z_{1},\cdots,\mathbf z_{N}\},\ \mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\}$, we have:

$$\begin{aligned} P(\mathbf Y|\mathbf Z,\theta) &= P(y_{1},\cdots,y_{N}|\mathbf z_{1},\cdots,\mathbf z_{N},\theta) \\ &= \prod_{n=1}^{N} P(y_{n}|\mathbf z_{n},\theta)\qquad\text{by } P(y|\mathbf z,\theta) = \prod_{k=1}^{2} \left[ P(y|z_{k}=1,\theta) \right]^{z_{k}} \\ &= \prod_{n=1}^{N}\prod_{k=1}^{2} \left[ P(y_{n}|z_{nk}=1,\theta) \right]^{z_{nk}} \\ &= \prod_{n=1}^{N}\prod_{k=1}^{2} \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right]^{z_{nk}} \end{aligned}$$

Together with

$$P(\mathbf z_{n}|\theta) = \prod_{k=1}^{2} \pi_{k}^{z_{nk}} =\pi_{1}^{z_{n1}}\cdot\pi_{2}^{z_{n2}}\ ,\qquad \mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\}$$

we obtain:

$$\begin{aligned} P(\mathbf Z|\theta) &= P(\mathbf z_{1},\cdots,\mathbf z_{N}|\theta) \\ &=\prod_{n=1}^{N}P(\mathbf z_{n}|\theta) \\ &=\prod_{n=1}^{N}\prod_{k=1}^{2} \pi_{k}^{z_{nk}} \\ &=\prod_{n=1}^{N}\pi_{1}^{z_{n1}}\cdot\pi_{2}^{z_{n2}}\ ,\qquad \mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\} \end{aligned}$$

Since the hidden vectors $\mathbf Z$ cannot be observed, the EM algorithm first makes a guess for all of $\mathbf Z=\{\mathbf z_{1},\cdots,\mathbf z_{n},\cdots,\mathbf z_{N}\}$, turning the observed data from the incomplete form $\mathbf Y$ into the complete form $(\mathbf Y,\mathbf Z)$.

The likelihood function of the complete data $(\mathbf Y,\mathbf Z)$ (assuming $\mathbf Z$ were known) is then:
\qquad
$$\begin{aligned} P(\mathbf Y,\mathbf Z|\theta) &=P(\mathbf Y|\mathbf Z,\theta)P(\mathbf Z|\theta) \\ &=\prod_{n=1}^{N}\prod_{k=1}^{2} \left[ P(y_{n}|z_{nk}=1,\theta) \right]^{z_{nk}}\cdot \prod_{n=1}^{N}\prod_{k=1}^{2} \pi_{k}^{z_{nk}} \\ &=\prod_{n=1}^{N}\prod_{k=1}^{2} \pi_{k}^{z_{nk}} \left[ P(y_{n}|z_{nk}=1) \right]^{z_{nk}} \\ &=\prod_{n=1}^{N}\prod_{k=1}^{2} \pi_{k}^{z_{nk}} \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right]^{z_{nk}} \end{aligned}$$

which gives the log-likelihood (whose value depends on $\mathbf Z$):

$$\begin{aligned} \ln P(\mathbf Y,\mathbf Z|\theta) &= \sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln P(y_{n}|z_{nk}=1) \right\} \\ &= \sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\} \end{aligned}$$

Suppose the parameters $\theta=(\pi,\alpha_{1},\alpha_{2})$ are known; the value of the log-likelihood $\ln P(\mathbf Y,\mathbf Z|\theta)$ then depends on $\mathbf Z=\{\mathbf z_{1},\cdots,\mathbf z_{n},\cdots,\mathbf z_{N}\}$:
(1) If we know exactly whether each toss used coin B or coin C, i.e. whether each $\mathbf z_{n}$ equals $[1,0]^{T}$ (tossed with coin B) or $[0,1]^{T}$ (tossed with coin C), then $\ln P(\mathbf Y,\mathbf Z|\theta)$ can be computed exactly.
(2) If we do not know which coin each toss used, i.e. we do not know whether each $\mathbf z_{n}$ is $[1,0]^{T}$ or $[0,1]^{T}$, then the value of $\ln P(\mathbf Y,\mathbf Z|\theta)$ cannot be determined.

(4) Computing the expectation of the log-likelihood $\ln P(\mathbf Y,\mathbf Z|\theta)$

Since the values of the hidden vectors $\mathbf Z$ cannot be observed, the parameters $\theta=(\pi,\alpha_{1},\alpha_{2})$ cannot be obtained by direct maximum-likelihood estimation. To resolve this, the EM algorithm makes a reasonable guess for all hidden variables $\mathbf Z$: in effect, it computes their expectation $E_{Z}[Z]$.

1) Since the observed data $\mathbf Y$ are known, once an initial value $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$ is fixed, the (log-)likelihood $\ln P(\mathbf Y,\mathbf Z|\theta)$ can be computed for any guess $\mathbf Z$.

2) To assess which guess $\mathbf Z$ is reliable, we take the expectation of $\ln P(\mathbf Y,\mathbf Z|\theta)$, i.e. we compute:
$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{P(\bold Z|\bold Y,\theta)}[\ln P(\bold Y, \bold Z|\theta)\ |\ \bold Y,\theta^{(i)}] \\ &= E_{P(\bold Z|\bold Y,\theta)}\left\{\sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\}\right\} \\ &= \sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\} \end{aligned}$$

Here $E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]$ is the expectation taken over $\mathbf z_{n}$.

  • If the information about $\mathbf Z$ is known, i.e. every $\mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\}$ is known exactly (equivalently, we know whether the $k$ with $z_{nk}=1$ is 1 or 2), the log-likelihood can be written as:

$$\ln P(\bold Y, \bold Z|\theta)=\sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\}$$

For the observations $\bold Y=\{y_{1},\cdots,y_{n},\cdots,y_{N} \}$, each $y_{n}$ was produced either by coin B (let $C_{1}=\{n\ |\ z_{n1}=1\}$) or by coin C (let $C_{2}=\{n\ |\ z_{n2}=1\}$); clearly $\bold Y= C_{1} \cup C_{2}$ and $C_{1} \cap C_{2}=\varnothing$. The log-likelihood can then be written as:

$$\begin{aligned}\ln P(\bold Y, \bold Z|\theta)&=\sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\}\\ &=\sum_{n\in C_{1}}z_{n1} \left\{ \ln\pi_{1}+\ln \left[ \alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \right] \right\}\\ &\quad +\sum_{n\in C_{2}}z_{n2} \left\{ \ln\pi_{2}+\ln \left[ \alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \right] \right\}\\ &=\sum_{n\in C_{1}} \left\{ \ln\pi+\ln \left[ p^{y_{n}}(1-p)^{1-y_{n}} \right] \right\}\\ &\quad +\sum_{n\in C_{2}} \left\{ \ln(1-\pi)+\ln \left[ q^{y_{n}}(1-q)^{1-y_{n}} \right] \right\} \end{aligned}$$

In this case the parameters $\hat \theta=(\pi,p,q)$ can be obtained by maximum-likelihood estimation.

  • If no information about $\mathbf Z$ is available, all hidden variables $\mathbf z_{1},\cdots,\mathbf z_{N}$ are unknown: we do not know whether each $z_{nk}$ equals $1$ or $0$.

Computing $E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]$ in $Q(\theta,\theta^{(i)})$ requires the distribution $P(\mathbf z_{n}|y_{n},\theta^{(i)})$: the probability of the hidden variable $\mathbf z_{n}$ given the observation $y_{n}$ and the initial value $\theta^{(i)}$, i.e. $P(z_{n1}=1|y_{n},\theta)$ and $P(z_{n2}=1|y_{n},\theta)$. Hence:

$$E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] =\sum_{z_{nk}\in\{0,1\}} z_{nk} P(\mathbf z_{n}|y_{n},\theta^{(i)})$$

$$\begin{aligned}P(\mathbf z_{n}|y_{n},\theta) &=\frac{P(y_{n},\mathbf z_{n}|\theta)}{P(y_{n}|\theta)} \\ &=\frac{P(y_{n}|\mathbf z_{n},\theta)P(\mathbf z_{n}|\theta)}{\sum\limits_{\bold z_{n}}P(y_{n}|\mathbf z_{n},\theta)P(\mathbf z_{n}|\theta)}\ ,\qquad\mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\}\\ &=\frac{P(y_{n}|z_{nk}=1,\theta)P(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{2}P(y_{n}|z_{nj}=1,\theta)P(z_{nj}=1|\theta)} \\ &=\frac{P(y_{n}|z_{nk}=1,\theta)P(z_{nk}=1|\theta)}{P(y_{n}|z_{n1}=1,\theta)P(z_{n1}=1|\theta)+P(y_{n}|z_{n2}=1,\theta)P(z_{n2}=1|\theta)} \\ &=\frac{P(y_{n}|z_{nk}=1,\theta)P(z_{nk}=1|\theta)}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \pi_{2}},\qquad k\in\{1,2\} \end{aligned}$$
\qquad z n = [ z n 1 z n 2 ] { [ 1 0 ] , [ 0 1 ] } \mathbf z_{n}= \left[ \begin{matrix} z_{n1}\\z_{n2} \end{matrix} \right] \in \left\{ \left[ \begin{matrix} 1\\0 \end{matrix} \right],\left[ \begin{matrix} 0\\1 \end{matrix} \right]\right\} ,也就是 z n k { 0 , 1 } z_{nk}\in\{0,1\} ,从而有:
\qquad
$$\begin{aligned}E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk} P(\mathbf z_{n}|y_{n},\theta^{(i)})\\ &=\frac{1\cdot P(y_{n}|z_{nk}=1,\theta^{(i)})P(z_{nk}=1|\theta^{(i)})+0\cdot P(y_{n}|z_{nk}=0,\theta^{(i)})P(z_{nk}=0|\theta^{(i)})}{\sum\limits_{j=1}^{2}P(y_{n}|z_{nj}=1,\theta^{(i)})P(z_{nj}=1|\theta^{(i)})} \\ &=\frac{P(y_{n}|z_{nk}=1,\theta^{(i)})P(z_{nk}=1|\theta^{(i)})}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \pi_{2}} \\ &=\frac{\prod_{k=1}^{2}\left[\alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}}\right]^{z_{nk}} \prod_{k=1}^{2}\pi_{k}^{z_{nk}}}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \pi_{2}} \end{aligned}$$
In other words: when evaluating the likelihood $\ln P(\bold Y, \bold Z|\theta) =\sum_{n=1}^{N}\sum_{k=1}^{2} z_{nk} \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\}$, since $z_{nk}$ is an unobservable hidden variable, under the initial value $\theta^{(i)}$ we replace $z_{nk}$ by its expectation $E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]$ to complete the computation, i.e.:

$$Q(\theta,\theta^{(i)}) = \sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\}$$

where $E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]=\dfrac{P(y_{n}|z_{nk}=1,\theta^{(i)})P(z_{nk}=1|\theta^{(i)})}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \pi_{2}}$.

(5) The formula for the E (Expectation) step

Since $\alpha_{1}=p^{(i)},\alpha_{2}=q^{(i)},\pi_{1}=\pi^{(i)},\pi_{2}=1-\pi^{(i)}$, substitution gives the E-step formula:

$$E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] =\frac{\prod_{k=1}^{2}\left[\alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}}\right]^{z_{nk}} \prod_{k=1}^{2}\pi_{k}^{z_{nk}}}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \pi_{2}}$$

  • If observation $y_{n}$ was produced by tossing coin $B$, i.e. $\mathbf z_{n}=[1,0]^{T}$ ($z_{n1}=1,z_{n2}=0$), then:

$$\begin{aligned} E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]&= E_{P(\bold Z|\bold Y,\theta)}[z_{n1}=1]\\ &=\frac{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}}\cdot\pi_{1}}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}}\cdot\pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}}\cdot\pi_{2}} \\ &= \frac{\pi^{(i)} (p^{(i)})^{y_{n}}(1-p^{(i)})^{1-y_{n}}}{\pi^{(i)} (p^{(i)})^{y_{n}}(1-p^{(i)})^{1-y_{n}}+(1-\pi^{(i)}) (q^{(i)})^{y_{n}}(1-q^{(i)})^{1-y_{n}}} \end{aligned}$$

  • If observation $y_{n}$ was produced by tossing coin $C$, i.e. $\mathbf z_{n}=[0,1]^{T}$ ($z_{n1}=0,z_{n2}=1$), then:

$$\begin{aligned} E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]&= E_{P(\bold Z|\bold Y,\theta)}[z_{n2}=1]\\ &= \frac{\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}}\cdot\pi_{2}}{\alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}}\cdot\pi_{1}+\alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}}\cdot\pi_{2}} \\ &= \frac{(1-\pi^{(i)}) (q^{(i)})^{y_{n}}(1-q^{(i)})^{1-y_{n}}}{\pi^{(i)} (p^{(i)})^{y_{n}}(1-p^{(i)})^{1-y_{n}}+(1-\pi^{(i)}) (q^{(i)})^{y_{n}}(1-q^{(i)})^{1-y_{n}}} \end{aligned}$$

\qquad
\qquad If we write $\mu_{n}^{(i+1)}=E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n1}=1]$, then $E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n2}=1]=1-\mu_{n}^{(i+1)}$.
\qquad
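The responsibility $\mu_{n}^{(i+1)}$ above is straightforward to compute in vectorized form. A minimal sketch (the function name `e_step` is my own, not from the book):

```python
import numpy as np

# Minimal E-step sketch (function name `e_step` is my own, not from the book):
# mu[n] = E[z_{n1}] = P(coin B was tossed | y_n, current parameters).
def e_step(y, pi, p, q):
    num = pi * p**y * (1 - p)**(1 - y)               # pi * B-likelihood of y_n
    den = num + (1 - pi) * q**y * (1 - q)**(1 - y)   # plus the coin-C term
    return num / den

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
mu = e_step(y, 0.5, 0.5, 0.5)   # symmetric parameters give mu_n = 0.5 for all n
```

With fully symmetric parameters the two coins explain every observation equally well, so each responsibility is exactly $0.5$.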

(6) Deriving the formulas for the $M\ (Maximization)$ step

\qquad 1) Once the $E$ step has produced a reliable guess of $\mathbf Z$, we can evaluate, at the current parameter value $\theta^{(i)}$, the expected complete-data log-likelihood $Q(\theta,\theta^{(i)})=E_{P(\mathbf Z|\mathbf Y,\theta)}[\ln P(\mathbf Y, \mathbf Z|\theta)\ |\ \mathbf Y,\theta^{(i)}]$

\qquad 2) Treating $E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}]$ as the estimate of the hidden variable $\mathbf z_{n}$, and the parameters $\theta$ as the unknowns, we can now maximize $Q(\theta,\theta^{(i)})$ to obtain the new parameter value $\theta^{(i+1)}=(\pi^{(i+1)},p^{(i+1)},q^{(i+1)})$ (this is the $M$ step).

\qquad That is, we solve: $\theta^{(i+1)}=\arg\max\limits_{\theta} Q(\theta,\theta^{(i)})$

\qquad 3) For the maximum-likelihood estimate, take the partial derivatives of $Q(\theta,\theta^{(i)})$ with respect to $(\alpha_{1},\alpha_{2})$, i.e. with respect to $(p,q)$:

$$\begin{aligned}Q(\theta,\theta^{(i)}) &= \sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}] \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\} \\ &=\sum_{n=1}^{N} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n1}] \left\{ \ln\pi_{1}+\ln \left[ \alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \right] \right\} \\ &\quad +\sum_{n=1}^{N} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n2}] \left\{ \ln\pi_{2}+\ln \left[ \alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \right] \right\} \\ &=\sum_{n=1}^{N} \mu_{n}^{(i+1)} \left\{ \ln\pi_{1}+\ln \left[ \alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \right] \right\} \\ &\quad +\sum_{n=1}^{N}(1-\mu_{n}^{(i+1)}) \left\{ \ln\pi_{2}+\ln \left[ \alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \right] \right\} \end{aligned}$$
\qquad  
\qquad From this one can see that in the $EM$ algorithm, $Q(\theta,\theta^{(i)})$ is effectively a "mixture" of $2$ likelihoods, with $\mu_{n}^{(i+1)}$ and $1-\mu_{n}^{(i+1)}$ as the mixing weights; that is, the estimate of $\mathbf z_{n}$ is $\hat{\mathbf z}_{n}=[\hat z_{n1},\hat z_{n2}]^{T}=[\mu_{n}^{(i+1)},1-\mu_{n}^{(i+1)}]^{T}$.

\qquad ① Partial derivative with respect to $p=\alpha_{1}$:

$$\begin{aligned}\frac{\partial Q(\theta,\theta^{(i)})}{\partial \alpha_{1}} &= \frac{\partial}{\partial \alpha_{1}} \sum_{n=1}^{N} \mu_{n}^{(i+1)} \ln \left[ \alpha_{1}^{y_{n}}(1-\alpha_{1})^{1-y_{n}} \right] \\ &= \sum_{n=1}^{N} \mu_{n}^{(i+1)}\frac{\partial}{\partial \alpha_{1}} \left[ y_{n} \ln \alpha_{1}+(1-y_{n})\ln(1-\alpha_{1}) \right] \\ &= \sum_{n=1}^{N} \mu_{n}^{(i+1)}\left(\frac{ y_{n}}{ \alpha_{1}}-\frac{1-y_{n}}{1-\alpha_{1}}\right) \\ &= \sum_{n=1}^{N} \mu_{n}^{(i+1)}\frac{ y_{n}-\alpha_{1}}{ \alpha_{1}(1-\alpha_{1})}=0 \end{aligned}$$

\qquad\qquad That is: $\displaystyle\sum_{n=1}^{N} \mu_{n}^{(i+1)} y_{n} = \alpha_{1}\sum_{n=1}^{N}\mu_{n}^{(i+1)}$

\qquad\qquad which gives: $p^{(i+1)}=\alpha_{1}^{(i+1)} = \dfrac{\sum_{n=1}^{N} \mu_{n}^{(i+1)} y_{n}}{\sum_{n=1}^{N}\mu_{n}^{(i+1)}}$

\qquad
\qquad ② Partial derivative with respect to $q=\alpha_{2}$:

$$\begin{aligned}\frac{\partial Q(\theta,\theta^{(i)})}{\partial \alpha_{2}} &= \frac{\partial}{\partial \alpha_{2}} \sum_{n=1}^{N} (1-\mu_{n}^{(i+1)}) \ln \left[ \alpha_{2}^{y_{n}}(1-\alpha_{2})^{1-y_{n}} \right] \\ &= \sum_{n=1}^{N} (1-\mu_{n}^{(i+1)})\frac{\partial}{\partial \alpha_{2}} \left[ y_{n} \ln \alpha_{2}+(1-y_{n})\ln(1-\alpha_{2}) \right] \\ &= \sum_{n=1}^{N} (1-\mu_{n}^{(i+1)})\left(\frac{ y_{n}}{ \alpha_{2}}-\frac{1-y_{n}}{1-\alpha_{2}}\right) \\ &= \sum_{n=1}^{N}(1-\mu_{n}^{(i+1)})\frac{ y_{n}-\alpha_{2}}{ \alpha_{2}(1-\alpha_{2})}=0 \end{aligned}$$

\qquad\qquad That is: $\displaystyle\sum_{n=1}^{N}(1-\mu_{n}^{(i+1)})y_{n} = \alpha_{2}\sum_{n=1}^{N}(1-\mu_{n}^{(i+1)})$

\qquad\qquad which gives: $q^{(i+1)}=\alpha_{2}^{(i+1)} = \dfrac{\sum_{n=1}^{N} (1-\mu_{n}^{(i+1)}) y_{n}}{\sum_{n=1}^{N}(1-\mu_{n}^{(i+1)})}$
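The two closed-form updates are just responsibility-weighted sample means of $y$, which is easy to check numerically. A small sketch (the function name `m_step_pq` is my own, not from the book):

```python
import numpy as np

# M-step sketch for p = alpha_1 and q = alpha_2 (function name is my own):
# responsibility-weighted sample means of y under mu and 1 - mu.
def m_step_pq(y, mu):
    p_new = np.sum(mu * y) / np.sum(mu)
    q_new = np.sum((1 - mu) * y) / np.sum(1 - mu)
    return p_new, q_new

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
mu = np.full(10, 0.5)            # E-step output when pi = p = q = 0.5
p_new, q_new = m_step_pq(y, mu)  # both reduce to the plain sample mean 0.6
```

When every $\mu_{n}$ is equal, both updates collapse to the overall frequency of heads, here $6/10=0.6$.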

\qquad
\qquad ③ Finally, consider $\pi_{1}=\pi,\ \pi_{2}=1-\pi$. To maximize $Q(\theta,\theta^{(i)})=E_{P(\mathbf Z|\mathbf Y,\theta)}[\ln P(\mathbf Y, \mathbf Z|\theta)\ |\ \mathbf Y,\theta^{(i)}]$ while satisfying the constraint $\sum_{k}\pi_{k}=1$, we use the method of Lagrange multipliers:

$$\max\ \left\{ \sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}] \left\{ \ln\pi_{k}+\ln \left[ \alpha_{k}^{y_{n}}(1-\alpha_{k})^{1-y_{n}} \right] \right\} +\lambda\Big(\sum_{k}\pi_{k}-1\Big) \right\}$$

\qquad\qquad π k \pi_{k} 求偏导:

\qquad\qquad\qquad   n = 1 N E P ( Z Y , θ ) [ z n k ] π k + λ = 0 \displaystyle\sum_{n=1}^{N}\frac{E_{P(\bold Z|\bold Y,\theta)}[z_{nk}]}{\pi_{k}} +\lambda=0

\qquad\qquad Multiplying both sides by $\pi_{k}$:

$$\sum_{n=1}^{N}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}] +\pi_{k}\lambda=0$$

\qquad\qquad π k \pi_{k} 求和:

\qquad\qquad\qquad   k = 1 K n = 1 N E P ( Z Y , θ ) [ z n k ] + k = 1 K π k λ = 0 \displaystyle\sum_{k=1}^{K}\displaystyle\sum_{n=1}^{N}E_{P(\bold Z|\bold Y,\theta)}[z_{nk}] +\displaystyle\sum_{k=1}^{K}\pi_{k}\lambda=0

\qquad\qquad Since

$$E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}]=\dfrac{P(y_{n}|z_{nk}=1,\theta)P(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K}P(y_{n}|z_{nj}=1,\theta)P(z_{nj}=1|\theta)},\qquad\text{where here } K=2
$$

\qquad\qquad we have

$$\begin{aligned}\sum_{k=1}^{K}\sum_{n=1}^{N}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{nk}]&=\sum_{n=1}^{N}\dfrac{\sum\limits_{k=1}^{K}P(y_{n}|z_{nk}=1,\theta)P(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K}P(y_{n}|z_{nj}=1,\theta)P(z_{nj}=1|\theta)}\\ &=\sum_{n=1}^{N}1=N \end{aligned}$$

\qquad\qquad which gives $\lambda=-N$, and hence

$$\pi_{1}=\dfrac{1}{N}\sum_{n=1}^{N}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n1}]=\dfrac{1}{N}\sum_{n=1}^{N}\mu_{n}^{(i+1)},\qquad \pi_{2}=\dfrac{1}{N}\sum_{n=1}^{N}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{n2}]=\dfrac{1}{N}\sum_{n=1}^{N}(1-\mu_{n}^{(i+1)})$$

\qquad\qquad Therefore $\pi^{(i+1)}=\pi_{1}^{(i+1)}=\dfrac{1}{N}\sum_{n=1}^{N}\mu_{n}^{(i+1)}$
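Putting the three updates together, one complete EM iteration can be sketched as follows (names are my own; with the symmetric start $\pi=p=q=0.5$, every $\mu_{n}=0.5$ and the parameters land immediately on $(0.5, 0.6, 0.6)$, which is already a fixed point):

```python
import numpy as np

# One full EM iteration for the three-coin model (a sketch; names are mine).
def em_iteration(y, pi, p, q):
    # E step: responsibility of coin B for each observation
    t1 = pi * p**y * (1 - p)**(1 - y)
    t2 = (1 - pi) * q**y * (1 - q)**(1 - y)
    mu = t1 / (t1 + t2)
    # M step: the closed-form updates derived above
    pi_new = np.mean(mu)                          # pi^(i+1) = (1/N) sum mu_n
    p_new = np.sum(mu * y) / np.sum(mu)
    q_new = np.sum((1 - mu) * y) / np.sum(1 - mu)
    return pi_new, p_new, q_new

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
pi1, p1, q1 = em_iteration(y, 0.5, 0.5, 0.5)   # -> (0.5, 0.6, 0.6)
```

This illustrates a well-known property of EM on this problem: the result depends on the initialization, and a symmetric start converges in a single step.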

\qquad

Postscript

\qquad The three-coin problem and the Gaussian mixture model are essentially the same:

  • A Gaussian mixture model is a "mixture of $K$ Gaussians":

$$p(\mathbf{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})=\sum_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad\sum_{k=1}^{K}\pi_{k}=1$$

  • The three-coin problem is in fact a "mixture of $2$ Bernoulli distributions"; for a detailed account see § 9.3.3 of *Pattern Recognition and Machine Learning*:

$$P(y|\boldsymbol{\pi},\boldsymbol{\alpha}) = \sum_{k=1}^{2}\pi_{k}P(y|\alpha_{k}),\qquad \sum_{k=1}^{2}\pi_{k}=1,\qquad P(y|\alpha_{k})=\alpha_{k}^{y}(1-\alpha_{k})^{(1-y)}$$
    \qquad
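To make this generative reading concrete, the mixture can be sampled directly: flip $A$, then flip $B$ or $C$ depending on $A$'s outcome. A sketch (the seed, sample size, and parameter values are arbitrary choices of mine); the empirical frequency of heads should approach the marginal $\pi p + (1-\pi)q$:

```python
import numpy as np

# Sampling the three-coin generative process (a sketch; numbers are my own).
rng = np.random.default_rng(0)
pi, p, q = 0.4, 0.6, 0.7

def sample(n):
    use_b = rng.random(n) < pi            # coin A heads -> toss coin B
    heads_prob = np.where(use_b, p, q)    # head probability of the chosen coin
    return (rng.random(n) < heads_prob).astype(int)

ys = sample(200_000)
# ys.mean() should be close to pi*p + (1-pi)*q = 0.66
```

Only the final $0$/$1$ outcomes are kept, mirroring the fact that the coin-$A$ result (the hidden variable) is never observed.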

Code (Li Hang, *Statistical Learning Methods*, § 9.1.1):

```python
import numpy as np

# Observations from the three-coin experiment
y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
N = len(y)

# Initial parameter values: pi = P(A heads), p = P(B heads), q = P(C heads)
pi_n, p_n, q_n = 0.4, 0.6, 0.7

for it in range(2):          # two EM iterations suffice to converge here
    pi, p, q = pi_n, p_n, q_n

    # E step: mu[n] = P(coin B was tossed | y[n], current parameters)
    t1 = pi * np.power(p, y) * np.power(1 - p, 1 - y)
    t2 = (1 - pi) * np.power(q, y) * np.power(1 - q, 1 - y)
    mu = t1 / (t1 + t2)

    # M step: closed-form updates for pi, p, q
    pi_n = np.sum(mu) / N
    p_n = np.sum(y * mu) / np.sum(mu)
    q_n = np.sum(y * (1 - mu)) / np.sum(1 - mu)
    print('%1.4f %5.4f %5.4f' % (pi_n, p_n, q_n))
```


Reposted from blog.csdn.net/xfijun/article/details/103483863