EM Algorithm Derivation: The Three-Coin Model

This post works through the three-coin model used to illustrate the EM algorithm in Li Hang's *Statistical Learning Methods* (Second Edition). The book omits the model's derivation, so this post fills in the specific derivation process.

1 Three-coin model

  Suppose there are 3 coins, denoted A, B, and C, whose probabilities of coming up heads are $\pi$, $p$, and $q$ respectively. Carry out the following experiment: first toss coin A; according to the result, choose coin B (if heads) or coin C (if tails); then toss the chosen coin and record the outcome as 1 for heads and 0 for tails. Repeat this for $n$ independent trials (here $n=10$); the observations are

$$1,1,0,1,0,0,1,0,1,1$$

Only the final result of each trial can be observed; the intermediate toss of coin A cannot. The problem the EM algorithm solves is how to estimate $\pi$, $p$, and $q$ in the absence of this hidden information.
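To make the data-generating process concrete, here is a minimal simulation sketch of the two-stage tossing procedure. The function name and the parameter values passed in are illustrative, not from the book:

```python
import random

def simulate_three_coin(pi, p, q, n, seed=0):
    """Draw n observations from the three-coin model.

    Each trial: flip coin A (heads prob. pi); on heads flip coin B
    (heads prob. p), on tails flip coin C (heads prob. q). Only the
    second flip's outcome (1 = heads, 0 = tails) is recorded.
    """
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        if rng.random() < pi:      # coin A came up heads -> toss B
            obs.append(1 if rng.random() < p else 0)
        else:                      # coin A came up tails -> toss C
            obs.append(1 if rng.random() < q else 0)
    return obs

print(simulate_three_coin(0.5, 0.6, 0.5, 10))
```

Note that the returned list looks like a plain sequence of Bernoulli outcomes: which coin produced each 1 or 0 is exactly the hidden information the EM algorithm must reason about.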

2 Derivation process

  Record the observed data as $Y=(y_{1},y_{2},\dots,y_{n})^{T}$, where $y_{i}$ (1 or 0) is the observation of the $i$-th trial, and record the hidden data as $Z=(z_{1},z_{2},\dots,z_{n})^{T}$, where $z_{i}$ is the unobserved result of tossing coin A in the $i$-th trial. Write $\theta=(\pi,p,q)$ for the model parameters. For a single trial, the probability of the observation $y_{i}$ is

$$\begin{aligned}P(y_{i}|\theta)&=\sum_{z_{i}}P(y_{i},z_{i}|\theta)=\sum_{z_{i}}P(z_{i}|\theta)P(y_{i}|z_{i},\theta)\\&=\pi p^{y_{i}}(1-p)^{1-y_{i}}+(1-\pi)q^{y_{i}}(1-q)^{1-y_{i}}\end{aligned}$$

Therefore the likelihood of the observed data $Y$ is

$$\begin{aligned}P(Y|\theta)&=\sum_{Z}P(Z|\theta)P(Y|Z,\theta)\\&=\prod_{i=1}^{n}\left[\pi p^{y_{i}}(1-p)^{1-y_{i}}+(1-\pi)q^{y_{i}}(1-q)^{1-y_{i}}\right]\end{aligned}$$

If we try to estimate $\theta$ by ordinary maximum likelihood,

$$\hat{\theta}=\arg\,\underset{\theta}{\max}\,\log P(Y|\theta)$$

this formula has no analytical solution, so the direct method does not work and the EM algorithm is needed. EM is an iterative procedure that stops when a termination condition is met. The following steps show how the parameters of the $(i+1)$-th iteration are derived from those of the $i$-th.
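The single-trial mixture formula can be coded directly to evaluate $\log P(Y|\theta)$ for any candidate parameters. One way to see why no unique analytical maximizer exists: the likelihood is invariant under swapping the roles of coins B and C ($\pi\rightarrow 1-\pi$, $p\leftrightarrow q$), as the sketch below checks numerically (the parameter values are arbitrary illustrations):

```python
import math

# Observations from the book's example
Y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]

def log_likelihood(pi, p, q, ys):
    """log P(Y|theta): sum of log of the per-trial two-component mixture."""
    total = 0.0
    for y in ys:
        mix = pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
        total += math.log(mix)
    return total

# Swapping B and C leaves the likelihood unchanged:
print(log_likelihood(0.4, 0.6, 0.7, Y))
print(log_likelihood(0.6, 0.7, 0.6, Y))  # identical value
```

Any hill-climbing on this surface can at best find one of several equivalent stationary points, which is consistent with EM converging to different answers from different initial values.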

  • First assign initial values to the model parameters: $\theta^{(0)}=(\pi^{(0)},p^{(0)},q^{(0)})$.
  • Suppose we have obtained the parameter values after the $i$-th iteration, recorded as $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$. What we now have to do is the update $\theta^{(i)}\rightarrow\theta^{(i+1)}$.
  • According to $\theta^{(i)}$, estimate the distribution of the hidden data and compute the $Q$ function (E step);
    What is estimated here is the probability distribution of the hidden data $Z$, i.e. the probability of picking coin B (or coin C) in each trial. Given $\theta^{(i)}$, the probability that observation $y_{j}$ came from coin B is

$$u_{j}^{(i+1)}=P(z_{j}=B|y_{j},\theta^{(i)})=\frac{\pi^{(i)}(p^{(i)})^{y_{j}}(1-p^{(i)})^{1-y_{j}}}{\pi^{(i)}(p^{(i)})^{y_{j}}(1-p^{(i)})^{1-y_{j}}+(1-\pi^{(i)})(q^{(i)})^{y_{j}}(1-q^{(i)})^{1-y_{j}}}$$

and the probability that it came from coin C is $1-u_{j}^{(i+1)}$. Next compute the $Q$ function (taking its definition as given):

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{Z}[\log P(Y,Z|\theta)|Y,\theta^{(i)}]\\&=\sum_{Z}P(Z|Y,\theta^{(i)})\log P(Y,Z|\theta)\end{aligned}$$

The meaning of the $Q$ function: it is the expectation of the complete-data log-likelihood $\log P(Y,Z|\theta)$ with respect to the conditional distribution $P(Z|Y,\theta^{(i)})$ of the unobserved data $Z$ given the observed data $Y$ and the current parameters $\theta^{(i)}$. The M step then sets $\theta^{(i+1)}=\arg\,\underset{\theta}{\max}\,Q(\theta,\theta^{(i)})$.
    Here is a little background on conditional expectation. Given $X=x$, the expectation of the random variable $Y$ is

$$E(Y|X=x)=\begin{cases}\sum_{j} y_{j}P(Y=y_{j}|X=x) & \text{(X,Y) is a two-dimensional discrete random variable}\\ \int_{-\infty}^{\infty}y\,p(y|x)\,dy & \text{(X,Y) is a two-dimensional continuous random variable}\end{cases}$$

Next, in order to write out $Q(\theta,\theta^{(i)})$ in full, we first reduce the number of trials of the three-coin model to 2.
    Assume the observed data from 2 random trials is $Y=(y_{1},y_{2})$. Then $Q(\theta,\theta^{(i)})$ is the sum, over every possible hidden data assignment $Z$, of the conditional probability of $Z$ multiplied by the complete-data log-likelihood $\log P(Y,Z|\theta)$.
    Take $Z=(1,0)$, i.e. coin B in the first trial and coin C in the second. Then $P(Z|Y,\theta^{(i)})=u_{1}^{(i+1)}(1-u_{2}^{(i+1)})$ and

$$\begin{aligned}\log P(Y,Z|\theta)&=\log[\pi p^{y_{1}}(1-p)^{1-y_{1}}(1-\pi)q^{y_{2}}(1-q)^{1-y_{2}}]\\&=\log\pi+y_{1}\log p+(1-y_{1})\log(1-p)+\log(1-\pi)+y_{2}\log q+(1-y_{2})\log(1-q)\end{aligned}$$

Adding the corresponding products of $P(Z|Y,\theta^{(i)})$ and $\log P(Y,Z|\theta)$ for $Z=(1,1)$, $Z=(0,1)$, and $Z=(0,0)$ gives the $Q$ function for this two-trial case, denoted $Q_{2}(\theta,\theta^{(i)})$:

$$\begin{aligned}Q_{2}(\theta,\theta^{(i)})&=u_{1}^{(i+1)}(1-u_{2}^{(i+1)})(\log\pi+\log(1-\pi)+y_{1}\log p+(1-y_{1})\log(1-p)+y_{2}\log q+(1-y_{2})\log(1-q))\\&+u_{1}^{(i+1)}u_{2}^{(i+1)}(\log\pi+\log\pi+y_{1}\log p+(1-y_{1})\log(1-p)+y_{2}\log p+(1-y_{2})\log(1-p))\\&+(1-u_{1}^{(i+1)})u_{2}^{(i+1)}(\log(1-\pi)+\log\pi+y_{1}\log q+(1-y_{1})\log(1-q)+y_{2}\log p+(1-y_{2})\log(1-p))\\&+(1-u_{1}^{(i+1)})(1-u_{2}^{(i+1)})(\log(1-\pi)+\log(1-\pi)+y_{1}\log q+(1-y_{1})\log(1-q)+y_{2}\log q+(1-y_{2})\log(1-q))\\&=(u_{1}^{(i+1)}+u_{2}^{(i+1)})\log\pi+(2-u_{1}^{(i+1)}-u_{2}^{(i+1)})\log(1-\pi)\\&+u_{1}^{(i+1)}[y_{1}\log p+(1-y_{1})\log(1-p)]+u_{2}^{(i+1)}[y_{2}\log p+(1-y_{2})\log(1-p)]\\&+(1-u_{1}^{(i+1)})[y_{1}\log q+(1-y_{1})\log(1-q)]+(1-u_{2}^{(i+1)})[y_{2}\log q+(1-y_{2})\log(1-q)]\end{aligned}$$

Analyzing this result, the pattern for $n$ random trials can be inferred; the $Q$ function is

$$Q(\theta,\theta^{(i)})=\sum_{j=1}^{n}\left[u_{j}^{(i+1)}[\log\pi+y_{j}\log p+(1-y_{j})\log(1-p)]+(1-u_{j}^{(i+1)})[\log(1-\pi)+y_{j}\log q+(1-y_{j})\log(1-q)]\right]$$
  • Next, to obtain the new round of parameters $\theta^{(i+1)}$, take the partial derivatives of $Q(\theta,\theta^{(i)})$ with respect to $\pi$, $p$, and $q$ in turn (M step). The details are as follows:
    Taking the partial derivative with respect to $\pi$ gives

$$\frac{\partial Q}{\partial\pi}=\sum_{j=1}^{n}\left[u_{j}^{(i+1)}\cdot\frac{1}{\pi}-\frac{1-u_{j}^{(i+1)}}{1-\pi}\right]$$

Setting this to 0 gives

$$\pi^{(i+1)}=\frac{1}{n}\cdot\sum_{j=1}^{n}u_{j}^{(i+1)}$$

Taking the partial derivative with respect to $p$ gives

$$\frac{\partial Q}{\partial p}=\sum_{j=1}^{n}u_{j}^{(i+1)}\left(\frac{y_{j}}{p}+\frac{y_{j}-1}{1-p}\right)$$

Setting this to 0 gives

$$p^{(i+1)}=\frac{\sum_{j=1}^{n}u_{j}^{(i+1)}y_{j}}{\sum_{j=1}^{n}u_{j}^{(i+1)}}$$

Taking the partial derivative with respect to $q$ gives

$$\frac{\partial Q}{\partial q}=\sum_{j=1}^{n}(1-u_{j}^{(i+1)})\left(\frac{y_{j}}{q}+\frac{y_{j}-1}{1-q}\right)$$

and setting this to 0 gives, in the same way,

$$q^{(i+1)}=\frac{\sum_{j=1}^{n}(1-u_{j}^{(i+1)})y_{j}}{\sum_{j=1}^{n}(1-u_{j}^{(i+1)})}$$
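The E-step formula for $u_{j}^{(i+1)}$ and the three M-step updates above translate directly into a short iteration. The sketch below runs it on the book's 10 observations; starting from $(\pi,p,q)=(0.4,0.6,0.7)$ it reproduces the converged values $(0.4064,0.5368,0.6432)$ reported in the book's worked example (function and variable names are my own):

```python
Y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations from the book's example

def em_step(pi, p, q, ys):
    """One EM iteration for the three-coin model."""
    # E step: u_j = P(z_j = B | y_j, theta^(i))
    us = []
    for y in ys:
        b = pi * p**y * (1 - p)**(1 - y)
        c = (1 - pi) * q**y * (1 - q)**(1 - y)
        us.append(b / (b + c))
    # M step: closed-form maximizers of Q derived above
    n = len(ys)
    pi_new = sum(us) / n
    p_new = sum(u * y for u, y in zip(us, ys)) / sum(us)
    q_new = sum((1 - u) * y for u, y in zip(us, ys)) / sum(1 - u for u in us)
    return pi_new, p_new, q_new

theta = (0.4, 0.6, 0.7)
for _ in range(10):
    theta = em_step(*theta, Y)
print(tuple(round(t, 4) for t in theta))  # -> (0.4064, 0.5368, 0.6432)
```

Starting instead from $(0.5,0.5,0.5)$, one step already lands on the fixed point $(0.5,0.6,0.6)$, which illustrates that EM's answer depends on the initial values.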

With this, the derivation of the EM algorithm for the three-coin model is complete.
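As a sanity check on the closed-form $Q$ function, one can enumerate all $2^{n}$ hidden assignments and compute $\sum_{Z}P(Z|Y,\theta^{(i)})\log P(Y,Z|\theta)$ by brute force; it should agree with the per-trial sum derived above. A sketch under arbitrary illustrative parameter values (all helper names are my own):

```python
import itertools
import math

def u_values(pi_i, p_i, q_i, ys):
    """Posterior P(z_j = B | y_j, theta^(i)) for each observation."""
    us = []
    for y in ys:
        b = pi_i * p_i**y * (1 - p_i)**(1 - y)
        c = (1 - pi_i) * q_i**y * (1 - q_i)**(1 - y)
        us.append(b / (b + c))
    return us

def q_closed_form(theta, theta_i, ys):
    """The per-trial sum form of Q derived in the post."""
    pi, p, q = theta
    us = u_values(*theta_i, ys)
    return sum(
        u * (math.log(pi) + y * math.log(p) + (1 - y) * math.log(1 - p))
        + (1 - u) * (math.log(1 - pi) + y * math.log(q) + (1 - y) * math.log(1 - q))
        for u, y in zip(us, ys)
    )

def q_brute_force(theta, theta_i, ys):
    """Enumerate all 2^n hidden assignments (z_j = 1 means coin B)."""
    pi, p, q = theta
    us = u_values(*theta_i, ys)
    total = 0.0
    for zs in itertools.product([0, 1], repeat=len(ys)):
        post = 1.0    # P(Z | Y, theta^(i)) factorizes over trials
        loglik = 0.0  # log P(Y, Z | theta)
        for z, u, y in zip(zs, us, ys):
            post *= u if z else (1 - u)
            if z:
                loglik += math.log(pi) + y * math.log(p) + (1 - y) * math.log(1 - p)
            else:
                loglik += math.log(1 - pi) + y * math.log(q) + (1 - y) * math.log(1 - q)
        total += post * loglik
    return total

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
a = q_closed_form((0.5, 0.6, 0.6), (0.4, 0.6, 0.7), ys)
b = q_brute_force((0.5, 0.6, 0.6), (0.4, 0.6, 0.7), ys)
print(abs(a - b) < 1e-9)  # -> True
```

The two agree because the expectation of a sum of per-trial terms splits into per-trial expectations, which is exactly the step that collapses the $2^{n}$-term sum into the $n$-term sum.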

References

  1. https://blog.csdn.net/weixin_41566471/article/details/106219019
  2. Li Hang, "Statistical Learning Methods" (Second Edition)

Origin blog.csdn.net/yeshang_lady/article/details/132151771