Deriving the parameters of the three-coin model with the EM algorithm

EM algorithm

The EM algorithm is an iterative algorithm used for parameter estimation in probabilistic models containing latent variables. The algorithm alternates between two steps: the E-step (Expectation step) and the M-step (Maximization step). In the E-step, the current parameter estimates are used to compute the posterior probability of the latent variables; in the M-step, the posterior probabilities computed in the E-step are used to maximize the expected log-likelihood of the complete data, yielding new parameter estimates. These two steps are iterated until convergence.

General procedure of the EM algorithm

The EM algorithm finds the maximum likelihood estimate of $L(\theta)=\log P(Y \mid \theta)$ by iteration. Each iteration contains two steps: the E-step computes an expectation, and the M-step maximizes it.

Input: observed variable data $Y$, latent variable data $Z$, joint distribution $P(Y, Z \mid \theta)$, conditional distribution $P(Z \mid Y, \theta)$;
Output: model parameter $\theta$.

The following are the iteration steps:
(1) Select an initial value $\theta^{(0)}$ for the parameters and start iterating;
(2) E-step: let $\theta^{(i)}$ denote the parameter estimate from the $i$-th iteration; in the E-step of the $(i+1)$-th iteration, compute the function
$$\begin{aligned} Q\left(\theta, \theta^{(i)}\right) & =E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\ & =\sum_Z \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right) \end{aligned}$$

Here $P\left(Z \mid Y, \theta^{(i)}\right)$ is the conditional probability distribution of the latent data $Z$ given the observed data $Y$ and the current parameter estimate $\theta^{(i)}$;
(3) M-step: find the $\theta$ that maximizes $Q\left(\theta, \theta^{(i)}\right)$, which determines the parameter estimate $\theta^{(i+1)}$ for the $(i+1)$-th iteration:
$$\theta^{(i+1)}=\arg \max _\theta Q\left(\theta, \theta^{(i)}\right)$$
(4) Substitute $\theta^{(i+1)}$ back in and repeat steps (2) and (3) until convergence.
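
As a minimal sketch of this general procedure (not from the original text; `em`, `e_step`, and `m_step` are placeholder names, and the concrete E-step and M-step computations for a particular model must be supplied as functions):

import numpy as np

def em(Y, theta0, e_step, m_step, max_iter=100, tol=1e-8):
    """Generic EM loop: alternate the E-step and the M-step until the parameters stop changing."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        posterior = e_step(Y, theta)                                # E-step: P(Z | Y, theta^(i))
        theta_new = np.asarray(m_step(Y, posterior), dtype=float)   # M-step: argmax_theta Q(theta, theta^(i))
        if np.max(np.abs(theta_new - theta)) < tol:                 # stop when the parameters barely change
            return theta_new
        theta = theta_new
    return theta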

Three-coin model


First, here is an example of the EM algorithm (quoted from Professor Li Hang's "Statistical Learning Methods", 2nd edition, page 175). The book does not describe how the parameter updates in this example are derived, so a basic and detailed derivation is added here.

Example 9.1 (Three-Coin Model) Suppose there are 3 coins, denoted A, B, and C, whose probabilities of showing heads are $\pi$, $p$, and $q$, respectively. Conduct the following coin-tossing experiment: first toss coin A; if it shows heads, select coin B, otherwise select coin C; then toss the selected coin, and record the result as 1 if it is heads and 0 if it is tails. Repeat the experiment independently $n$ times (here $n=10$); the observations are:
1, 1, 0, 1, 0, 0, 1, 0, 1, 1

Suppose we can only observe the result of the final coin toss, not the tossing process; that is, we do not know whether coin A showed heads or tails in each trial. How can we estimate the probability of heads of the three coins, i.e., the parameters of the three-coin model, from this batch of experimental data?

Express this model using mathematical formulas

$$P(y \mid \theta)=\sum_z P(y, z \mid \theta)=\sum_z P(z \mid \theta) P(y \mid z, \theta)$$

$$\begin{cases} P(y=1 \mid \theta)=\pi p+(1-\pi) q, & \text{heads observed} \\ P(y=0 \mid \theta)=\pi(1-p)+(1-\pi)(1-q), & \text{tails observed} \end{cases}$$
so
$$P(y \mid \theta)=\pi p^y(1-p)^{1-y}+(1-\pi) q^y(1-q)^{1-y}, \quad y \in\{0,1\}$$

In this model, $y$ is the final observed result of a trial (1 or 0), the random variable $z$ is a latent variable indicating whether the unobserved toss of coin A was heads or tails, and $\theta=(\pi, p, q)$ are the model parameters.

Writing the observed data as $Y$ and the unobserved data as $Z$, the likelihood of the observed data is
$$P(Y \mid \theta)=\sum_Z P(Z \mid \theta) P(Y \mid Z, \theta)$$
that is,

$$P(Y \mid \theta)=\prod_{j=1}^n\left[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi) q^{y_j}(1-q)^{1-y_j}\right]$$

Our goal is to find the maximum likelihood estimate of the model parameters $\theta=(\pi, p, q)$:
$$\hat{\theta}=\arg \max _\theta \log P(Y \mid \theta)$$
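
To make this objective concrete, here is a small sketch (the helper name `log_likelihood` is an assumption, not from the original) that evaluates $\log P(Y \mid \theta)$ for the 10 observations above; EM increases this quantity from iteration to iteration:

import numpy as np

Y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])

def log_likelihood(Y, pi, p, q):
    # log P(Y | theta) = sum_j log[ pi*p^y_j*(1-p)^(1-y_j) + (1-pi)*q^y_j*(1-q)^(1-y_j) ]
    per_obs = pi * p**Y * (1 - p)**(1 - Y) + (1 - pi) * q**Y * (1 - q)**(1 - Y)
    return np.sum(np.log(per_obs))

print(log_likelihood(Y, 0.5, 0.6, 0.6))  # e.g. evaluate at pi=0.5, p=0.6, q=0.6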

To solve this problem, we apply the EM algorithm to the three-coin model above.

Review: conditional expectation

Conditional expectation is the expectation of a random variable computed under a given condition. Specifically, let $X$ and $Y$ be two random variables and suppose $X$ takes the values $x_1, x_2, \ldots, x_n$. Given the condition $Y$, the conditional expectation of $X$, written $E(X \mid Y)$, is defined as:

$$E(X \mid Y)=\sum_{i=1}^n x_i P\left(X=x_i \mid Y\right)$$

where $P\left(X=x_i \mid Y\right)$ is the probability that $X$ takes the value $x_i$ under the condition $Y$. The meaning of this formula is that each possible value $x_i$ is weighted by its conditional probability given $Y$, and summing over all $i$ gives the expected value of $X$ under the condition $Y$.
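
As a quick check of this definition with the binary latent variable used below: if $z \in\{0,1\}$ and $P(z=1 \mid y, \theta)=0.7$, then $E(z \mid y, \theta)=0 \times 0.3+1 \times 0.7=0.7$. For an indicator variable, the conditional expectation equals its posterior probability, which is exactly the role played by $\mu^{(i)}$ in the E-step below.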

EM algorithm

Here we focus on breaking down the formulas of the E-step and the M-step. The parameters $\theta$ obtained after the M-step are fed into the next round of iteration.

E step

We use the $Q$ function introduced in the general procedure above (its derivation is not proved here; for the detailed derivation of the $Q$ function, see the reference article at the end of this post).

$$\begin{aligned} Q\left(\theta, \theta^{(i)}\right) & =E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\ & =\sum_Z \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right) \end{aligned}$$

We first expand $P\left(Z \mid Y, \theta^{(i)}\right)$ according to the possible values of $z$.
For the case $z=1$, apply Bayes' formula (coin A shows heads; the latent variable $z=1$ means the observation $y$ comes from coin B):
$$\begin{aligned} P\left(z=1 \mid y, \theta^{(i)}\right)=\frac{P\left(z=1, y \mid \theta^{(i)}\right)}{P\left(y \mid \theta^{(i)}\right)} & =\frac{P(z=1) P\left(y \mid z=1, \theta^{(i)}\right)}{\sum_z P(z) P\left(y \mid z, \theta^{(i)}\right)} \\ & =\frac{P(z=1) P\left(y \mid z=1, \theta^{(i)}\right)}{P(z=1) P\left(y \mid z=1, \theta^{(i)}\right)+P(z=0) P\left(y \mid z=0, \theta^{(i)}\right)} \\ & =\frac{\pi p^y(1-p)^{1-y}}{\pi p^y(1-p)^{1-y}+(1-\pi) q^y(1-q)^{1-y}} \end{aligned}$$
Similarly, for $z=0$:
$$\begin{aligned} P\left(z=0 \mid y, \theta^{(i)}\right)=\frac{P\left(z=0, y \mid \theta^{(i)}\right)}{P\left(y \mid \theta^{(i)}\right)} & =\frac{P(z=0) P\left(y \mid z=0, \theta^{(i)}\right)}{\sum_z P(z) P\left(y \mid z, \theta^{(i)}\right)} \\ & =\frac{P(z=0) P\left(y \mid z=0, \theta^{(i)}\right)}{P(z=1) P\left(y \mid z=1, \theta^{(i)}\right)+P(z=0) P\left(y \mid z=0, \theta^{(i)}\right)} \\ & =\frac{(1-\pi) q^y(1-q)^{1-y}}{\pi p^y(1-p)^{1-y}+(1-\pi) q^y(1-q)^{1-y}} \end{aligned}$$
Define $\mu^{(i)}=P\left(z=1 \mid y, \theta^{(i)}\right)$, the probability that the observation $y$ comes from coin B (coin A showed heads); then the probability that it comes from coin C (coin A showed tails) is $1-\mu^{(i)}$.
The function $Q\left(\theta, \theta^{(i)}\right)$ can then be written as
$$\begin{aligned} Q\left(\theta, \theta^{(i)}\right) & =E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\ & =\sum_Z \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right) \\ & =\log P(Y, Z=1 \mid \theta)\, P\left(Z=1 \mid Y, \theta^{(i)}\right)+\log P(Y, Z=0 \mid \theta)\, P\left(Z=0 \mid Y, \theta^{(i)}\right) \\ & =\log [P(Z=1 \mid \theta) P(Y \mid Z=1, \theta)] \cdot \mu^{(i)}+\log [P(Z=0 \mid \theta) P(Y \mid Z=0, \theta)] \cdot\left(1-\mu^{(i)}\right) \\ & =\mu^{(i)} \cdot \log \left[\pi p^y(1-p)^{1-y}\right]+\left(1-\mu^{(i)}\right) \cdot \log \left[(1-\pi) q^y(1-q)^{1-y}\right] \\ & =\mu^{(i)}\left[\log \pi+\log p^y(1-p)^{1-y}\right]+\left(1-\mu^{(i)}\right) \cdot\left[\log (1-\pi)+\log q^y(1-q)^{1-y}\right] \end{aligned}$$

(This is the $Q$ function for a single observation $y$; the M-step below sums it over all observations.)
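
As a sketch of this E-step in code (an illustration, assuming Y is a NumPy array of 0/1 observations and pi, p, q are the current estimates; it mirrors the mu computation in the full script at the end of the post):

import numpy as np

def e_step(Y, pi, p, q):
    """Posterior probability mu_j that each observation y_j came from coin B (z = 1)."""
    num = pi * p**Y * (1 - p)**(1 - Y)               # pi * p^y * (1-p)^(1-y)
    den = num + (1 - pi) * q**Y * (1 - q)**(1 - Y)   # full marginal P(y | theta)
    return num / den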

M step

Find the $\theta$ that maximizes $Q\left(\theta, \theta^{(i)}\right)$ to determine the parameter estimate $\theta^{(i+1)}$ for the $(i+1)$-th iteration. Suppose there are $n$ observations $y_j, 1 \leq j \leq n$; then the full $Q$ function can be written as $Q^{\prime}=\sum_j^n Q\left(\theta, \theta^{(i)}\right)$, where the $j$-th term uses $\mu_j^{(i)}=P\left(z=1 \mid y_j, \theta^{(i)}\right)$. To find the parameters $\theta=(\pi, p, q)$ that maximize $Q^{\prime}$, we set the partial derivatives of $Q^{\prime}$ to $0$ and solve for each parameter.

The following shows how to solve for $\pi$:
$$\begin{aligned} \frac{\partial Q^{\prime}}{\partial \pi}=\sum_j^n \frac{\partial Q}{\partial \pi} & =\sum_j^n \frac{\partial}{\partial \pi}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right) \cdot\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\ & =\sum_j^n \frac{\partial}{\partial \pi}\left[\mu_j^{(i)} \log \pi+\left(1-\mu_j^{(i)}\right) \log (1-\pi)\right] \\ & =\sum_j^n \frac{\mu_j^{(i)}}{\pi}+\frac{\mu_j^{(i)}-1}{1-\pi}=0 \\ & \Rightarrow \sum_j^n(1-\pi) \mu_j^{(i)}+\pi\left(\mu_j^{(i)}-1\right)=0 \\ & \Rightarrow \sum_j^n \mu_j^{(i)}=\sum_j^n \pi \end{aligned}$$

$$\sum_j^n \mu_j^{(i)}=n \pi \quad \Rightarrow \quad \pi=\frac{1}{n} \sum_j^n \mu_j^{(i)}$$
Following the same steps, setting $\sum_j^n \frac{\partial Q}{\partial p}=0$ and $\sum_j^n \frac{\partial Q}{\partial q}=0$ respectively gives the remaining parameters, so that the full set $\theta=(\pi, p, q)$ for the next iteration is available and the next round of optimization can proceed.

$$\begin{aligned} \text{Let } \sum_j^n \frac{\partial Q}{\partial p}=0 & =\sum_j^n \frac{\partial}{\partial p}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right) \cdot\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\ & =\sum_j^n \frac{\partial}{\partial p}\left[\mu_j^{(i)} \cdot \log p^{y_j}(1-p)^{1-y_j}\right] \\ & =\sum_j^n \frac{\partial}{\partial p}\left[y_j \mu_j^{(i)} \log p+\mu_j^{(i)}\left(1-y_j\right) \log (1-p)\right] \\ & =\sum_j^n\left[\frac{y_j \mu_j^{(i)}}{p}+\frac{\mu_j^{(i)}\left(1-y_j\right) \cdot(-1)}{1-p}\right]=0 \\ & \Rightarrow \sum_j^n\left[y_j \mu_j^{(i)}(1-p)-\mu_j^{(i)}\left(1-y_j\right) p\right]=\sum_j^n\left[y_j \mu_j^{(i)}-p\, \mu_j^{(i)}\right]=0 \end{aligned}$$

$$\sum_j^n y_j \mu_j^{(i)}=\sum_j^n p\, \mu_j^{(i)} \quad \Rightarrow \quad p=\frac{\sum_j^n y_j \mu_j^{(i)}}{\sum_j^n \mu_j^{(i)}}$$

Next, for $q$:
$$\begin{aligned} \text{Let } \sum_j^n \frac{\partial Q}{\partial q}=0 & =\sum_j^n \frac{\partial}{\partial q}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right) \cdot\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\ & =\sum_j^n \frac{\partial}{\partial q}\left[\left(1-\mu_j^{(i)}\right) y_j \log q+\left(1-\mu_j^{(i)}\right)\left(1-y_j\right) \log (1-q)\right] \\ & =\sum_j^n \frac{\left(1-\mu_j^{(i)}\right) y_j}{q}+\frac{\left(1-\mu_j^{(i)}\right)\left(1-y_j\right) \cdot(-1)}{1-q} \\ & \Rightarrow \sum_j^n\left(y_j(1-q)-\mu_j^{(i)} y_j(1-q)+q\left(y_j-1\right)-q\, \mu_j^{(i)}\left(y_j-1\right)\right)=0 \\ & \Rightarrow \sum_j^n\left(y_j-\mu_j^{(i)} y_j-q+\mu_j^{(i)} q\right)=0 \end{aligned}$$

$$\sum_j^n\left(y_j-\mu_j^{(i)} y_j\right)=\sum_j^n q\left(1-\mu_j^{(i)}\right) \quad \Rightarrow \quad q=\frac{\sum_j^n\left(1-\mu_j^{(i)}\right) y_j}{\sum_j^n\left(1-\mu_j^{(i)}\right)}$$

So far, we have obtained the updates for the parameters $\theta=(\pi, p, q)$:
$$\pi=\frac{1}{n} \sum_j^n \mu_j^{(i)}, \quad p=\frac{\sum_j^n y_j \mu_j^{(i)}}{\sum_j^n \mu_j^{(i)}}, \quad q=\frac{\sum_j^n\left(1-\mu_j^{(i)}\right) y_j}{\sum_j^n\left(1-\mu_j^{(i)}\right)}$$
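
A corresponding sketch of the M-step (the same closed-form updates written with NumPy; mu is the array of posteriors returned by the e_step sketch above):

import numpy as np

def m_step(Y, mu):
    """Closed-form maximizers of Q': the updated (pi, p, q)."""
    pi = np.mean(mu)                            # pi = (1/n) * sum_j mu_j
    p = np.sum(Y * mu) / np.sum(mu)             # p = sum_j y_j*mu_j / sum_j mu_j
    q = np.sum(Y * (1 - mu)) / np.sum(1 - mu)   # q = sum_j (1-mu_j)*y_j / sum_j (1-mu_j)
    return pi, p, q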

With the iteration relations above and the sample
1, 1, 0, 1, 0, 0, 1, 0, 1, 1
we set the initial parameters
$$\pi^{(0)}=0.5, \quad p^{(0)}=0.5, \quad q^{(0)}=0.5$$
From the expression for $\mu_j^{(i)}$, whether $y_j=0$ or $y_j=1$, every $\mu_j^{(0)}=0.5$. The values after the first iteration are
$$\begin{aligned} \pi^{(1)} & =\frac{1}{10}\left(\mu_1^{(0)}+\mu_2^{(0)}+\cdots+\mu_{10}^{(0)}\right)=0.5 \\ p^{(1)} & =\frac{y_1 \mu_1^{(0)}+y_2 \mu_2^{(0)}+\cdots+y_{10} \mu_{10}^{(0)}}{\mu_1^{(0)}+\mu_2^{(0)}+\cdots+\mu_{10}^{(0)}}=\frac{6 \times 0.5}{10 \times 0.5}=0.6 \\ q^{(1)} & =\frac{\left(1-\mu_1^{(0)}\right) y_1+\left(1-\mu_2^{(0)}\right) y_2+\cdots+\left(1-\mu_{10}^{(0)}\right) y_{10}}{\left(1-\mu_1^{(0)}\right)+\left(1-\mu_2^{(0)}\right)+\cdots+\left(1-\mu_{10}^{(0)}\right)}=\frac{6 \times 0.5}{10-0.5 \times 10}=0.6 \end{aligned}$$
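
This hand computation can be checked with the e_step and m_step sketches defined above (illustrative helpers, not part of the original text):

import numpy as np

Y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
mu = e_step(Y, 0.5, 0.5, 0.5)   # every mu_j equals 0.5 at the initial parameters
print(m_step(Y, mu))            # (0.5, 0.6, 0.6), matching the values above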
Continuing to iterate gives the final values of the parameters $\theta=(\pi, p, q)$ of the three-coin model.

Using Python code to run simulation experiments and check that the approach works

The following code generates experimental data and estimates the parameters with the EM algorithm; it then uses the estimated parameters to run repeated simulations and test the model's predictive ability.

import random
import numpy as np
import time

def coin_experiment(pi, p, q, n):
    """
    Simulate the coin-tossing experiment
    pi: probability that coin A shows heads
    p: probability that coin B shows heads
    q: probability that coin C shows heads
    n: number of trials
    """
    results = []
    results_A = []
    results_B = []
    results_C = []
    for i in range(n):
        # First toss coin A
        if random.random() < pi:
            # Heads: select coin B
            coin = 'B'
            p_head = p
        else:
            # Tails: select coin C
            coin = 'C'
            p_head = q

        # Then toss the selected coin and record the observed outcome
        outcome = 1 if random.random() < p_head else 0
        results.append(outcome)

        # Record which coin was tossed and its result, so the true heads rates can be checked
        if coin == 'B':
            results_B.append(outcome)
            results_A.append(1)
        else:
            results_C.append(outcome)
            results_A.append(0)

    # Empirical heads probabilities of coins A, B and C
    p_A = sum(results_A) / len(results_A)
    p_B = sum(results_B) / len(results_B)
    p_C = sum(results_C) / len(results_C)

    return results, p_A, p_B, p_C


pi = 0.2
p = 0.3
q = 0.8
n = 100000
s = time.time()
print(f'Starting simulation with parameters pi={pi}, p={p}, q={q} ...')
Y, pi, p, q = coin_experiment(pi, p, q, n)
Y = np.array(Y)
print(f'Simulation finished: {n} trials, elapsed {time.time()-s} s')
print(f'Simulation results: pi={pi}, p={p}, q={q}, overall P(Y=1) = {sum(Y)/len(Y)}')

# Recover the parameters with EM, starting from arbitrary initial values
pi_0 = 0.9
p_0 = 0.8
q_0 = 0.9
episodes = 1000  # number of iterations
count = 1
while count <= episodes:
    # E-step: posterior probability that each observation came from coin B
    mu = pi_0 * p_0**Y * (1-p_0)**(1-Y) / (pi_0 * p_0**Y * (1-p_0)**(1-Y) + (1-pi_0) * q_0**Y * (1-q_0)**(1-Y))
    # M-step: closed-form updates derived above
    pi = (1/n) * sum(mu)
    p = sum(Y*mu) / sum(mu)
    q = sum((1-mu)*Y) / sum(1-mu)
    if count % 100 == 0:
        print(f"Iteration {count}: estimated pi={pi}, p={p}, q={q}")
    pi_0 = pi
    p_0 = p
    q_0 = q
    count += 1

# Re-run the simulation with the estimated parameters as a check
s = time.time()
print(f'Second simulation with estimated parameters pi={pi}, p={p}, q={q} ...')
Y, pi, p, q = coin_experiment(pi, p, q, n)
Y = np.array(Y)
print(f'Simulation finished: {n} trials, elapsed {time.time()-s} s')
print(f'Simulation results: pi={pi}, p={p}, q={q}, overall P(Y=1) = {sum(Y)/len(Y)}')

Observing the run: 100,000 trials are simulated with the initial parameters pi = 0.2, p = 0.3, q = 0.8 that we set.
(Figure: console output of the EM simulation of the three-coin model and the estimated parameters.)

During the simulation, the empirical values of pi, p, and q were very close to the values we set, confirming that the data were indeed generated with those parameters; the overall probability of the outcome y = 1 was 0.696. After 1000 iterations, the EM algorithm produced pi = 0.91, p = 0.68, q = 0.83. Although these differ considerably from the true parameters of our model, a new run of 100,000 trials with the estimated parameters gave an overall probability of y = 1 of 0.6983, very close to the result of the true model. The observable behaviour is therefore reproduced accurately and the fitted model is effective; the discrepancy in the individual parameters reflects the fact that many different settings of (pi, p, q) produce the same distribution of the observations, so the observed data alone cannot pin down a unique parameter setting.

References

[1] Detailed explanation of the derivation of the Q function in the EM algorithm.

Origin blog.csdn.net/qq_33909788/article/details/134772228