The EM Algorithm: Derivations and My Own Understanding


I. Introduction

The EM algorithm solves maximum likelihood parameter estimation for probabilistic models with hidden (latent) variables, the classic case being mixture models.
For a simple model, maximum likelihood estimation (MLE) yields an analytical solution directly; for a complex model with hidden variables, the MLE has no tractable closed form, and this is where the EM algorithm comes into play.
The E-step handles the hidden variables, and the M-step solves for the model parameters; together they form an iterative method for obtaining the maximum likelihood parameter values.

My own understanding: walk a step, take a look, then walk again; it is an iterative process.
First, set the model parameters to some initial values; this set acts as a prior for the model, and we might as well admit it is a blind guess. Such a guess is certainly not accurate, but we expect the parameter values to become more and more precise. We use the current parameter values to infer the hidden variables; once the hidden variables are obtained, we re-estimate the parameters based on them. The posterior corrects the prior in light of the observed data, so that the parameter values fit the data better and better.
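The alternation just described can be sketched in runnable form. This is an illustrative toy, not code from the post: a 50/50 mixture of two unit-variance Gaussians whose means we estimate, starting from a deliberately crude guess.

```python
import math

# Toy data: two well-separated clusters (values chosen for illustration).
data = [-2.1, -1.9, -2.3, 1.8, 2.2, 2.0]
mu1, mu2 = -1.0, 1.0  # crude initial guess ("set blindly")

def pdf(x, mu):
    # Unit-variance Gaussian density.
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

for _ in range(20):
    # E-step: r[i] = P(z_i = component 1 | x_i, current means)
    r = [pdf(x, mu1) / (pdf(x, mu1) + pdf(x, mu2)) for x in data]
    # M-step: responsibility-weighted mean updates
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))

print(round(mu1, 2), round(mu2, 2))  # converges near the cluster means
```

With well-separated clusters the responsibilities quickly harden, and the estimated means land near the per-cluster averages.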


II. Overview

Suppose we have a dataset \(X = \{x^{(1)}, x^{(2)}, \cdots, x^{(n)}\}\) of \(n\) independent samples, generated by a mixture model; we want the \(\theta\) satisfying:
\[\arg\max_{\theta} \; logP(X|\theta)\]

But \(X\) is generated by a mixture model and is therefore tied to a hidden variable \(Z\), where \(Z\) indicates which component each sample belongs to.
Directly maximizing the formula above by MLE is difficult, so the EM algorithm instead solves, iteratively and in closed form:
\[\theta^{(t+1)} = \arg\max_{\theta} \int_Z \; P(Z|X,\theta^{(t)}) \; logP(X,Z|\theta) \; dZ\]

By iterating this update, the sequence of \(\theta\) values makes \(logP(X|\theta)\) non-decreasing, which achieves our objective.

If we view the data as generated by a probabilistic model, then \(P(X|\theta)\) can be very complicated; we may not know the form of \(P(X|\theta)\) at all. So we use an inductive bias: assume the data come from a generative model with a hidden variable \(Z\), where \(Z\) is responsible for generating \(X\). Under this assumption \(P(X)\) acquires a structure we can work with:
\(P(X) = \int_{Z} P(X,Z) \, dZ\), which decomposes \(P(X)\).
Introducing the hidden variable \(Z\) is what lets us solve for \(\theta\).
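The decomposition \(P(X) = \int_Z P(X,Z)\,dZ\) is easy to make concrete for a discrete \(Z\). A sketch (the weights and means below are made up for illustration): the marginal density is the sum over components of the joint \(P(x,z) = P(z)\,P(x|z)\).

```python
import math

# Illustrative 2-component Gaussian mixture: P(z=0)=0.3, P(z=1)=0.7.
weights = [0.3, 0.7]
means = [0.0, 4.0]

def normal_pdf(x, mu):
    # Unit-variance Gaussian density.
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

x = 1.0
# P(x, z) = P(z) * P(x | z); marginalising z gives the mixture density P(x).
joint = [w * normal_pdf(x, mu) for w, mu in zip(weights, means)]
p_x = sum(joint)
print(p_x)
```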


III. Convergence

Does the \(\theta\) obtained above really achieve our purpose, i.e. make \(logP(X|\theta)\) large? Start from Bayes' rule:
\[ \begin{aligned} logP(X|\theta) &= log \frac{P(X,Z|\theta)}{P(Z|X,\theta)} \\ &= log{P(X,Z|\theta)} - log{P(Z|X,\theta)} \end{aligned} \]

Integrate both sides against \(P(Z|X,\theta^{(t)})\):
\[ \begin{aligned} \text{Left side} &= \int_{Z} P(Z|X,\theta^{(t)}) \, log{P(X|\theta)} \, dZ \\ &= log{P(X|\theta)} \int_{Z} P(Z|X,\theta^{(t)}) \, dZ \\ &= log{P(X|\theta)} \end{aligned} \]

\[ \begin{aligned} \text{Right side} &=\int_{Z}P(Z|X,\theta^{(t)}) \left[log {P(X,Z|\theta)} - log{P(Z|X,\theta)}\right] dZ\\ &=\int_{Z}P(Z|X,\theta^{(t)}) log {P(X,Z|\theta)} dZ - \int_{Z}P(Z|X,\theta^{(t)}) log{P(Z|X,\theta)} dZ \end{aligned} \]

Define
\[ \begin{aligned} Q(\theta,\theta^{(t)}) &= \int_{Z}P(Z|X,\theta^{(t)}) log {P(X,Z|\theta)} dZ\\ H(\theta,\theta^{(t)}) &= \int_{Z}P(Z|X,\theta^{(t)}) log{P(Z|X,\theta)} dZ \end{aligned} \]

so that
\[ log {P(X|\theta^{(t+1)})} -log {P(X|\theta^{(t)})} = Q(\theta^{(t+1)},\theta^{(t)}) -Q(\theta^{(t)},\theta^{(t)}) + H(\theta^{(t)},\theta^{(t)}) -H(\theta^{(t+1)},\theta^{(t)}) \]

Since \(\theta^{(t+1)}\) is defined as the maximizer of \(Q(\cdot,\theta^{(t)})\), we immediately have, for any \(\theta\):
\[Q(\theta^{(t+1)},\theta^{(t)}) \geq Q(\theta,\theta^{(t)})\]

In particular, taking \(\theta = \theta^{(t)}\):
\[Q(\theta^{(t+1)},\theta^{(t)}) \geq Q(\theta^{(t)},\theta^{(t)})\]

\[ \begin{aligned} H(\theta^{(t)},\theta^{(t)}) -H(\theta^{(t+1)},\theta^{(t)})&=\int_{Z}P(Z|X,\theta^{(t)}) log{P(Z|X,\theta^{(t)})}dZ- \int_{Z}P(Z|X,\theta^{(t)}) log{P(Z|X,\theta^{(t+1)})}dZ\\ &=\int_{Z}P(Z|X,\theta^{(t)}) [log{P(Z|X,\theta^{(t)})}-log{P(Z|X,\theta^{(t+1)})}]dZ\\ &=\int_{Z}P(Z|X,\theta^{(t)})log \frac{P(Z|X,\theta^{(t)})}{P(Z|X,\theta^{(t+1)})}dZ\\ &=KL(P(Z|X,\theta^{(t)}) \;||\; P(Z|X,\theta^{(t+1)})) \geq 0 \end{aligned} \]
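The last step relies on \(KL \geq 0\), which is easy to check numerically for discrete distributions (the two distributions below are arbitrary examples):

```python
import math

def kl(p, q):
    # KL divergence for discrete distributions: sum p_i * log(p_i / q_i).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
print(kl(p, q) >= 0)  # non-negative for any pair of distributions
print(kl(p, p))       # zero exactly when the distributions coincide
```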

Therefore
\[log {P(X|\theta^{(t+1)})} -log {P(X|\theta^{(t)})} \geq 0 \]

\[log {P(X|\theta^{(t+1)})} \geq log {P(X|\theta^{(t)})} \]

so the log-likelihood never decreases from one iteration to the next.
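This monotonicity is easy to observe empirically. A sketch on a toy two-Gaussian mixture (unit variances, equal weights; the setup is illustrative, not from the post): track the observed-data log-likelihood across EM iterations and confirm it never decreases.

```python
import math

data = [-2.0, -1.5, -2.5, 1.5, 2.5, 2.0]
mu1, mu2 = 0.5, -0.5  # a poor starting point on purpose

def pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def loglik(m1, m2):
    # Observed-data log-likelihood logP(X|theta) for the equal-weight mixture.
    return sum(math.log(0.5 * pdf(x, m1) + 0.5 * pdf(x, m2)) for x in data)

lls = [loglik(mu1, mu2)]
for _ in range(15):
    # E-step: responsibilities under the current parameters.
    r = [0.5 * pdf(x, mu1) / (0.5 * pdf(x, mu1) + 0.5 * pdf(x, mu2))
         for x in data]
    # M-step: responsibility-weighted means.
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))
    lls.append(loglik(mu1, mu2))

# Each iteration increases (or at convergence, preserves) the log-likelihood.
print(all(b >= a - 1e-9 for a, b in zip(lls, lls[1:])))
```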


IV. The Complete Derivation

Jensen's inequality: when the function \(f\) is concave:
\[f(E[x]) \geq E[f(x)]\]

where \(f\) is a concave function and \(E\) denotes expectation. For example, \(log[E(x)] \geq E[log(x)]\).
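A quick numerical check of this instance of Jensen's inequality (the sample values are arbitrary positive numbers):

```python
import math

xs = [1.0, 2.0, 4.0]  # arbitrary positive sample
log_of_mean = math.log(sum(xs) / len(xs))              # log E[x]
mean_of_log = sum(math.log(x) for x in xs) / len(xs)   # E[log x]
# log is concave, so log E[x] >= E[log x], with equality only for constant x.
print(log_of_mean >= mean_of_log)
```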

Derivation method 1
\[ \begin{aligned} log {P(X|\theta)} &= log \int_{Z}P(X,Z|\theta)dZ=log \int_{Z} \frac{P(X,Z|\theta)}{q(Z)}q(Z)dZ\\ &=log E_{q(Z)}[\frac{P(X,Z|\theta)}{q(Z)}] \geq E_{q(Z)}[log \frac{P(X,Z|\theta)}{q(Z)}] \end{aligned} \]

Equality holds when \(\frac{P(X,Z|\theta)}{q(Z)} = C\) for some constant \(C\). We call \(E_{q(Z)}[log \frac{P(X,Z|\theta)}{q(Z)}]\) the \(ELBO\) (evidence lower bound).
In other words, the \(ELBO\) is a lower bound on \(log{P(X|\theta)}\); by continually increasing the ELBO, we can keep pushing \(log{P(X|\theta)}\) up.
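The bound, and the equality condition \(q(Z) = P(Z|X,\theta)\), can be verified numerically on a toy two-component mixture at a single observation (all numbers are illustrative):

```python
import math

weights, means = [0.5, 0.5], [-2.0, 2.0]  # illustrative mixture

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

x = 1.0
joint = [w * npdf(x, mu) for w, mu in zip(weights, means)]  # P(x, z)
log_px = math.log(sum(joint))                               # log P(x)
posterior = [j / sum(joint) for j in joint]                 # P(z | x)

def elbo(q):
    # ELBO = E_q[log(P(x, z) / q(z))] for discrete z.
    return sum(qz * math.log(jz / qz) for qz, jz in zip(q, joint))

print(elbo([0.5, 0.5]) < log_px)              # strict: q != posterior
print(abs(elbo(posterior) - log_px) < 1e-9)   # equality at the posterior
```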

Note that the earlier \(\int_Z \; P(Z|X,\theta^{(t)}) \; logP(X,Z|\theta) \, dZ\) is exactly this \(ELBO\) with \(q(Z) = P(Z|X,\theta^{(t)})\), up to a term that does not depend on \(\theta\).


Derivation method 2
\[ \begin{aligned} log {P(X|\theta)} &=log\frac{P(X,Z|\theta)}{P(Z|X,\theta)}\\ &=log P(X,Z|\theta) - log P(Z|X,\theta)\\ &=log \frac {P(X,Z|\theta)}{q(Z)} - log \frac {P(Z|X,\theta)}{q(Z)} \end{aligned} \]

Take the expectation of both sides with respect to the distribution \(q(Z)\) (i.e. integrate both sides against \(q(Z)\)).
The left side still equals \(log{P(X|\theta)}\) (by the same steps as above).
\[ \begin{aligned} \text{Right side} &= \int_{Z} q(Z) log \frac{P(X,Z|\theta)}{q(Z)} dZ - \int_{Z} q(Z) log \frac{P(Z|X,\theta)}{q(Z)} dZ \\ &= ELBO + KL(q(Z) \;||\; P(Z|X,\theta)) \end{aligned} \]

Equality \(log{P(X|\theta)} = ELBO\) holds when \(q(Z)\) and \(P(Z|X,\theta)\) are the same distribution, making the KL term zero.

Thus:
E-step: find \(q = P\), i.e. set \(q(Z) = P(Z|X,\theta^{(t)})\) so that the KL term vanishes.
M-step (which can be written in several equivalent forms):
\[ \begin{aligned} &\arg\max_{\theta} \int_{Z} q(Z) log \frac{P(X,Z|\theta)}{q(Z)} dZ \\ &\arg\max_{\theta} \int_{Z} q(Z) log{P(X,Z|\theta)} dZ \\ &\arg\max_{\theta} \int_{Z} P(Z|X,\theta^{(t)}) log{P(X,Z|\theta)} dZ \\ &\arg\max_{\theta} \sum_{Z} P(Z|X,\theta^{(t)}) log{P(X,Z|\theta)} \end{aligned} \]

Digression: in GMM, we directly write \(q(Z)\) as \(P(Z|X,\theta)\), the posterior of \(Z\). At that point
\[log P(X|\theta) = ELBO\]

So when we keep maximizing the \(ELBO\), we are in effect maximizing \(logP(X|\theta)\).
My understanding is that the task changes from maximizing \(log P(X|\theta)\) to maximizing (the expectation of) \(log P(X,Z|\theta)\).
Because we cannot maximize \(log P(X|\theta)\) directly, we introduce the hidden variable \(Z\) (the distribution over which component each sample belongs to) to help. One stage becomes two stages: first infer the hidden variables, then maximize the expected joint log-probability.


V. Generalized EM

\[logP(X|\theta) = ELBO+KL(q(Z)||P(Z|X,\theta))\]


\[L(q,\theta)=ELBO = E_{q(Z)}[ log\frac{P(X,Z|\theta)} {q(Z)}] \]

A digression:
\(logP(X|\theta) = E_{q(Z)}[ log{P(X,Z|\theta)}]-E_{q(Z)}[log \;q(Z)]+KL(q(Z)||P(Z|X,\theta))\)
\(= E_{q(Z)}[ log{P(X,Z|\theta)}]+H(q(Z))+KL(q(Z)||P(Z|X,\theta))\)
\(= E_{q(Z)}[ log{P(X,Z|\theta)}]+H(q(Z),P(Z|X,\theta))\)

E-step: fix \(\theta\) and find \(q\); at this point \(logP(X|\theta)\) is a fixed value:
\[ \begin{aligned} q^{(t+1)} &= \arg\min_{q} KL(q||P) = \arg\max_{q} ELBO \\ &= \arg\max_{q} L(q,\theta^{(t)}) \\ &= \arg\max_{q} E_{q(Z)}[log \frac{P(X,Z|\theta^{(t)})}{q(Z)}] \end{aligned} \]

M-step: fix \(q\), find \(\theta\):
\[ \begin{aligned} \theta^{(t+1)} &= \arg\max_{\theta}ELBO =\arg\max_{\theta} L(q^{(t+1)},\theta)\\&= \arg\max_{\theta} E_{q^{(t+1)}(Z)}[ log\frac{P(X,Z|\theta)} {q^{(t+1)}(Z)}]\\&= \arg\max_{\theta} E_{q^{(t+1)}(Z)}[ log{P(X,Z|\theta)} ] \end{aligned} \]
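This coordinate-ascent view of generalized EM can be sketched on a toy equal-weight mixture of two unit-variance Gaussians (the data and starting point are illustrative): each half-step, first updating \(q\) and then \(\theta\), can only increase \(L(q,\theta)\).

```python
import math

data = [-2.0, -2.2, 2.1, 1.9]
mus = [-0.5, 0.5]  # crude initial means

def pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def joint(x, z, mus):
    # P(x, z | theta) with equal component weights.
    return 0.5 * pdf(x, mus[z])

def L(q, mus):
    # ELBO: sum_i sum_z q_i(z) log(P(x_i, z | theta) / q_i(z)).
    return sum(q[i][z] * math.log(joint(x, z, mus) / q[i][z])
               for i, x in enumerate(data) for z in (0, 1))

q = [[0.5, 0.5] for _ in data]  # arbitrary initial q
values = [L(q, mus)]
for _ in range(5):
    # E-half-step: q_i = posterior P(z | x_i, theta), maximizing L over q.
    new_q = []
    for x in data:
        jz = [joint(x, 0, mus), joint(x, 1, mus)]
        s = jz[0] + jz[1]
        new_q.append([jz[0] / s, jz[1] / s])
    q = new_q
    values.append(L(q, mus))
    # M-half-step: responsibility-weighted means, maximizing L over theta.
    mus = [sum(q[i][z] * x for i, x in enumerate(data)) /
           sum(q[i][z] for i in range(len(data)))
           for z in (0, 1)]
    values.append(L(q, mus))

# L rises (or stays put) at every half-step.
print(all(b >= a - 1e-9 for a, b in zip(values, values[1:])))
```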


Origin www.cnblogs.com/SpingC/p/11632525.html