Generative model algorithms: EM algorithm steps and formula derivation

Introduction

The EM algorithm is an iterative algorithm, proposed by Dempster et al. in 1977, for the maximum likelihood estimation, or maximum a posteriori estimation, of the parameters of probability models containing hidden (latent) variables. Each iteration of the EM algorithm consists of two steps: the E step, which computes an expectation, and the M step, which performs a maximization. The method is therefore called the expectation-maximization algorithm, or EM algorithm for short.

EM Algorithm Examples and Solutions

Three-coin model: Suppose there are three coins A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Carry out the following experiment: first toss coin A; if it lands heads, select coin B, otherwise select coin C. Then toss the selected coin and record the result, writing 1 for heads and 0 for tails. The experiment is repeated independently $n$ times (here $n=10$), and the observed results are

$$1,\,1,\,0,\,1,\,0,\,0,\,1,\,0,\,1,\,1$$

Assume that only the result of each final coin toss can be observed, not the tossing process (i.e., which coin was tossed). How can we estimate the probabilities of heads of the three coins, that is, the parameters of the three-coin model?
The model expression is:
$$\begin{aligned} P(y\mid\theta) &= \sum_{z}P(y,z\mid\theta)=\sum_{z}P(z\mid\theta)P(y\mid z,\theta) \\ &= \pi p^{y}(1-p)^{1-y}+(1-\pi)q^{y}(1-q)^{1-y} \end{aligned}$$
Here the random variable $y$ is the observed variable, indicating whether the result of a single trial is 1 or 0; the random variable $z$ is the hidden variable, indicating the unobserved result of tossing coin A; and $\theta=(\pi,p,q)$ is the model parameter. This model is the generative model of the data above. Note that the data for the random variable $y$ can be observed, while the data for the random variable $z$ cannot.
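For instance, substituting $y=1$ and $y=0$ into this expression gives the marginal probabilities of observing a head or a tail directly:

$$P(y=1\mid\theta)=\pi p+(1-\pi)q,\qquad P(y=0\mid\theta)=\pi(1-p)+(1-\pi)(1-q)$$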
Write the observed data as $Y=(Y_1,Y_2,\dots,Y_n)^T$ and the unobserved data as $Z=(Z_1,Z_2,\dots,Z_n)^T$; then the likelihood function of the observed data is
$$P(Y\mid\theta)=\sum_{Z}P(Z\mid\theta)P(Y\mid Z,\theta)$$

$$P(Y\mid\theta)=\prod_{j=1}^{n}\left[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi)q^{y_j}(1-q)^{1-y_j}\right]$$
Consider finding the maximum likelihood estimate of the model parameters $\theta=(\pi,p,q)$, that is,

$$\hat{\theta}=\arg\max_{\theta}\log P(Y\mid\theta)$$

This problem has no analytical solution and can only be solved by an iterative method. The EM algorithm is such an iterative algorithm. The EM algorithm for this problem is given below; its derivation is omitted for now.
The EM algorithm first selects initial values for the parameters, denoted $\theta^{(0)}=(\pi^{(0)},p^{(0)},q^{(0)})$, and then iteratively updates the parameter estimates until convergence. The estimate after the $i$-th iteration is denoted $\theta^{(i)}=(\pi^{(i)},p^{(i)},q^{(i)})$. The $(i+1)$-th iteration of the EM algorithm is as follows.
E step: under the current model parameters $\pi^{(i)},p^{(i)},q^{(i)}$, compute the probability that observation $y_j$ came from coin B:

$$\mu_j^{(i+1)}=\frac{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}+(1-\pi^{(i)})(q^{(i)})^{y_j}(1-q^{(i)})^{1-y_j}}$$
M step: compute new estimates of the model parameters:
$$\pi^{(i+1)}=\frac{1}{n}\sum_{j=1}^{n}\mu_j^{(i+1)}$$

$$p^{(i+1)}=\frac{\sum_{j=1}^{n}\mu_j^{(i+1)}y_j}{\sum_{j=1}^{n}\mu_j^{(i+1)}}$$

$$q^{(i+1)}=\frac{\sum_{j=1}^{n}\left(1-\mu_j^{(i+1)}\right)y_j}{\sum_{j=1}^{n}\left(1-\mu_j^{(i+1)}\right)}$$
Carrying out the numerical calculation, suppose the initial values of the model parameters are

$$\pi^{(0)}=0.5,\quad p^{(0)}=0.5,\quad q^{(0)}=0.5$$
For both $y_j=1$ and $y_j=0$ we have $\mu_j^{(1)}=0.5$.
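To see why, note that with all three initial parameters equal to 0.5 the numerator is exactly half the denominator in the E-step formula, regardless of $y_j$:

$$\mu_j^{(1)}=\frac{0.5\cdot 0.5^{y_j}\cdot 0.5^{1-y_j}}{0.5\cdot 0.5^{y_j}\cdot 0.5^{1-y_j}+0.5\cdot 0.5^{y_j}\cdot 0.5^{1-y_j}}=\frac{1}{2}$$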
According to the M-step formulas,

$$\pi^{(1)}=0.5,\quad p^{(1)}=0.6,\quad q^{(1)}=0.6$$
According to the E step,

$$\mu_j^{(2)}=0.5,\quad j=1,2,\dots,10$$

Continuing the iteration gives

$$\pi^{(2)}=0.5,\quad p^{(2)}=0.6,\quad q^{(2)}=0.6$$
So the maximum likelihood estimate of the model parameter $\theta$ is

$$\hat{\pi}=0.5,\quad \hat{p}=0.6,\quad \hat{q}=0.6$$
$\pi=0.5$ means that coin A is fair, and this result is easy to understand.
If instead the initial values are $\pi^{(0)}=0.4,\ p^{(0)}=0.6,\ q^{(0)}=0.7$, then the maximum likelihood estimate of the model parameters obtained is $\hat{\pi}=0.4064,\ \hat{p}=0.5368,\ \hat{q}=0.6432$. That is, the result of the EM algorithm depends on the choice of initial values; different initial values may lead to different parameter estimates.
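The iteration above is easy to reproduce in a few lines of code. The following is a minimal sketch in Python/NumPy, using the data, initial values, and update formulas from this section; the function name `three_coin_em` and the fixed iteration count are illustrative choices, not part of the original text.

```python
import numpy as np

def three_coin_em(y, pi, p, q, n_iter=20):
    """EM iteration for the three-coin model, starting from (pi, p, q)."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E step: probability that each observation y_j came from coin B
        num = pi * p**y * (1 - p)**(1 - y)
        den = num + (1 - pi) * q**y * (1 - q)**(1 - y)
        mu = num / den
        # M step: re-estimate the parameters from the responsibilities mu
        pi = mu.mean()
        p = (mu * y).sum() / mu.sum()
        q = ((1 - mu) * y).sum() / (1 - mu).sum()
    return pi, p, q

y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]      # observed tosses from the text
print(three_coin_em(y, 0.5, 0.5, 0.5))  # settles at (0.5, 0.6, 0.6) after one step
print(three_coin_em(y, 0.4, 0.6, 0.7))  # approaches (0.4064, 0.5368, 0.6432)
```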

EM Algorithm Steps and Description

In general, $Y$ denotes the data of the observed random variable and $Z$ denotes the data of the hidden random variable. Together, $Y$ and $Z$ are called the complete data, and the observed data $Y$ alone is called the incomplete data. Suppose the probability distribution of the observed data $Y$ is $P(Y\mid\theta)$, where $\theta$ is the model parameter to be estimated; then the likelihood function of the incomplete data $Y$ is $P(Y\mid\theta)$, and its log-likelihood function is $L(\theta)=\log P(Y\mid\theta)$. Suppose the joint probability distribution of $Y$ and $Z$ is $P(Y,Z\mid\theta)$; then the complete-data log-likelihood function is $\log P(Y,Z\mid\theta)$.
The EM algorithm computes the maximum likelihood estimate of $L(\theta)=\log P(Y\mid\theta)$ by iteration, and each iteration contains two steps: the E step, computing an expectation, and the M step, performing a maximization. The EM algorithm is stated below.
Input: observed variable data $Y$, hidden variable data $Z$, joint distribution $P(Y,Z\mid\theta)$, conditional distribution $P(Z\mid Y,\theta)$;
Output: model parameter $\theta$.
(1) Select initial values of the parameters $\theta^{(0)}$ and start the iteration;
(2) E step: let $\theta^{(i)}$ be the parameter estimate of the $i$-th iteration. At the E step of the $(i+1)$-th iteration, compute the function

$$\begin{aligned} Q(\theta,\theta^{(i)}) &= E_Z\left[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}\right] \\ &= \sum_{Z}\log P(Y,Z\mid\theta)\,P(Z\mid Y,\theta^{(i)}) \end{aligned}$$
Here $P(Z\mid Y,\theta^{(i)})$ is the conditional probability distribution of the hidden variable data $Z$ given the observed data $Y$ and the current parameter estimate $\theta^{(i)}$;
(3) M step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$ and take it as the parameter estimate of the $(i+1)$-th iteration, $\theta^{(i+1)}$:
$$\theta^{(i+1)}=\arg\max_{\theta}Q(\theta,\theta^{(i)})$$
(4) Repeat steps (2) and (3) until convergence.
The function $Q(\theta,\theta^{(i)})$ is the core of the EM algorithm and is called the Q function.
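As a worked instance (not part of the original derivation), one can check that for the three-coin model of the previous section the Q function takes the form

$$Q(\theta,\theta^{(i)})=\sum_{j=1}^{n}\left\{\mu_j^{(i+1)}\left[\log\pi+y_j\log p+(1-y_j)\log(1-p)\right]+\left(1-\mu_j^{(i+1)}\right)\left[\log(1-\pi)+y_j\log q+(1-y_j)\log(1-q)\right]\right\}$$

where $\mu_j^{(i+1)}$ is the responsibility computed in the E step; setting the derivatives with respect to $\pi$, $p$, and $q$ to zero recovers exactly the M-step formulas given earlier.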
The following are some explanations about the EM algorithm:
Step (1): the initial values of the parameters can be chosen arbitrarily, but note that the EM algorithm is sensitive to the initial values.
Step (2): the E step computes $Q(\theta,\theta^{(i)})$. In the Q-function formula, $Z$ is the unobserved data and $Y$ is the observed data. Note that the first argument of $Q(\theta,\theta^{(i)})$ denotes the parameter to be maximized, while the second denotes the current estimate of the parameter. Each iteration actually seeks to maximize the Q function.
Step (3): the M step maximizes $Q(\theta,\theta^{(i)})$ to obtain $\theta^{(i+1)}$, completing one iteration $\theta^{(i)}\to\theta^{(i+1)}$. It will be proved later that each iteration increases the likelihood function, or the algorithm reaches a local extremum.
Step (4) gives the condition for stopping the iteration: generally, for small positive numbers $\epsilon_1,\epsilon_2$, stop iterating when

$$\left\|\theta^{(i+1)}-\theta^{(i)}\right\|<\epsilon_1 \quad\text{or}\quad \left\|Q(\theta^{(i+1)},\theta^{(i)})-Q(\theta^{(i)},\theta^{(i)})\right\|<\epsilon_2$$
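The loop structure and the stopping rule translate directly into code. Below is a minimal sketch of the generic EM loop in Python, assuming the caller supplies problem-specific `e_step` and `m_step` functions; all names, the vector representation of $\theta$, and the use of only the first stopping criterion are illustrative assumptions rather than part of the text.

```python
import numpy as np

def em(theta0, e_step, m_step, eps=1e-8, max_iter=1000):
    """Generic EM loop: alternate E and M steps until the parameters stop moving.

    e_step(theta) should return whatever expected statistics the M step needs
    (the ingredients of Q(theta, theta_i)); m_step(stats) should return the
    parameter vector that maximizes Q. Both are problem-specific.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(theta)                                # E step
        theta_new = np.asarray(m_step(stats), dtype=float)   # M step
        # Stopping criterion: ||theta^(i+1) - theta^(i)|| < eps_1
        if np.linalg.norm(theta_new - theta) < eps:
            return theta_new
        theta = theta_new
    return theta
```

With the three-coin E and M steps from the previous section plugged in as `e_step` and `m_step`, this loop reproduces the fixed points shown there.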


Origin blog.csdn.net/weixin_42491648/article/details/132031864