From the Likelihood Function to the EM Algorithm (with Code Implementation)

1. What is the EM algorithm?

The expectation-maximization (EM) algorithm is a method for finding maximum likelihood or maximum a posteriori estimates of the parameters of a probabilistic model, in the case where the model depends on unobserved latent (hidden) variables.

The EM algorithm alternates between two steps:

The first step, expectation (E), uses the current parameter estimates to compute the expected values (posterior distribution) of the hidden variables;
The second step, maximization (M), re-estimates the parameters by maximizing the expected log-likelihood obtained in the E-step. The parameters found in the M-step are then used in the next E-step, and the two steps alternate until convergence.

Maximum likelihood estimation can be summed up in one sentence: knowing the outcome, work backwards to infer the condition θ that most likely produced it.

1.1 The likelihood function

In statistics, the likelihood function is a function of the parameters of a statistical model: it expresses how plausible different parameter values are given the observed data. "Likelihood" is close in meaning to "probable" or "probability"; both refer to the possibility of some event. Maximum likelihood simply means "most possible".

For example, suppose you go hunting with a classmate and a hunter, and a hare runs past in front of you. A single shot is heard and the hare drops. If you had to guess whose bullet hit it, you would reason: only one shot was fired and it hit, and since a hunter is generally far more likely to hit the target than your classmate, the shot was probably fired by the hunter.

The way this conclusion is reached reflects the basic idea of the maximum likelihood method.

In most situations we predict an outcome from known conditions. In maximum likelihood estimation the outcome is already known, and we look for the condition (parameter value) under which that outcome is most probable, and take it as our estimate.

1.3 Steps for solving the maximum likelihood estimate

Suppose we draw 100 individuals from a population of 100,000 and record their heights. Assuming the samples are independent, the probability of observing exactly these 100 people is the product of the individual probabilities:

\[L(\theta)=L(x_1,...,x_n|\theta)=\prod_{i=1}^{n}p(x_i|\theta),\quad\theta\in\Theta\]

We now want the value of \(\theta\) that maximizes \(L(\theta)\); that maximizing value of \(\theta\) is the estimate we are looking for.

To make the analysis easier, we define the log-likelihood function, which turns the product into a sum:

\[H(\theta)=\ln L(\theta)=\ln\prod_{i=1}^{n}p(x_i|\theta)=\sum_{i=1}^{n}\ln p(x_i|\theta)\]

To find the extremum of this function, the most direct idea from calculus is to take the derivative, set it to zero, and solve the resulting equation for θ (assuming, of course, that L(θ) is continuously differentiable). If θ is a vector containing several parameters, we take the partial derivative of L(θ) with respect to each parameter, i.e. the gradient. With n unknown parameters we obtain n equations, and the solutions of this system of equations are the extreme points of the likelihood function, yielding the n parameter values.
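For instance, if we assume the heights follow a normal distribution \(N(\mu,\sigma^2)\) with known \(\sigma\) (an assumption added here purely for illustration), setting the derivative of the log-likelihood with respect to \(\mu\) to zero recovers the sample mean:

\[\frac{\partial H(\mu)}{\partial\mu}=\sum_{i=1}^{n}\frac{x_i-\mu}{\sigma^2}=0\quad\Rightarrow\quad\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_i\]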

The general procedure for maximum likelihood estimation:

  1. Write down the likelihood function;
  2. Take the logarithm of the likelihood function and simplify;
  3. Take the derivative and set it to zero, obtaining the likelihood equation;
  4. Solve the likelihood equation; the solution is the desired parameter estimate.
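As a quick numerical sketch of these four steps (the 100 heights below are simulated, since the article does not give the actual measurements, and the normal model is an assumption made only for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Simulated stand-in for the 100 height measurements (cm); not real data.
rng = np.random.default_rng(0)
heights = rng.normal(loc=170, scale=8, size=100)

def neg_log_likelihood(params, x):
    """Negative log-likelihood of a normal model N(mu, sigma^2)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)   # parametrize by log(sigma) to keep sigma > 0
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# Steps 3-4: instead of solving the derivative equations by hand,
# let a numerical optimizer find the maximum of the log-likelihood.
result = minimize(neg_log_likelihood, x0=np.array([160.0, 2.0]), args=(heights,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# The closed-form solutions (derivative set to zero) are the sample mean and the
# (biased) sample standard deviation; the optimizer should agree with them.
print(mu_hat, sigma_hat)
print(heights.mean(), heights.std())
```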

1.4 The EM algorithm

Consider two coins, A and B, and suppose their probabilities of landing heads are PA and PB respectively. To estimate these two probabilities, we flip A and B in turn, five flips per round, for five rounds in total:

| Coin | Result | Statistics |
| ---- | ------ | ---------- |
| A | H H T H T | 3 heads, 2 tails |
| B | H T T T H | 2 heads, 3 tails |
| A | H T T T T | 1 head, 4 tails |
| B | T H H T H | 3 heads, 2 tails |
| A | H T T T H | 2 heads, 3 tails |

Coin A was flipped 15 times in total (rounds 1, 3, and 5), yielding 3, 1, and 2 heads respectively, so PA is easy to estimate; PB can likewise be computed directly (these are the "true" values):

PA = (3 + 1 + 2) / 15 = 0.4
PB = (2 + 3) / 10 = 0.5

The question is: what if we do not know which coin, A or B, was used in each round (i.e. the coin identity is a hidden variable)? Flipping five rounds in the same way gives:

| Coin | Result | Statistics |
| ---- | ------ | ---------- |
| Unknown | H H T H T | 3 heads, 2 tails |
| Unknown | H T T T H | 2 heads, 3 tails |
| Unknown | H T T T T | 1 head, 4 tails |
| Unknown | T H H T H | 3 heads, 2 tails |
| Unknown | H T T T H | 2 heads, 3 tails |

Now the problem becomes interesting. Our goal has not changed: we still want to estimate PA and PB. How can we do it?

Clearly, we now have an extra hidden variable: the coin type. Call it z, a 5-dimensional vector (z1, z2, z3, z4, z5), where zi records which coin was used in round i; for example, z1 indicates whether round 1 used coin A or coin B.

  • Without knowing z, we cannot estimate PA and PB, so we first need to estimate z before we can estimate PA and PB.
  • But to estimate z well, we need to know PA and PB, so that we can apply the laws of probability to find the maximum likelihood estimate of z. Isn't this a chicken-and-egg problem? How do we break the circle?

The answer is to randomly initialize PA and PB, use them to estimate z, then use z together with the maximum likelihood rule to estimate new values of PA and PB, and repeat this cycle until PA and PB converge.

Let us simply assign initial values to PA and PB, for example:
the probability of coin A landing heads: PA = 0.2
the probability of coin B landing heads: PB = 0.7

Now consider which coin most likely produced the first round.
If it was coin A, the probability of getting 3 heads and 2 tails is 0.2 × 0.2 × 0.2 × 0.8 × 0.8 = 0.00512;
if it was coin B, the probability is 0.7 × 0.7 × 0.7 × 0.3 × 0.3 = 0.03087.
The corresponding probabilities for the other four rounds are computed in the same way. Tabulated:

| Round | If coin A | If coin B |
| ----- | --------- | --------- |
| 1 | 0.00512 (= 0.2 × 0.2 × 0.2 × 0.8 × 0.8), 3 heads 2 tails | 0.03087, 3 heads 2 tails |
| 2 | 0.02048, 2 heads 3 tails | 0.01323, 2 heads 3 tails |
| 3 | 0.08192, 1 head 4 tails | 0.00567, 1 head 4 tails |
| 4 | 0.00512, 3 heads 2 tails | 0.03087, 3 heads 2 tails |
| 5 | 0.02048, 2 heads 3 tails | 0.01323, 2 heads 3 tails |

According to the maximum likelihood rule:
Round 1 most likely used coin B;
Round 2 most likely used coin A;
Round 3 most likely used coin A;
Round 4 most likely used coin B;
Round 5 most likely used coin A.

We treat the coin with the larger probability as the one actually used: rounds 2, 3, and 5 are attributed to A, with 2, 1, and 2 heads respectively; dividing by 15, the total number of times A was flipped (3 rounds × 5 flips), gives the estimate for A. B is handled the same way. This serves as our estimate of z, from which the maximum likelihood rule gives new estimates of PA and PB:

PA = (2 + 1 + 2) / 15 = 0.33
PB = (3 + 3) / 10 = 0.6

Continuing to iterate in this way keeps moving the estimates closer to the true values; this is the magic of the EM algorithm.

Looking ahead, if we keep following this idea, using the estimated PA and PB to re-estimate z and then using z to estimate new values of PA and PB, and iterate, we eventually obtain PA = 0.4 and PB = 0.5; from then on, no matter how many more iterations we run, PA and PB remain at 0.4 and 0.5. We have thus found the maximum likelihood estimates of PA and PB.
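Here is a minimal sketch of the hard-assignment procedure walked through above; the head counts come from the table, while the function and variable names are mine. Starting from PA = 0.2 and PB = 0.7, the first iteration reproduces the estimates computed above (PA ≈ 0.33, PB = 0.6):

```python
import numpy as np

heads = np.array([3, 2, 1, 3, 2])   # heads observed in each of the 5 rounds
flips = 5                           # flips per round

def em_coins(p_a, p_b, n_iter=10):
    """Alternate between assigning each round to the more likely coin
    (a hard E-like step) and re-estimating PA and PB (M-like step)."""
    for _ in range(n_iter):
        # Likelihood of each round's head count under coin A and under coin B.
        lik_a = p_a**heads * (1 - p_a)**(flips - heads)
        lik_b = p_b**heads * (1 - p_b)**(flips - heads)
        use_a = lik_a >= lik_b                         # maximum likelihood rule per round
        # Re-estimate PA and PB from the rounds attributed to each coin.
        p_a = heads[use_a].sum() / (flips * use_a.sum())
        p_b = heads[~use_a].sum() / (flips * (~use_a).sum())
        print(p_a, p_b)                                # estimates after this iteration
    return p_a, p_b

em_coins(p_a=0.2, p_b=0.7)   # initial guesses from the text
```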

To summarize, the computation steps are:

  1. Randomly initialize the model parameters θ.

  2. E-step: construct the Q function. For each i, use the model parameters from the previous iteration to compute the posterior probability of the hidden variables (i.e. the expectation of the hidden variables), and take it as the current estimate of the hidden variables:

    \[Q_i(z^{(i)})=p(z^{(i)}|x^{(i)};\theta)\]

  3. M-step: find the parameter values that maximize the Q function, i.e. maximize the expected log-likelihood, to obtain the new parameter values:

    \[\theta=\arg\max_{\theta}\sum_{i}\sum_{z^{(i)}}Q_i(z^{(i)})\log\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})}\]

  4. Repeat steps 2 and 3 until convergence.

For the detailed derivation, please refer to the reference at the end of this article.

2. Which models use the EM algorithm?

The EM algorithm is generally used to fit models such as GMMs (Gaussian mixture models) and in collaborative filtering; k-means is in fact a special case of EM. The EM algorithm is guaranteed to converge, but it may converge to a local optimum. Because the number of terms in the sum grows exponentially with the number of hidden variables, computing the gradient directly becomes troublesome.

3. Code implementation

EM algorithm for a Gaussian mixture model:
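As a minimal sketch of what such an implementation can look like (the data are simulated and all names are illustrative, not taken from the original repository):

```python
import numpy as np
from scipy.stats import norm

# Simulated 1-D data from two Gaussians (stand-in data, for illustration only).
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.5, 200)])

def gmm_em(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    # Simple initialization of mixing weights, means, and standard deviations.
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])

    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each point.
        dens = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)          # shape (2, n)

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None])**2).sum(axis=1) / nk)

    return pi, mu, sigma

print(gmm_em(x))   # should recover roughly weights (0.6, 0.4), means (0, 5), stds (1, 1.5)
```

In practice one would also track the log-likelihood at each iteration and stop once it no longer increases; for multivariate data, a library implementation such as sklearn.mixture.GaussianMixture does the same thing.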

4. References

How to understand the EM algorithm intuitively

Author: @mantchs

GitHub:https://github.com/NLP-LOVE/ML-NLP

Welcome to join the discussion and help improve the project! Group number: 541954936 (NLP interview learning group)
