Maximum likelihood estimation + EM algorithm

Preface

First, why cover these two topics together? Because they are closely related: the EM algorithm is an iterative way of carrying out maximum likelihood estimation when the model contains latent variables. With that said, let's get started.

Main text

Maximum likelihood estimation

Maximum Likelihood Estimation (MLE) is a statistical method for estimating model parameters. The core idea is to estimate the unknown parameters by finding the parameter values that maximize the likelihood function of the observed data. The likelihood function represents the probability of observing the existing data given the parameter values, and the goal of MLE is to find the parameter values that make this probability as large as possible.

Suppose we have a set of independent and identically distributed (i.i.d.) observations \( x_1, x_2, \ldots, x_n \). These data come from some probability distribution with probability density function \( f(x; \theta) \), where \( \theta \) is an unknown parameter. The likelihood function \( L(\theta) \) is defined as the probability of the observed data under a given parameter \( \theta \), that is:

\[ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) \]

It is usually more convenient to work with the logarithm of the likelihood function, the log-likelihood:

\[ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta) \]

The goal of maximum likelihood estimation is to find the parameter value that maximizes the log-likelihood function, that is:

\[ \hat{\theta} = \arg\max_{\theta} \ell(\theta) \]

In practice, this optimization problem is usually solved with numerical methods such as gradient ascent or Newton's method.
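As a minimal sketch of what such a numerical solve can look like (assuming, for illustration only, a normal model with unknown mean and standard deviation, and using a general-purpose optimizer instead of a hand-written Newton step):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: i.i.d. draws from a normal distribution whose parameters we pretend not to know.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(params, data):
    """Negative log-likelihood of a normal model; minimizing it maximizes the likelihood."""
    mu, log_sigma = params              # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu)**2 / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                # should be close to 2.0 and 1.5
```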

Example:

Consider a simple coin-tossing example. Suppose each toss comes up heads (1) or tails (0), and we want to estimate the probability \( p \) of heads. A single toss can be modeled as a Bernoulli trial, with probability mass function:

\[ P(x; p) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\} \]

For \( n \) independent tosses \( x_1, \ldots, x_n \), the log-likelihood is \( \ell(p) = \sum_i \left[ x_i \log p + (1 - x_i) \log(1 - p) \right] \); setting its derivative to zero gives the familiar estimate \( \hat{p} = \frac{1}{n} \sum_i x_i \), the observed fraction of heads.
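A quick numerical check of this estimate (a sketch with simulated tosses; the true probability 0.7 and the sample size below are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(42)
tosses = rng.binomial(n=1, p=0.7, size=1000)   # 1 = heads, 0 = tails

# Closed-form MLE: the sample mean, i.e. the fraction of heads.
p_hat = tosses.mean()

# Sanity check: scan a grid of p values and confirm the log-likelihood
# peaks at (roughly) the same place.
grid = np.linspace(0.01, 0.99, 99)
log_lik = np.array([np.sum(tosses * np.log(p) + (1 - tosses) * np.log(1 - p))
                    for p in grid])
print(p_hat, grid[np.argmax(log_lik)])         # both should be close to 0.7
```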

Okay, next up: the EM algorithm.

EM algorithm

Expectation-Maximization (EM) is an iterative optimization algorithm for estimating the parameters of probabilistic models that contain latent variables. Its goal is to find parameter values that maximize the likelihood of the observed data, especially when some data are unobserved or missing.

The EM algorithm consists of two main steps, the E step (Expectation step) and the M step (Maximization step), which are performed alternately until convergence.
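Stated in general terms (writing \( X \) for the observed data, \( Z \) for the latent variables, and \( \theta^{(t)} \) for the current parameter estimate), the two steps are:

\[ \text{E step:} \quad Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}} \left[ \log p(X, Z \mid \theta) \right] \]

\[ \text{M step:} \quad \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)}) \]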

1. E step (Expectation step):

In the E step, latent variables are introduced and the posterior probability that each observation comes from each mixture component is computed. This posterior probability (often called the responsibility) represents, given the current parameter values, how likely it is that each observation belongs to each component. For the one-dimensional Laplace mixture model, the E step works as follows: first initialize the required parameter values (for example, at random), then use these values to compute the responsibilities, as shown below:
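One standard way to write this computation for a Laplace mixture, using \( \pi_j \) for the mixture weights and \( \mu_j, b_j \) for the location and scale of component \( j \), is:

\[ \gamma_{ij} = \frac{\pi_j \, f(x_i; \mu_j, b_j)}{\sum_{k} \pi_k \, f(x_i; \mu_k, b_k)}, \qquad f(x; \mu, b) = \frac{1}{2b} \exp\!\left( -\frac{|x - \mu|}{b} \right) \]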

2. M step (Maximization step):

In the M step, the model parameters are updated by maximizing the expected complete-data log-likelihood; in other words, the parameter values assumed in the E step are re-estimated. For the one-dimensional Laplace mixture model, the M-step updates are as follows:

Update the mixture weights \( \pi_j \):
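In the usual notation (with \( \gamma_{ij} \) denoting the E-step responsibilities and \( n \) the number of observations), this update is:

\[ \pi_j^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ij} \]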

Update the location (mean) parameter \( \mu_j \):
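For a Laplace density the maximizing location is not a weighted mean but a weighted median, i.e. the value minimizing the responsibility-weighted sum of absolute deviations:

\[ \mu_j^{\text{new}} = \arg\min_{\mu} \sum_{i=1}^{n} \gamma_{ij}\, |x_i - \mu| \]

If the scale parameters \( b_j \) are also being estimated, they have the closed-form update \( b_j^{\text{new}} = \sum_i \gamma_{ij} |x_i - \mu_j^{\text{new}}| \,/\, \sum_i \gamma_{ij} \).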

Example:
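As a concrete illustration, here is a minimal sketch of EM for a two-component one-dimensional Laplace mixture (the simulated data, initialization, and iteration count below are arbitrary choices made for the example):

```python
import numpy as np

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with location mu and scale b."""
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

def weighted_median(x, w):
    """Value m minimizing sum(w * |x - m|)."""
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cdf = np.cumsum(w_sorted)
    return x_sorted[np.searchsorted(cdf, 0.5 * cdf[-1])]

def em_laplace_mixture(x, n_components=2, n_iter=100):
    n = len(x)
    # Simple initialization: equal weights, spread-out locations, identical scales.
    pi = np.full(n_components, 1.0 / n_components)
    mu = np.quantile(x, np.linspace(0.25, 0.75, n_components))
    b = np.full(n_components, np.std(x))

    for _ in range(n_iter):
        # E step: responsibilities gamma[i, j] = P(component j | x_i).
        dens = np.stack([pi[j] * laplace_pdf(x, mu[j], b[j])
                         for j in range(n_components)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: update weights, locations (weighted medians), and scales.
        nk = gamma.sum(axis=0)
        pi = nk / n
        for j in range(n_components):
            mu[j] = weighted_median(x, gamma[:, j])
            b[j] = np.sum(gamma[:, j] * np.abs(x - mu[j])) / nk[j]
    return pi, mu, b

# Simulated data drawn from two Laplace components.
rng = np.random.default_rng(0)
x = np.concatenate([rng.laplace(loc=-3.0, scale=1.0, size=300),
                    rng.laplace(loc=4.0, scale=0.5, size=700)])
print(em_laplace_mixture(x))   # weights/locations/scales should land near the true values
```

Each iteration of this loop never decreases the observed-data log-likelihood, which is the standard convergence guarantee of EM.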


Original post: blog.csdn.net/m0_73872315/article/details/134366856