Essentials | Understanding maximum likelihood estimation in one article

Main content: an intuitive explanation of maximum likelihood estimation

In plain terms, maximum likelihood estimation assumes that the form of the population distribution is known, and uses the observed sample results to work backward to the parameter values that are most likely (i.e., have the maximum probability) to have produced those samples!


In other words, maximum likelihood estimation provides a way to estimate model parameters from observed data, under the premise: "the model is determined, the parameters are unknown".


Some readers may say this still sounds abstract. Think of it this way: once we assume the data follow a certain distribution, we can find that distribution's parameter values by maximum likelihood estimation. Take the normal distribution, whose density formula is as follows:


f(x) = 1/(σ√(2π)) · exp(−(x − μ)² / (2σ²))


If we obtain the values of the parameters μ and σ through maximum likelihood estimation, don't we then know the model's mean and variance, and with them essentially everything about the distribution? Indeed we do.
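To make this concrete, here is a minimal Python sketch of that density formula (the function name normal_pdf is just illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# The density peaks at the mean: for N(0, 1) this is 1/sqrt(2*pi) ~= 0.3989
print(normal_pdf(0.0, mu=0.0, sigma=1.0))
```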


Maximum likelihood estimation relies on one important assumption about the sampling: all samples are independent and identically distributed (i.i.d.).

Let me use two examples to help explain maximum likelihood estimation.

1  Example one


This example is adapted from someone else's blog (see the references at the end).


Suppose there is a jar containing black balls and white balls. The total number is unknown, and the ratio of the two colors is also unknown. We want to know the proportion of white balls to black balls in the jar, but we cannot count every ball inside.


What we can do is shake the jar, draw one ball at random, record its color, and put it back. This process can be repeated as many times as we like, and the recorded colors let us estimate the proportion of black and white balls in the jar. If, out of 100 such draws, 70 were white, what is the most likely proportion of white balls in the jar?


Many people will answer right away: 70%. But what is the theoretical justification?


Assume the proportion of white balls in the jar is p, so the proportion of black balls is 1 − p. Because after each draw we record the color, put the ball back, and shake the jar well, the colors of successive draws are independent and identically distributed.


Call each draw of a ball one sampling. In this problem, the probability that 100 samplings yield 70 white balls and 30 black balls is P(sample result | Model).


Denote the result of the first draw as x1, the result of the second draw as x2, and so on; the sample result is then (x1, x2, …, x100). This gives the following expression:

P(sample result | Model)

  = P(x1, x2, …, x100 | Model)

  = P(x1 | Model) · P(x2 | Model) · … · P(x100 | Model)    (by independence)

  = p^70 · (1 − p)^30
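As a sanity check, here is a small sketch (with an arbitrary candidate value of p, chosen only for illustration) showing that multiplying the 100 per-draw probabilities gives exactly p^70 · (1 − p)^30:

```python
# Build a sample of 70 white draws and 30 black draws, then multiply
# the per-draw probabilities under the independence assumption.
p = 0.6  # any candidate value of the white-ball proportion

sample = ["white"] * 70 + ["black"] * 30          # (x1, ..., x100)

likelihood = 1.0
for x in sample:
    likelihood *= p if x == "white" else (1 - p)  # P(xi | Model)

print(likelihood)                # product of the 100 factors
print(p ** 70 * (1 - p) ** 30)   # closed form p^70 (1-p)^30, equal up to float rounding
```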


Good: we now have an expression for the probability of the observed sample result. The model parameter we are after is the p in this expression.


So how do we find this p? By what criterion should we choose it?


Different values of p lead directly to different values of P(sample result | Model).

In fact, p could take infinitely many values. For example:


p = 0.5 (white balls 50%, black balls 50%)


Then, with this value of p, p^70 · (1 − p)^30 = 0.5^70 × 0.5^30 ≈ 7.8 × 10^(−31).

The value of p could also be:


p = 0.7 (white balls 70%, black balls 30%)


Then p^70 · (1 − p)^30 = 0.7^70 × 0.3^30 ≈ 2.95 × 10^(−27).
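The two values above are easy to reproduce; a quick sketch:

```python
# Reproduce the two likelihood values quoted above.
def likelihood(p):
    return p ** 70 * (1 - p) ** 30

print(likelihood(0.5))  # ~7.89e-31
print(likelihood(0.7))  # ~2.95e-27, several thousand times larger
```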


So the question is: since there are infinitely many values to choose from, what principle should maximum likelihood estimation follow to pick one?


Answer: pick the value that maximizes the probability of the observed sample result, that is, the value of p^70 · (1 − p)^30. We can regard this expression as a function of p and simply take its derivative!


The idea: since these results have already occurred, why not choose the parameters that make them as likely as possible to occur? Making the observed sample most likely is the core of maximum likelihood estimation.


We want to maximize the probability of the observed sample. As a mathematical problem, this means making p^70 · (1 − p)^30 as large as possible. That is easy: there is only one unknown, p. Setting the derivative with respect to p to zero (equivalently, maximizing the log-likelihood 70 ln p + 30 ln(1 − p), whose derivative condition 70/p − 30/(1 − p) = 0 gives 70(1 − p) = 30p) yields p = 70%, consistent with the 70% we guessed at the start. The mathematics matches the intuition.
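A minimal numerical sketch of the same maximization, scanning candidate values of p instead of solving the derivative by hand:

```python
import math

# Log-likelihood of 70 white / 30 black draws as a function of p.
def log_likelihood(p):
    return 70 * math.log(p) + 30 * math.log(1 - p)

# Grid search over candidate values of p: a numeric stand-in for
# setting the derivative 70/p - 30/(1-p) to zero.
candidates = [i / 1000 for i in range(1, 1000)]
p_hat = max(candidates, key=log_likelihood)
print(p_hat)  # 0.7, matching the intuitive answer
```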

2  Example two


Suppose we want to know the average annual income of people across the country. First, assume that income follows a normal distribution, but that the mean and variance of this distribution are unknown. We do not have the manpower or resources to survey the income of everyone in a country of over a billion people. Is there really no way, then?


Not at all: with maximum likelihood estimation, there is! For example, we can take the incomes of the population of one city or one town as our observed sample, then obtain the parameters of the assumed normal distribution through maximum likelihood estimation.


With the parameter values in hand, we know the expectation and variance of this normal distribution. In other words, from a small sample we have, in turn, obtained a set of important statistical indicators of the annual income of the whole population!
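Here is a hedged sketch of this idea with simulated data (the true mean and standard deviation below are made-up numbers, not real income statistics); for the normal distribution, the maximum likelihood estimates work out to the sample mean and the square root of the average squared deviation:

```python
import math
import random

random.seed(0)

# Simulated "observed sample": annual incomes of one town, drawn from
# an assumed N(mu=50000, sigma=12000). Illustrative values only.
incomes = [random.gauss(50000, 12000) for _ in range(5000)]

# Closed-form MLE for a normal distribution:
n = len(incomes)
mu_hat = sum(incomes) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in incomes) / n)

print(mu_hat, sigma_hat)  # close to the true 50000 and 12000
```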


So the key point of maximum likelihood estimation: in some situations the population is too large to measure every sample directly. After drawing a small sample, maximum likelihood estimation gives us the parameter values of the assumed distribution, which is equivalent to obtaining a set of important statistical indicators of the model.


I hope this helps you understand~


References:

From Maximum Likelihood to a Brief Explanation of the EM Algorithm, zouxy09's column, CSDN blog

Learning Maximum Likelihood Estimation, growoldwith_you's blog, CSDN blog






Originally published at blog.51cto.com/15009309/2553992