Likelihood function study notes

definition:

In mathematical statistics, the likelihood function is a function of the parameters of a statistical model, expressing how plausible different parameter values are. Likelihood is used to estimate the parameters of an underlying process when certain observed outcomes are known.
Conversely, we can construct a way to express this: given that an event A has occurred, we use the likelihood function L(B ∣ A) to estimate how plausible the parameter B is. Formally, the likelihood function is built from a conditional probability, but the variable we care about has changed:

b ↦ P(A ∣ B = b)

The likelihood function is not required to satisfy the normalization ∑_{b ∈ B} P(A ∣ B = b) = 1. A likelihood function multiplied by a positive constant is still a likelihood function: for any α > 0,

L(b ∣ A) = α P(A ∣ B = b)

is also a valid likelihood function.
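
To make the non-normalization and the scaling property concrete, here is a minimal sketch in plain Python (the coin model and the names are my own illustration, not part of the definition above):

```python
# Minimal sketch: treat the likelihood as a function of the parameter b
# for fixed observed data A. Illustrative model: A = "two heads" and
# b = probability of heads, so P(A | B = b) = b^2.

def likelihood(b, alpha=1.0):
    """Unnormalized likelihood L(b | A) = alpha * P(A | B = b)."""
    return alpha * b ** 2

# Summing P(A | B = b) over parameter values need not give 1:
total = sum(likelihood(b) for b in [0.1 * i for i in range(11)])
print(total)  # about 3.85, not 1: the likelihood is not normalized over b

# Scaling by any alpha > 0 gives another valid likelihood function;
# ratios between parameter values are unchanged:
print(likelihood(0.6) / likelihood(0.5))            # 1.44
print(likelihood(0.6, 7.0) / likelihood(0.5, 7.0))  # still 1.44
```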


example:

Consider the experiment of tossing a coin. Suppose we know that a "fair coin" is being tossed (heads and tails each come up with probability 0.5), i.e. the probability of heads (H) is p_H = 0.5. Then we can compute the probability of each possible outcome of several tosses.

For example, the probability of getting heads twice is 0.25. In terms of conditional probability, it is:
P(HH ∣ p_H = 0.5) = 0.5² = 0.25

where H denotes heads.
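
As a quick check, this arithmetic can be reproduced in a couple of lines of Python, assuming the two tosses are independent:

```python
# Probability of two heads with a fair coin, assuming independent tosses.
p_H = 0.5
print(p_H ** 2)  # P(HH | p_H = 0.5) = 0.25
```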
If the coin's mass is not evenly distributed, it may instead be an "unfair coin".

In statistics, we are concerned with what a series of observed tosses tells us about the probability that the coin comes up heads. We can build a statistical model: suppose that each toss comes up heads with probability p_H and tails with probability 1 − p_H.
Now, given the two tosses that have been observed, the conditional probability can be rewritten as a likelihood function:

L(p_H = 0.5 ∣ HH) = P(HH ∣ p_H = 0.5) = 0.25

That is to say, under this likelihood function, the likelihood of p_H = 0.5 is 0.25 when both tosses are observed to come up heads. The converse does not necessarily hold: from a likelihood value of 0.25 we cannot infer that p_H = 0.5.
If we instead consider p_H = 0.6, the value of the likelihood function changes:

L(p_H = 0.6 ∣ HH) = P(HH ∣ p_H = 0.6) = 0.6² = 0.36

Notice that the value of the likelihood function has become larger.
This shows that if the parameter p_H takes the value 0.6, the probability of observing two heads in a row is larger than under the assumption p_H = 0.5. That is, taking p_H to be 0.6 is more convincing, more "reasonable", than taking it to be 0.5.
In short, what matters about the likelihood function is not its specific value, but whether it increases or decreases as the parameters change.
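
A short sketch makes this comparison concrete (the function name L and the candidate values are my own illustration):

```python
# Compare L(p_H | HH) = p_H^2 across candidate parameter values.
# The absolute numbers matter less than which value is larger.
def L(p_H):
    return p_H ** 2  # likelihood of p_H given the observation HH

for p in (0.3, 0.5, 0.6, 0.9):
    print(p, L(p))  # 0.09, 0.25, 0.36, 0.81: the likelihood grows with p_H
```

For the data HH alone the likelihood keeps growing all the way up to p_H = 1; observing some tails as well would move the maximum into the interior of the interval.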

For the same likelihood function, a parameter in the model it represents can take many possible values; if some parameter value makes the likelihood attain its maximum, that value is the most "reasonable" parameter value.


maximum likelihood estimation:

Maximum likelihood estimation is the first and most natural application of the likelihood function. As noted above, the maximum of the likelihood function indicates the parameter values under which the statistical model is most reasonable. Starting from this idea, the method of maximum likelihood estimation is: first write down the likelihood function (usually built from the probability density function or probability mass function), then find its maximum point after simplification. In practice, the logarithm of the likelihood function is generally maximized instead, since it has the same maximum points and is easier to work with. The maximum point of a likelihood function is not necessarily unique, nor does it necessarily exist.
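
As a hedged illustration of the procedure (made-up data, plain Python, and a crude grid search rather than calculus):

```python
import math

# Minimal MLE sketch: estimate p_H from n independent coin tosses with
# k heads by maximizing the log-likelihood
#   log L(p) = k*log(p) + (n - k)*log(1 - p)   (up to an additive constant).
def log_likelihood(p, k, n):
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 10  # e.g. 7 heads in 10 tosses (made-up data)

# Crude grid search over the open interval (0, 1):
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, k, n))
print(p_hat)  # 0.7, matching the closed-form MLE k/n
```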

Origin blog.csdn.net/han_xj/article/details/111331005