Machine Learning - Maximum Likelihood Estimation and Bayesian Estimation

While learning the naive Bayes classifier, I encountered the term Bayesian estimation. After consulting many references, I found that its specific interpretation differs from source to source, so I have organized my understanding as follows.

Maximum likelihood estimation

Maximum likelihood estimation is a point-estimation method for parameters proposed by frequentists.
It is based on the assumption that the parameter θ is a fixed but unknown constant; the estimate is the value of θ under which the observed data set D has the highest probability of occurring.
The standard solution method is to differentiate the (log-)likelihood function with respect to θ and set the derivative to zero.
Maximum likelihood estimation:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} \, p(D \mid \theta)$$
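To make this concrete, here is a minimal MLE sketch for a Bernoulli (coin-flip) model, where setting the derivative of the log-likelihood to zero yields a closed form; the data set below is hypothetical.

```python
# Minimal MLE sketch for a Bernoulli model (the data set D is hypothetical).
# Setting d/dtheta log p(D|theta) = 0 gives the closed form
# theta_hat = (number of successes) / (number of trials).
import numpy as np

D = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # 6 successes out of 8 trials

def bernoulli_mle(data):
    """Maximum likelihood estimate of the Bernoulli parameter theta."""
    return data.mean()

print(f"theta_MLE = {bernoulli_mle(D):.3f}")  # 0.750
```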

Bayesian estimation

Bayesian estimation is a parameter-estimation method proposed by Bayesians. It can be divided into Bayesian point estimation and Bayesian interval estimation; this article does not cover interval estimation.
It is based on the assumption that the parameter θ is a random variable obeying some prior distribution. After the data set D is observed, this new information lets us update the distribution of θ accordingly; the updated distribution is the posterior distribution, obtained via Bayes' rule: $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$.
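As an illustration, here is a sketch of this update for a Bernoulli likelihood with a Beta prior, a conjugate pair for which the posterior stays in closed form; the Beta(2, 2) prior and the data are assumptions for the example.

```python
# Sketch of the prior-to-posterior update for a Beta prior and a Bernoulli
# likelihood (conjugate pair); the Beta(2, 2) prior is an assumption.
import numpy as np

D = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # same hypothetical data set
alpha_0, beta_0 = 2.0, 2.0              # prior: theta ~ Beta(2, 2)

successes = int(D.sum())
failures = len(D) - successes

# Bayes' rule in closed form: the posterior is Beta(alpha_0 + s, beta_0 + f).
alpha_n = alpha_0 + successes
beta_n = beta_0 + failures
print(f"posterior: Beta({alpha_n:.0f}, {beta_n:.0f})")  # Beta(8, 4)
```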

Bayesian point estimation method

Since the updated parameter is still a random variable obeying some probability distribution, how should we choose when we need only a single parameter vector? There are three common selection methods (a small numerical comparison follows the list):
1. Posterior mode estimation. As the name suggests, it picks the most probable parameter, i.e. the θ at which the posterior distribution attains its maximum; differentiating the posterior density is therefore sufficient. This approach is similar to maximum likelihood estimation: mathematically, the objective is the likelihood function multiplied by the prior (when the prior is a uniform, i.e. uninformative, distribution, the two expressions coincide), so it is also called regularized maximum likelihood estimation, or maximum a posteriori estimation (MAP). Keep in mind, however, that the ideas behind the two are quite different.
Maximum a posteriori estimation:

$$\hat{\theta}_{MAP} = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta)$$
2. Posterior median estimation. Picks the median of the posterior distribution; this seems to be used less often.
3. Posterior expectation estimation. Picks the mean of the posterior distribution of the parameter, that is,

$$\hat{\theta} = \int_{\Theta} \theta \, p(\theta \mid D) \, d\theta$$
Relative to MAP, an integration is required. However, it effectively avoids the situation where an estimated probability is 0 (in naive Bayes, for instance, this is what Laplace smoothing corresponds to). Because it is used so often, it is frequently referred to simply as "the Bayesian estimate" of the parameter (this is easily confused with the Bayesian estimation discussed below; I personally feel it is better to call it posterior expectation estimation).
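The sketch below compares the three point estimates numerically on the Beta(8, 4) posterior from the earlier example; the numbers are illustrative.

```python
# Numerical comparison of the three point estimates on the Beta(8, 4)
# posterior obtained in the previous sketch (illustrative numbers).
from scipy import stats

a, b = 8.0, 4.0  # posterior Beta parameters

theta_mode = (a - 1) / (a + b - 2)      # posterior mode (MAP), needs a, b > 1
theta_median = stats.beta.median(a, b)  # posterior median
theta_mean = a / (a + b)                # posterior expectation

print(f"mode (MAP) = {theta_mode:.3f}")    # 0.700
print(f"median     = {theta_median:.3f}")  # ~0.676
print(f"mean       = {theta_mean:.3f}")    # ~0.667
```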

Bayesian estimation

The Bayesian point estimates above, in a sense, select a single statistic (mode, median, or mean) of the random variable θ to stand in for the whole distribution; the purpose of this is to reduce the amount of computation. But the genuine Bayesian estimation method uses every parameter in the parameter space: build a model for each parameter (obtaining an ensemble of models), use all the models for prediction, and take the expectation of all the predicted values as the final estimate, with the weights given by the posterior distribution of the parameters. Doing so effectively avoids overfitting, but the amount of computation is enormous. Specific methods for reducing the computation will be discussed later.
Bayesian estimation (posterior predictive distribution):

$$p(x \mid D) = \int_{\Theta} p(x \mid \theta)\, p(\theta \mid D)\, d\theta$$
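As a sketch, for the Beta-Bernoulli example above this integral can be evaluated both numerically and in closed form (for this conjugate model, the posterior-predictive probability of success equals the posterior mean):

```python
# Sketch: posterior-predictive probability p(x=1|D) for the Beta(8, 4)
# posterior, evaluated by numerical integration and by the closed form.
from scipy import integrate, stats

a, b = 8.0, 4.0  # posterior Beta parameters from the sketches above

# p(x=1 | D) = integral of theta * p(theta | D) dtheta over [0, 1],
# since p(x=1 | theta) = theta for the Bernoulli model.
p_numeric, _ = integrate.quad(lambda t: t * stats.beta.pdf(t, a, b), 0.0, 1.0)

print(f"numerical integral : {p_numeric:.4f}")    # ~0.6667
print(f"closed form a/(a+b): {a / (a + b):.4f}")  # 0.6667
```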
