While learning the naive Bayes classifier, I came across the term Bayesian estimation. After consulting many materials, I found that its specific interpretation differs from source to source, so I have sorted it out as follows.
Maximum likelihood estimation
Maximum likelihood estimation (MLE) is a point-estimation method for parameters proposed by frequentists.
It is based on the assumption that the parameter theta is a fixed, unknown constant: the parameter value under which the observed data set D has the highest probability of occurring is taken as the estimate of the actual parameter.
The standard solution method is to differentiate the (log-)likelihood function with respect to theta and set the derivative to zero.
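As a minimal sketch of this idea, consider a hypothetical data set of coin flips assumed to be Bernoulli(theta); setting the derivative of the log-likelihood to zero yields the familiar closed form, the sample mean (the data set here is made up for illustration):

```python
import numpy as np

# Hypothetical data set D: 10 coin flips (1 = heads), assumed i.i.d. Bernoulli(theta).
D = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# The log-likelihood of theta given D is
#   log L(theta) = sum_i [ x_i * log(theta) + (1 - x_i) * log(1 - theta) ].
# Setting d(log L)/d(theta) = 0 gives the closed-form MLE: the sample mean.
theta_mle = D.mean()
print(theta_mle)  # 0.7 (7 heads out of 10 flips)
```

For simple likelihoods the derivative gives a closed form as above; for more complex models the same objective is maximized numerically.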
Bayesian estimation
Bayesian estimation is a parameter-estimation method proposed by Bayesians. It can be divided into Bayesian point estimation and Bayesian interval estimation; this article does not cover interval estimation.
It is based on the assumption that the parameter theta is a random variable obeying some prior distribution. After the data set D is observed, this new information lets us update the distribution of theta accordingly; the updated distribution is the posterior probability distribution.
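This prior-to-posterior update can be sketched with a hypothetical Bernoulli likelihood and a Beta prior on theta (the prior pseudo-counts and the data are assumptions for illustration). The Beta prior is conjugate, so the posterior is again a Beta distribution and the update is just a count:

```python
import numpy as np

# Assumed prior: theta ~ Beta(a, b) with pseudo-counts a = b = 2.
a, b = 2.0, 2.0

# Hypothetical data set D: 10 coin flips (1 = heads), Bernoulli(theta) likelihood.
D = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
heads = int(D.sum())
tails = len(D) - heads

# Conjugate update: p(theta | D) = Beta(a + heads, b + tails).
a_post, b_post = a + heads, b + tails
print(a_post, b_post)  # 9.0 5.0 -> posterior is Beta(9, 5)
```

The conjugate pair is chosen purely to keep the update in closed form; with a non-conjugate prior the posterior would have to be computed numerically.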
Bayesian point estimation method
Since the updated parameter is still a random variable obeying some probability distribution, how should we choose if we need a single parameter vector? There are three common selection methods:
1. Posterior mode estimation. As the name suggests, it picks the value of theta at which the posterior distribution attains its maximum probability, so it suffices to differentiate the posterior density and set the derivative to zero. This approach is similar to maximum likelihood estimation: its objective is the likelihood function multiplied by the prior (when the prior is a uniform, i.e. non-informative, distribution, the two objectives coincide). It is therefore called regularized maximum likelihood estimation, and is better known as maximum a posteriori estimation (MAP); keep in mind, though, that the idea behind it is quite different.
2. Posterior median estimation. Picks the median of the posterior distribution; this seems to be used less often.
3. Posterior expectation estimation. Picks the mean of the posterior distribution, that is, the expectation of theta under p(theta | D).
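The three point estimates can be compared on a concrete posterior. Assuming the Beta(9, 5) posterior from a hypothetical Beta(2, 2) prior with 7 heads and 3 tails, the mode and mean have closed forms, and the median can be read off the CDF numerically:

```python
import numpy as np

# Assumed posterior for illustration: theta | D ~ Beta(9, 5).
a, b = 9.0, 5.0

# 1. Posterior mode (MAP): argmax of the Beta density, (a - 1) / (a + b - 2).
theta_map = (a - 1) / (a + b - 2)

# 2. Posterior median: located numerically from the CDF on a fine grid.
grid = np.linspace(1e-6, 1 - 1e-6, 100_000)
pdf = grid ** (a - 1) * (1 - grid) ** (b - 1)   # unnormalized Beta density
cdf = np.cumsum(pdf)
cdf /= cdf[-1]
theta_median = grid[np.searchsorted(cdf, 0.5)]

# 3. Posterior mean: a / (a + b).
theta_mean = a / (a + b)

print(theta_map, theta_median, theta_mean)
# roughly 0.667, 0.650, 0.643 -- three different point summaries of one posterior
```

Note that the three estimates disagree whenever the posterior is skewed; they coincide only for a symmetric unimodal posterior.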
Full Bayesian estimation
The Bayesian point estimates above, in a sense, replace the whole distribution of the random variable theta with a single statistic (the mode, median, or mean); the purpose of this is to reduce the amount of computation. But true Bayesian estimation should use all the parameters in the parameter space: build a model for each one (obtaining an ensemble of models), use all the models to make predictions, and take the expectation of all the predicted values as the final estimate, with the weights computed from the probability distribution of the parameters. Doing so effectively avoids overfitting, but the amount of computation is enormous. Specific methods for reducing it will be discussed later.
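A minimal sketch of this averaging, continuing the assumed Beta(9, 5) posterior: instead of predicting with one point estimate of theta, draw many values of theta from the posterior, let each define its own Bernoulli model, and average their predictions (the weighting by the posterior happens implicitly through the sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed posterior from the running example: theta | D ~ Beta(9, 5).
# Full Bayesian prediction of the next flip averages over ALL models:
#   p(x = 1 | D) = integral of theta * p(theta | D) d(theta),
# approximated here by Monte Carlo over posterior samples.
samples = rng.beta(9, 5, size=200_000)   # one theta per candidate model
p_heads = samples.mean()                 # posterior-weighted average prediction
print(p_heads)  # close to 9/14, i.e. about 0.643
```

For this conjugate toy case the integral happens to equal the posterior mean, so point estimation and model averaging agree; for predictions that are nonlinear in theta (or for non-conjugate models) they differ, which is exactly why the full average is both more robust and more expensive.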
Note: for the sources of the figures in this article, refer to the original blog.