Principle of Bayesian Classifier - Study Notes

Introduction

The Bayesian classifier is one of the classic pattern recognition algorithms and holds an extremely important position in practice; it is built directly on Bayes' theorem.

1. Inverse probability reasoning and Bayesian formula

1. Deterministic reasoning and probabilistic reasoning

[Figure: comparison of deterministic reasoning and probabilistic reasoning]

2. Bayesian formula

The Bayes formula solves the inverse probability reasoning problem: starting from a known result, it gives the probability that a particular situation is the cause of that result. Bayes formula:
P(Bi|A) = P(A|Bi)P(Bi) / Σ P(A|Bj)P(Bj)
The denominator Σ P(A|Bj)P(Bj) is the total probability of the result A, summed over all the conditions Bj.
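As a quick illustration, here is a minimal sketch in Python that applies the formula to hypothetical numbers (the priors P(Bi) and conditionals P(A|Bi) below are made up for the example):

```python
# Hypothetical example: three possible causes B1, B2, B3 of an observed result A.
priors = [0.5, 0.3, 0.2]          # P(Bi), assumed known
likelihoods = [0.10, 0.40, 0.70]  # P(A | Bi), assumed known

# Total probability of the result A: sum over all causes.
p_a = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes formula: P(Bi | A) = P(A | Bi) P(Bi) / P(A)
posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]
print(posteriors)  # which cause is most probable given that A occurred
```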

2. The principle of Bayesian classification

Bayesian classification solves the problem of statistical classification under uncertainty: knowing the probability with which samples of each class produce different feature vectors, we compute, from the feature vector of a sample to be identified, the probability that the sample belongs to each class. The correspondence with the Bayes formula is as follows:

Bayesian classification → Bayes formula
The overall probability of occurrence of each class of samples → prior probability P(wi)
The probability that a sample of a given class produces a specific feature vector → class-conditional probability P(x|wi)
The probability that the sample to be classified belongs to each class, given its feature vector → posterior probability P(wi|x)

P(wi|x) = P(x|wi)P(wi) / Σ P(x|wj)P(wj)
Classification decision rule: Classify samples according to the calculated posterior probability
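To make the correspondence concrete, here is a minimal sketch with hypothetical priors and class-conditional values for one sample x to be identified (the numbers are illustrative only):

```python
# Hypothetical values for one sample x and three classes w1, w2, w3.
prior = {"w1": 0.2, "w2": 0.5, "w3": 0.3}              # P(wi)
class_conditional = {"w1": 0.6, "w2": 0.2, "w3": 0.3}  # P(x | wi) at this x

# Evidence: P(x) = sum over i of P(x | wi) P(wi)
p_x = sum(class_conditional[w] * prior[w] for w in prior)

# Posterior: P(wi | x) = P(x | wi) P(wi) / P(x)
posterior = {w: class_conditional[w] * prior[w] / p_x for w in prior}

# Decision rule: classify x into the class with the largest posterior probability.
decision = max(posterior, key=posterior.get)
print(posterior, decision)
```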

As mentioned above, Bayesian classification starts from the result (the observed feature vector) and looks for the cause (the class), so the prior probabilities and class-conditional probabilities must be known; obtaining them is the job of the training process.
When the prior probabilities are unknown, they can be taken as equal, or the frequency of each class in the sample set can be used as its prior and then corrected as new information arrives; when the class-conditional probabilities are unknown, they usually have to be estimated from the data.
Because Bayesian classification is probabilistic, the classification decision always carries an error rate.

3. Probability estimation

1. Estimation of prior probability

Treat the prior probability as a constant:
(1) If the samples were randomly sampled, the frequency of each class in the sample set can be used as its prior probability: P(wi) = ni/N
(2) Treat all classes as equally likely (uniform distribution): P(wi) = 1/c
Treat the prior probability as a probability distribution: P(wi) = ∫ P(wi|x) p(x) dx. Set an arbitrary initial value for the prior, compute the posterior probability of every training sample for each class (the class-conditional probabilities being known), and then use the mathematical expectation of these posteriors to update the prior. A sketch of the two constant-prior options appears below.
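A small sketch of the two constant-prior options, assuming the training labels are simply held in a Python list (the labels are hypothetical):

```python
from collections import Counter

labels = ["w1", "w1", "w2", "w1", "w3", "w2"]  # hypothetical training labels

# Option (1): priors from class frequencies ni / N (valid if sampling was random).
counts = Counter(labels)
N = len(labels)
priors_freq = {cls: n / N for cls, n in counts.items()}

# Option (2): uniform priors 1 / c when nothing is known about the classes.
c = len(counts)
priors_uniform = {cls: 1.0 / c for cls in counts}

print(priors_freq)     # {'w1': 0.5, 'w2': 0.333..., 'w3': 0.166...}
print(priors_uniform)  # {'w1': 0.333..., 'w2': 0.333..., 'w3': 0.333...}
```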

2. Estimation of Class Conditional Probability

(1) Parameter estimation: assume the class-conditional probability has a specific distribution form, such as a normal distribution, a binomial distribution, and so on, and then use the training set, which already carries class labels, to estimate the parameters of that distribution.
(2) Non-parametric estimation: without knowing or assuming a distribution form, use the information in the sample set directly to estimate the probability distribution of the samples. The probability obtained in this case is usually a numerical model rather than an analytic form.
For the class-conditional probability, parameter estimation is the usual choice, and training the probability model amounts to estimating its parameters. The frequentist school holds that the parameters, although unknown, have fixed values that exist objectively, so the parameter values can be determined by optimizing the likelihood function. The Bayesian school holds that the parameters are unobserved random variables that themselves have distributions, so a prior distribution is assumed for the parameters and their posterior distribution is then computed from the observed data. The most commonly used methods are the frequentist maximum likelihood estimation and the Bayesian estimation method:
(1) Maximum likelihood estimation: usually carried out on the log-likelihood. (omitted)
(2) Bayesian estimation: ① the parameter Θi to be estimated has prior probability distribution P(Θi); ② the joint probability density P(xi|Θi) of the class sample set xi is a function of Θi; ③ from these, the posterior probability P(Θi|xi) of Θi is obtained.
Different discriminant functions lead to different classification decision boundaries.
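As an illustration of parameter estimation, here is a sketch of maximum likelihood estimation under a one-dimensional normal assumption: the ML estimates of a Gaussian's parameters are the sample mean and the (biased) sample variance, computed per class from labelled data (the sample values are hypothetical):

```python
import math

# Hypothetical labelled one-dimensional samples belonging to one class wi.
x_wi = [2.1, 1.8, 2.5, 2.0, 1.9]

# Maximum likelihood estimates under a normal assumption:
mu = sum(x_wi) / len(x_wi)                          # MLE of the mean
var = sum((x - mu) ** 2 for x in x_wi) / len(x_wi)  # MLE of the variance (biased form)

def class_conditional(x, mu=mu, var=var):
    """Estimated class-conditional density P(x | wi) under the Gaussian model."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(mu, var, class_conditional(2.0))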

4. The error rate of Bayesian classification

The error rate of the classifier: the mathematical expectation of the classification error probability
Example: the error rate of the minimum error rate classifier is the probability that a sample is assigned to the class with the largest posterior probability while the sample itself does not belong to that class. (The minimum error rate Bayesian classifier is covered below under commonly used Bayesian classifiers.)
The error rate of two-class Bayesian classification equals the probability of samples belonging to the first class w1 being misclassified into w2, plus the probability of samples belonging to the second class w2 being misclassified into w1:
P(e) = ∫R2 p(x|w1)P(w1) dx + ∫R1 p(x|w2)P(w2) dx, where R1 and R2 are the decision regions of w1 and w2.
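These two terms can be evaluated numerically. A sketch for a hypothetical two-class problem with one-dimensional Gaussian class-conditional densities and a minimum error rate decision:

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class problem with equal priors.
P1, P2 = 0.5, 0.5
mu1, mu2, sigma = 0.0, 2.0, 1.0

def assigned_to_w1(x):
    # Minimum error rate regions: x goes to w1 where p(x|w1)P(w1) > p(x|w2)P(w2).
    return gauss(x, mu1, sigma) * P1 > gauss(x, mu2, sigma) * P2

# P(error) = P(w1) * P(x in R2 | w1) + P(w2) * P(x in R1 | w2), by numerical integration.
dx = 0.001
error = 0.0
x = -10.0
while x < 10.0:
    if assigned_to_w1(x):
        error += gauss(x, mu2, sigma) * P2 * dx  # true class w2 falls in region R1
    else:
        error += gauss(x, mu1, sigma) * P1 * dx  # true class w1 falls in region R2
    x += dx
print(round(error, 4))  # about 0.159 for these numbers
```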

5. Commonly used Bayesian classifiers

1. Minimum error rate Bayesian classifier

Classification decision rule: assign the sample to the class with the largest posterior probability.

If P(wi|x) = max P(wj|x), then x ∈ wi.
For this maximum-posterior decision, the conditional error is P(error|x) = Σ P(wj|x) − max P(wj|x) = 1 − max P(wj|x), so choosing the class with the maximum posterior probability is equivalent to the minimum error rate.
=> Equivalently: if P(x|wi)P(wi) = max [P(x|wj)P(wj)], then x ∈ wi.

Note that the minimum error rate Bayesian classifier is not necessarily a linear classifier: its classification decision boundary is not necessarily linear, and the boundary lies at the points where the posterior probabilities of the classes are equal.
[Figure: posterior probability curves for two classes and the decision boundary at their intersection]
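A sketch of the rule in its likelihood-times-prior form, and of locating the boundary point where the two posteriors are equal, for hypothetical one-dimensional Gaussian classes:

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical one-dimensional two-class setup.
classes = {
    "w1": {"mu": 0.0, "sigma": 1.0, "prior": 0.6},
    "w2": {"mu": 2.0, "sigma": 1.0, "prior": 0.4},
}

def classify(x):
    # Minimum error rate rule in the form: choose wi with the largest P(x|wi)P(wi).
    return max(classes,
               key=lambda w: gauss(x, classes[w]["mu"], classes[w]["sigma"]) * classes[w]["prior"])

# Scan a grid to locate where the decision changes, i.e. where the posteriors are equal.
prev = classify(-5.0)
for i in range(1, 10001):
    x = -5.0 + i * 0.001
    cur = classify(x)
    if cur != prev:
        print("decision boundary near x =", round(x, 3))  # about 1.203 here
        prev = cur
```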

2. Minimum Risk Bayesian Classifier

Decision αi: classify the sample x to be identified into class wi
Loss λij: the loss incurred by misclassifying a sample x that truly belongs to wj into class wi
Conditional risk: R(αi|x) = E[λij] = Σ λij P(wj|x)
Classification decision rule: if R(αk|x) = min R(αi|x), then x ∈ wk
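A minimal sketch of the conditional risk computation for one sample, using a hypothetical loss matrix λij and assumed posteriors P(wj|x):

```python
# Hypothetical posteriors P(wj | x) for one sample x, two classes.
posterior = {"w1": 0.7, "w2": 0.3}

# Loss matrix: loss[i][j] = λij = loss of deciding wi when the true class is wj.
loss = {
    "w1": {"w1": 0.0, "w2": 5.0},  # deciding w1 when the truth is w2 is expensive
    "w2": {"w1": 1.0, "w2": 0.0},
}

# Conditional risk: R(αi | x) = Σj λij P(wj | x)
risk = {ai: sum(loss[ai][wj] * posterior[wj] for wj in posterior) for ai in loss}

# Decision rule: choose the action with minimum conditional risk.
decision = min(risk, key=risk.get)
print(risk, decision)
```

With this loss matrix the minimum-risk decision is w2 even though w1 has the larger posterior, which is exactly how the minimum risk classifier can differ from the minimum error rate classifier.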

3. Naive Bayes classifier

The naive Bayes classifier addresses the estimation of the class-conditional probability when it is unknown. In principle this probability should be estimated from the feature values of each class's samples in every dimension, and it is a joint probability distribution over all the dimensions, which is hard to estimate directly. The naive Bayes classifier therefore assumes that every dimension affects the classification result completely independently, so the class-conditional probability reduces to a product of one-dimensional density estimates: P(x|wi) = ∏ P(xk|wi)
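A sketch of the naive assumption with a Gaussian one-dimensional density per feature and per class (the two-dimensional training data are hypothetical, and the per-dimension normal model is an assumption of this example, not a requirement of naive Bayes):

```python
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training data: two classes, two feature dimensions.
data = {
    "w1": [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]],
    "w2": [[3.0, 0.5], [2.8, 0.7], [3.2, 0.3]],
}
priors = {"w1": 0.5, "w2": 0.5}

# Estimate a one-dimensional Gaussian per class and per dimension.
params = {}
for w, samples in data.items():
    params[w] = []
    for k in range(len(samples[0])):
        col = [s[k] for s in samples]
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / len(col)
        params[w].append((mu, var))

def naive_score(x, w):
    # P(x | wi) P(wi) with the naive assumption P(x | wi) = Π_k P(x_k | wi)
    p = priors[w]
    for k, (mu, var) in enumerate(params[w]):
        p *= gauss(x[k], mu, var)
    return p

x_new = [1.1, 1.9]
print(max(data, key=lambda w: naive_score(x_new, w)))  # expected: "w1"
```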

However, in actual engineering practice the sample features often do not satisfy the independence condition. A common remedy is feature grouping, which takes some of the interdependence between attributes into account: each group contains a small number of related features, and the groups are kept independent of one another. A complete joint probability computation is then unnecessary, yet relatively strong attribute dependences are not ignored. This idea leads to another classifier, the semi-naive Bayesian classifier.

Epilogue

Bayesian classifiers are widely used in the field of pattern recognition, especially in the field of information retrieval.
The naive Bayes classifier assumes that all attributes are completely independent. Although this assumption rarely holds in practical applications, the naive Bayes classifier usually performs very well in practice.

References

These notes draw on the Beijing Institute of Technology open course "Pattern Recognition of Artificial Intelligence".
Book reference: "Machine Learning", Zhou Zhihua.
