Table of contents

- Foreword
- Subjective probability
- Bayes' theorem
  - Bayesian decision criterion
  - Maximum a posteriori hypothesis
- Naive Bayes classification model
  - Algorithm description of the naive Bayes classifier
  - Features of the naive Bayes algorithm
- Bayesian belief networks
  - The two steps of Bayesian network modeling
  - Features of Bayesian belief networks

Foreword
Bayesian classification is a statistical classification method: it uses probability and statistics to predict the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes' theorem, and the simplest Bayesian classification algorithm is called naive Bayes classification.
Text
1. Subjective probability
The Bayesian method is a reasoning method for handling uncertainty. Uncertainty is often expressed as a Bayesian probability: a subjective probability, i.e., an individual's subjective estimate, which changes as that individual's knowledge changes.
Its estimate depends on the correctness of prior knowledge and on the richness and accuracy of posterior knowledge, so a Bayesian probability may change with the information the individual possesses; a judgment based on posterior knowledge depends on how much information has been acquired.
2. Bayes' Theorem
1. Basic knowledge
Given that event A has occurred, the probability that event B occurs is called the conditional probability of B given A, written P(B|A). In the Bayesian setting, P(B) is called the prior probability and P(B|A) the posterior probability. The conditional probability is computed as:

P(B|A) = P(AB) / P(A)

and Bayes' theorem relates the two directions of conditioning:

P(B|A) = P(A|B) P(B) / P(A)

that is, Posterior = Likelihood × Prior / Evidence.

Rearranging the conditional probability formula gives the multiplication formula:

P(AB) = P(A) P(B|A) = P(B) P(A|B)

Let A and B be two random events. If P(AB) = P(A) P(B) holds, then A and B are said to be mutually independent; in that case P(A|B) = P(A) also holds.
Example: disease diagnosis

Two diagnostic (test) outcomes: the patient has cancer (+), or does not have cancer (-).

Prior knowledge: the prevalence in the overall population is 0.8%. The test correctly identifies 98% of patients who have the disease and 97% of those who do not:

P(cancer) = 0.008    P(!cancer) = 0.992
P(+|cancer) = 0.98   P(-|cancer) = 0.02
P(+|!cancer) = 0.03  P(-|!cancer) = 0.97
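Applying Bayes' rule to the numbers above gives the probability of actually having cancer after a positive test (a small verification sketch using only the figures stated in the example):

```python
# Bayes' rule applied to the diagnosis example (all values taken from the text).
p_cancer = 0.008             # prior: prevalence in the population
p_pos_given_cancer = 0.98    # P(+ | cancer)
p_pos_given_healthy = 0.03   # P(+ | !cancer), the false-positive rate

# Evidence: total probability of a positive test, P(+)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Posterior: P(cancer | +) = P(+ | cancer) * P(cancer) / P(+)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # → 0.209
```

Despite the test's high accuracy, the posterior is only about 21%, because the low prior (0.8% prevalence) dominates.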
Let B1, B2, …, Bn be mutually exclusive events with P(Bi) > 0, i = 1, 2, …, n, whose union is the whole sample space. For any event A, the law of total probability gives:

P(A) = Σ_{i=1}^{n} P(Bi) P(A|Bi)

Assuming P(A) > 0, the probability that Bi occurs given that A has occurred is:

P(Bi|A) = P(Bi) P(A|Bi) / Σ_{j=1}^{n} P(Bj) P(A|Bj)

which is called the Bayes formula.
2. Bayesian Decision Criterion
If P(Ci|X) > P(Cj|X) holds for every j ≠ i, then the sample X is assigned to class Ci.
3. Maximum a Posteriori Hypothesis

From the Bayes formula we obtain a way to compute the posterior probability: under certain assumptions, the posterior probability can be computed from the prior probability and probabilities estimated from the sample data.

Let P(c) be the prior probability of hypothesis c, i.e., the probability that c is the correct hypothesis; let P(X) be the prior probability of the training sample X; and let P(X|c) be the probability of observing sample X given that hypothesis c is correct. By the Bayes formula, the posterior probability is:

P(c|X) = P(X|c) P(c) / P(X)

Let C be the set of categories, i.e., the set of candidate hypotheses. Given a sample X with unknown class label, we compute the most probable hypothesis c ∈ C. The most probable hypothesis (category) is called the maximum a posteriori hypothesis (MAP), written:

c_MAP = argmax_{c ∈ C} P(c|X) = argmax_{c ∈ C} P(X|c) P(c)

(P(X) is the same for every hypothesis, so it can be dropped when maximizing.)
4. Example

Given the day's weather conditions X = {Sunny, Hot, High, Weak}, should we play tennis today?
①Statistics
② Calculate the prior probability
③ Calculate the posterior probabilities (since we only need to compare the two classes for this day, only the numerator P(X|c)·P(c) of the Bayes formula is computed below)
P(Yes|X) ∝ P(Yes) * P(Sunny|Yes) * P(Hot|Yes) * P(High|Yes) * P(Weak|Yes)
         = 9/14 * 2/9 * 2/9 * 3/9 * 6/9 ≈ 0.0071
P(No|X) ∝ P(No) * P(Sunny|No) * P(Hot|No) * P(High|No) * P(Weak|No)
        = 5/14 * 3/5 * 2/5 * 4/5 * 2/5 ≈ 0.0274

Since 0.0274 > 0.0071, the maximum a posteriori class is No: do not play tennis today.
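The two numerators above can be checked with exact fractions (a small verification sketch; all counts are the ones listed in the example):

```python
from fractions import Fraction as F

# Numerators P(c) * P(X|c) for X = {Sunny, Hot, High, Weak},
# using the conditional probabilities from the frequency counts above.
p_yes = F(9, 14) * F(2, 9) * F(2, 9) * F(3, 9) * F(6, 9)  # P(Yes) * P(X|Yes)
p_no  = F(5, 14) * F(3, 5) * F(2, 5) * F(4, 5) * F(2, 5)  # P(No)  * P(X|No)

print(float(p_yes))  # ≈ 0.0071
print(float(p_no))   # ≈ 0.0274
print("Yes" if p_yes > p_no else "No")  # MAP prediction → No
```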
3. Naive Bayes classification model
The naive Bayes classification model has simple algorithmic logic, runs fast, classifies quickly, and achieves high accuracy.
It is based on the class-conditional independence assumption: given the class, the attributes are assumed to be mutually independent.
Algorithm description of the naive Bayes classifier:
- Estimate the prior probabilities: for each class C, estimate P(C) from the training data as the fraction of training samples belonging to C (rather than assuming a fixed value).
- Compute the posterior probabilities: for a given sample X, use the Bayes formula to compute its posterior probability for each class: P(C|X) = (P(X|C) * P(C)) / P(X), where P(C|X) is the posterior probability of class C given X, P(X|C) is the likelihood of X under class C (which, by the independence assumption, factorizes into a product over the attributes), P(C) is the prior probability of class C, and P(X) is the marginal probability of X (the same for every class, so it can be ignored when comparing).
- Select the best class: assign the sample to the class with the highest posterior probability; this is the final prediction.
Note that the naive Bayes classifier assumes the features are mutually independent. In practice this assumption rarely holds exactly, so the model may need adjustment to fit the actual data distribution. Naive Bayes may also perform poorly on data with missing values or noise.
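The steps above can be sketched as a minimal categorical naive Bayes classifier. This is an illustrative from-scratch version (with Laplace add-one smoothing, a common extension not described in the text, added so that unseen attribute values do not zero out the product), not a production implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal categorical naive Bayes classifier with Laplace smoothing."""

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)                 # class -> number of samples
        # counts[c][j][v]: how often attribute j takes value v within class c
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)                 # attribute j -> observed values
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, x):
        best_class, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.n)              # log prior, log P(c)
            for j, v in enumerate(x):
                num = self.counts[c][j][v] + 1         # Laplace (add-one) smoothing
                den = nc + len(self.values[j])
                score += math.log(num / den)           # log P(x_j | c)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Toy usage with two weather-style attributes (data made up for illustration)
X = [("Sunny", "Hot"), ("Sunny", "Cool"), ("Rain", "Cool"), ("Rain", "Hot")]
y = ["No", "Yes", "Yes", "No"]
model = NaiveBayes().fit(X, y)
print(model.predict(("Sunny", "Hot")))  # → No
```

Working in log space avoids numerical underflow when many small conditional probabilities are multiplied together.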
Schematic diagram of the structure of the naive Bayesian classifier
Features of the naive Bayes algorithm
Advantages
- Simple logic, easy to implement; low time and space overhead during classification.
- The algorithm is stable and fairly robust.
Disadvantages
- It assumes conditional independence between attributes; in many real problems this assumption does not hold, which degrades classification performance.
4. Bayesian Belief Networks
A Bayesian belief network, or Bayesian network for short, graphically represents the probabilistic relationships among a set of random variables.
Bayesian networks have two main components:
- A directed acyclic graph representing dependencies between variables
- A conditional probability table associating each node with its immediate parent nodes
In a Bayesian belief network, each node is associated with a probability table. If node X has no parents, the table contains only the prior probability P(X); if X has a single parent Y, the table contains the conditional probability P(X|Y); if X has multiple parents {Y1, Y2, …, Yk}, the table contains the conditional probability P(X|Y1, Y2, …, Yk).
Building a Bayesian network involves two steps:
- Create network structure
- Estimate the probability values in each node's probability table
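As a sketch of these two components, here is a hypothetical two-node network; the variable names (Rain, WetGrass) and all probability values are illustrative assumptions, not taken from the text:

```python
# Structure: Rain -> WetGrass (a directed acyclic graph with one edge).
# Each node stores the table described above: a prior for the root node,
# and a conditional probability table P(child | parents) otherwise.
p_rain = {True: 0.2, False: 0.8}                    # prior table P(Rain)
p_wet_given_rain = {True:  {True: 0.9, False: 0.1}, # CPT P(WetGrass | Rain)
                    False: {True: 0.1, False: 0.9}}

# The joint probability factorizes along the graph: P(R, W) = P(R) * P(W | R)
def joint(rain, wet):
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Inference by enumeration: P(Rain=True | WetGrass=True)
evidence = sum(joint(r, True) for r in (True, False))
posterior = joint(True, True) / evidence
print(round(posterior, 3))  # → 0.692
```

Observing wet grass raises the probability of rain from the 0.2 prior to about 0.69, illustrating how the network propagates evidence against the direction of its edges.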
Bayesian Belief Network Features
- A BBN provides a graphical way to capture prior knowledge in a specific domain;
- Once the network structure is determined, adding new variables is easy;
- Bayesian networks are well suited to handling incomplete data;
- They are quite robust to model overfitting.