Data Mining (5.1)--Bayesian Classification

Table of contents

Foreword

Text

1. Subjective probability

2. Bayes Theorem

    1. Basic knowledge

    2. Bayesian Decision Criterion

    3. Maximum Posterior Hypothesis

    4. Examples

3. Naive Bayesian classification model

    Algorithmic description of the Naive Bayes classifier

    Naive Bayes algorithm features

4. Bayesian Belief Nets

    The modeling of Bayesian networks consists of two steps

    Bayesian Belief Network Features


Foreword

Bayesian classification is a statistical classification method that uses probability and statistics to predict the probability that a given tuple belongs to a particular class. It is based on Bayes' theorem; the simplest Bayesian classification algorithm is called naive Bayesian classification.

Text

1. Subjective probability

The Bayesian method is a reasoning method for studying uncertainty. Uncertainty is often expressed as a Bayesian probability, which is a subjective probability: an individual's subjective estimate, which changes as that individual's knowledge changes.

Its estimation depends on the correctness of prior knowledge and on the richness and accuracy of posterior knowledge, so a Bayesian probability often changes with the information an individual possesses; a judgment based on posterior knowledge depends on how much information has been gathered.

2. Bayes Theorem

1. Basic knowledge

Given that event A has occurred, the probability that event B occurs is called the conditional probability of B given A, denoted P(B | A); here P(B) is called the prior probability and P(B | A) the posterior probability. The conditional probability is computed as:

$P(B \mid A) = \dfrac{P(AB)}{P(A)}$

Writing P(AB) as P(A | B)P(B), this gives the familiar reading: posterior = likelihood × prior / evidence.

The conditional probability formula can be rewritten as the multiplication formula:

$P(AB) = P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$

Let A and B be two random events. If P(AB) = P(A)P(B) holds, then A and B are said to be mutually independent; in that case P(A | B) = P(A) also holds.
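As a quick numerical illustration of these definitions (a toy joint distribution made up for this sketch, not taken from the text):

```python
# Toy joint distribution over two binary events A and B (made-up numbers).
P = {("A", "B"): 0.12, ("A", "not B"): 0.28,
     ("not A", "B"): 0.18, ("not A", "not B"): 0.42}

P_A = P[("A", "B")] + P[("A", "not B")]    # P(A)   = 0.40
P_B = P[("A", "B")] + P[("not A", "B")]    # P(B)   = 0.30
P_B_given_A = P[("A", "B")] / P_A          # P(B|A) = 0.12 / 0.40 = 0.30

# Here P(AB) = 0.12 = P(A) * P(B), so A and B are independent,
# and accordingly P(B|A) equals the unconditional P(B).
print(f"{P_B_given_A:.2f} {P_B:.2f}")  # 0.30 0.30
```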

Example: disease diagnosis

A diagnostic test has two possible outcomes: positive (+), indicating cancer, or negative (−), indicating no cancer.

Prior knowledge: the prevalence of the disease in the whole population is 0.8%.

The test is correct 98% of the time for patients who have the disease and 97% of the time for patients who do not:

P(cancer) = 0.008  P(!cancer) = 0.992

P(+|cancer) = 0.98  P(-|cancer) = 0.02

P(+|!cancer) = 0.03  P(-|!cancer) = 0.97

Let B1, B2, …, Bn be mutually exclusive events with P(Bi) > 0, i = 1, 2, …, n, and $\bigcup_{i=1}^{n} B_i = \Omega$. For any event $A \subset \bigcup_{i=1}^{n} B_i$, the probability of A is given by the law of total probability:

$P(A) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)$

Assuming P(A) > 0, the probability that event Bi occurs given that A has occurred is:

$P(B_i \mid A) = \dfrac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}$, which is called the Bayes formula.
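Applying the total probability and Bayes formulas to the diagnosis example above, we can compute the probability of cancer given a positive test result (a short Python sketch; the numbers are the ones given in the example):

```python
# Prior and test characteristics from the diagnosis example.
p_cancer = 0.008                 # P(cancer)
p_no_cancer = 0.992              # P(!cancer)
p_pos_given_cancer = 0.98        # P(+|cancer)
p_pos_given_no_cancer = 0.03     # P(+|!cancer)

# Law of total probability: P(+) = P(+|cancer)P(cancer) + P(+|!cancer)P(!cancer)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * p_no_cancer

# Bayes formula: P(cancer|+) = P(+|cancer)P(cancer) / P(+)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(f"P(+) = {p_pos:.4f}")                      # 0.0376
print(f"P(cancer|+) = {p_cancer_given_pos:.4f}")  # 0.2085
```

Even though the test is highly accurate, the posterior probability of cancer given a positive result is only about 21%, because the disease is rare in the population; this is exactly the effect the prior captures.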

2. Bayesian Decision Criterion

If P(Ci | X) > P(Cj | X) holds for every j ≠ i, then the sample X is assigned to class Ci.

3. Maximum Posterior Hypothesis

From the Bayes formula we obtain a method for computing posterior probabilities: under certain assumptions, a posterior probability can be computed from the prior probability and probabilities estimated from sample data.

Let P(c) be the prior probability of hypothesis c, i.e. the probability that c is the correct hypothesis; let P(X) be the prior probability of the training sample X; and let P(X | c) be the probability of observing sample X given that hypothesis c holds. By the Bayes formula, the posterior probability is:

$P(c \mid X) = \dfrac{P(X \mid c)\,P(c)}{P(X)}$

Let C be the set of categories, i.e. the set of candidate hypotheses. Given a sample X with an unknown class label, we look for the most probable hypothesis c ∈ C. The hypothesis (category) with the greatest posterior probability is called the maximum a posteriori hypothesis (MAP), written as:

$c_{MAP} = \arg\max_{c \in C} P(c \mid X) = \arg\max_{c \in C} \dfrac{P(X \mid c)\,P(c)}{P(X)} = \arg\max_{c \in C} P(X \mid c)\,P(c)$

(The last step drops P(X), which is the same for every hypothesis.)

4. Examples

(Table: the classic 14-example play-tennis training set, with attributes Outlook, Temperature, Humidity and Wind, and class label PlayTennis.)

Suppose the weather conditions of the day are X = {Sunny, Hot, High, Weak}. Should we predict that tennis will be played today?

① Count occurrences in the training data

(Table: counts of each attribute value within the Yes and No classes.)

② Calculate the prior probabilities

P(Yes) = 9/14, P(No) = 5/14

③ Calculate the posterior probabilities (since the denominator P(X) is the same for both classes, only the numerators P(X|c)·P(c) need to be computed and compared)

P(Sunny|Yes) = 2/9   P(Sunny|No) = 3/5
P(Hot|Yes) = 2/9     P(Hot|No) = 2/5
P(High|Yes) = 3/9    P(High|No) = 4/5
P(Weak|Yes) = 6/9    P(Weak|No) = 2/5

P(Yes | X) ∝ P(Yes) · P(Sunny|Yes) · P(Hot|Yes) · P(High|Yes) · P(Weak|Yes)

= 9/14 × 2/9 × 2/9 × 3/9 × 6/9 ≈ 0.0071

P(No | X) ∝ P(No) · P(Sunny|No) · P(Hot|No) · P(High|No) · P(Weak|No)

= 5/14 × 3/5 × 2/5 × 4/5 × 2/5 ≈ 0.0274

Since 0.0274 > 0.0071, the maximum a posteriori class is No: the prediction is not to play tennis today.
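The same comparison written out as a short Python check (the priors and conditional probabilities are the ones counted above):

```python
# Priors and class-conditional probabilities from the worked example.
prior = {"Yes": 9/14, "No": 5/14}
likelihood = {
    "Yes": {"Sunny": 2/9, "Hot": 2/9, "High": 3/9, "Weak": 6/9},
    "No":  {"Sunny": 3/5, "Hot": 2/5, "High": 4/5, "Weak": 2/5},
}

x = ["Sunny", "Hot", "High", "Weak"]

# Score each class by P(c) * prod_k P(x_k | c); the shared denominator
# P(X) is omitted because it does not change the comparison.
scores = {}
for c in prior:
    score = prior[c]
    for value in x:
        score *= likelihood[c][value]
    scores[c] = score

print({c: round(s, 4) for c, s in scores.items()})  # {'Yes': 0.0071, 'No': 0.0274}
print("prediction:", max(scores, key=scores.get))   # prediction: No
```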

3. Naive Bayesian classification model

The naive Bayesian classification model has simple algorithmic logic, runs fast, classifies quickly, and often achieves good accuracy.

It is based on the assumption of class-conditional independence of attributes: given the class label, the attributes are assumed to be independent of one another, so that $P(X \mid C) = \prod_{k=1}^{n} P(x_k \mid C)$.

Algorithmic description of the Naive Bayes classifier:

  1. Estimate the prior probabilities: for each class C, estimate the prior probability P(C) from the training data, typically as the fraction of training samples that belong to C.
  2. Compute the posterior probabilities: for a given sample X, use the Bayes formula P(C|X) = (P(X|C) * P(C)) / P(X), where P(C|X) is the posterior probability of class C given X, P(X|C) is the likelihood of X under class C (computed, under the independence assumption, as the product of the per-attribute probabilities P(x_k|C)), P(C) is the prior probability of class C, and P(X) is the marginal probability of X, which is the same for every class and can therefore be ignored when comparing classes.
  3. Select the best class: output the class with the highest posterior probability as the final prediction.

Note that the naive Bayes classifier assumes the features are mutually independent. In practice this assumption may not always hold, so the model may need to be adjusted to fit the actual data distribution. Naive Bayes classifiers may also perform poorly on data with missing values or noise.
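Putting the three steps together, here is a minimal sketch of a categorical naive Bayes classifier (illustrative only; it adds Laplace smoothing, a common refinement not described above, so that unseen attribute values do not produce zero probabilities):

```python
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """A minimal naive Bayes classifier for categorical attributes,
    with Laplace smoothing."""

    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        # Step 1: priors P(C) = class frequencies in the training data.
        self.class_counts = Counter(y)
        self.prior = {c: self.class_counts[c] / n for c in self.classes}
        # Count attribute values per (class, attribute index).
        self.value_counts = defaultdict(Counter)
        self.attr_values = defaultdict(set)
        for row, c in zip(X, y):
            for k, v in enumerate(row):
                self.value_counts[(c, k)][v] += 1
                self.attr_values[k].add(v)
        return self

    def predict(self, x):
        # Steps 2 and 3: compare P(C) * prod_k P(x_k | C); P(X) is constant.
        best_class, best_score = None, -1.0
        for c in self.classes:
            score = self.prior[c]
            for k, v in enumerate(x):
                num = self.value_counts[(c, k)][v] + 1               # Laplace
                den = self.class_counts[c] + len(self.attr_values[k])
                score *= num / den
            if score > best_score:
                best_class, best_score = c, score
        return best_class
```

On the tennis data above, fit would recover the same priors (9/14 and 5/14), and predict would reproduce the comparison up to the smoothing correction.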

(Figure: schematic diagram of the structure of the naive Bayesian classifier.)

Naive Bayes algorithm features

Advantages

The logic is simple and easy to implement, and the time and space overhead of the classification process is relatively small.

The algorithm is relatively stable and exhibits good robustness.

Disadvantages

It assumes conditional independence between attributes; in many practical problems this assumption does not hold, which degrades the classification performance.

4. Bayesian Belief Nets

A Bayesian belief network, or Bayesian network for short, graphically represents the probabilistic relationships among a set of random variables.

Bayesian networks have two main components:

  • A directed acyclic graph representing dependencies between variables
  • A conditional probability table (CPT) associating each node with its immediate parent nodes

In a Bayesian belief network, each node is associated with a probability table. If node X has no parents, the table contains only the prior probability P(X); if X has a single parent Y, the table contains the conditional probability P(X|Y); if X has multiple parents {Y1, Y2, …, Yk}, the table contains the conditional probability P(X | Y1, Y2, …, Yk).
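To make the representation concrete, here is a toy three-node network in Python with made-up probabilities (the classic rain/sprinkler/wet-grass layout, not an example from the text). Each node's table is keyed by the values of its parents, and the network gives the joint probability as a product of per-node entries:

```python
# Toy network (made-up probabilities): Rain -> Sprinkler, and both
# Rain and Sprinkler are parents of GrassWet.
p_rain = {True: 0.2, False: 0.8}                      # P(Rain): no parents
p_sprinkler = {                                        # P(Sprinkler | Rain)
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.40, False: 0.60},
}
p_wet = {                                              # P(GrassWet | Sprinkler, Rain)
    (True, True):   {True: 0.99, False: 0.01},
    (True, False):  {True: 0.90, False: 0.10},
    (False, True):  {True: 0.80, False: 0.20},
    (False, False): {True: 0.00, False: 1.00},
}

# The network factorizes the joint distribution by the chain rule:
# P(R, S, W) = P(R) * P(S | R) * P(W | S, R)
def joint(rain, sprinkler, wet):
    return p_rain[rain] * p_sprinkler[rain][sprinkler] * p_wet[(sprinkler, rain)][wet]

# e.g. P(Rain=T, Sprinkler=F, GrassWet=T) = 0.2 * 0.99 * 0.80 = 0.1584
print(joint(True, False, True))
```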

The modeling of a Bayesian network consists of two steps:

  • Create network structure
  • Estimate the probability value of each node in the probability table
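For the second step, when complete training records are available, each CPT entry is commonly estimated by relative frequency within each parent configuration; a hypothetical sketch:

```python
from collections import Counter

def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) from complete records (a list of dicts)
    by relative frequency within each parent configuration."""
    joint = Counter()   # (parent values, child value) -> count
    margin = Counter()  # parent values -> count
    for r in records:
        key = tuple(r[p] for p in parents)
        joint[(key, r[child])] += 1
        margin[key] += 1
    return {(key, v): n / margin[key] for (key, v), n in joint.items()}

# Example: estimate P(Sprinkler | Rain) from observed records (made-up data).
records = [
    {"Rain": True,  "Sprinkler": False},
    {"Rain": True,  "Sprinkler": False},
    {"Rain": False, "Sprinkler": True},
    {"Rain": False, "Sprinkler": False},
]
print(estimate_cpt(records, "Sprinkler", ["Rain"]))
# {((True,), False): 1.0, ((False,), True): 0.5, ((False,), False): 0.5}
```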

Bayesian Belief Network Features

  • BBNs provide a graphical model for capturing prior knowledge in a specific domain;
  • Once the network structure is determined, adding new variables is easy;
  • Bayesian networks are well suited to dealing with incomplete data;
  • They are quite robust to model overfitting.
