Data Classification Algorithms - Naive Bayes


1. Total probability formula

1. Motivating example

Example: before leaving on a trip, you ask a neighbor to water a flower. Suppose the neighbor remembers to water it with probability 0.5; if watered, the flower stays alive with probability 0.8, and if not watered, it stays alive with probability 0.3. Then

p(alive) = 0.5 * 0.8 + 0.5 * 0.3 = 0.55

The event "the flower is alive" splits into two cases: either the neighbor remembers to water and the flower lives, or the neighbor forgets to water and the flower still lives.
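As a quick check, the same total-probability computation in Python (a minimal sketch; the variable names are ours):

```python
# Law of total probability on the flower example:
# P(alive) = P(remembers) * P(alive | remembers) + P(forgets) * P(alive | forgets)
p_remembers = 0.5
p_alive_given_remembers = 0.8
p_alive_given_forgets = 0.3

p_alive = (p_remembers * p_alive_given_remembers
           + (1 - p_remembers) * p_alive_given_forgets)
print(p_alive)  # 0.55
```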

2. Total probability formula

If the events A1, A2, ..., An form a partition of the sample space (mutually exclusive, and together covering every outcome), then for any event B:

P(B) = P(A1) * P(B | A1) + P(A2) * P(B | A2) + ... + P(An) * P(B | An)

2. Bayes' formula

1. Motivating example

Continuing the example: given that the flower is alive, what is the probability that the neighbor remembered to water it?

p(remembers | alive) = p(alive | remembers) * p(remembers) / (p(alive | remembers) * p(remembers) + p(alive | forgets) * p(forgets)) = (0.8 * 0.5) / (0.8 * 0.5 + 0.3 * 0.5) ≈ 0.727
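And the posterior from Bayes' formula, continuing the sketch:

```python
# Bayes' formula on the same numbers:
# P(remembers | alive) = P(alive | remembers) * P(remembers) / P(alive)
p_remembers = 0.5
p_alive_given_remembers = 0.8
p_alive_given_forgets = 0.3

p_alive = (p_remembers * p_alive_given_remembers
           + (1 - p_remembers) * p_alive_given_forgets)   # 0.55, by total probability
p_remembers_given_alive = p_alive_given_remembers * p_remembers / p_alive
print(round(p_remembers_given_alive, 3))  # 0.727
```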

2. Bayes' formula

For a partition A1, A2, ..., An of the sample space and any event B:

P(Ai | B) = P(B | Ai) * P(Ai) / P(B) = P(B | Ai) * P(Ai) / (P(B | A1) * P(A1) + ... + P(B | An) * P(An))

where the denominator P(B) is expanded with the total probability formula.

3. Naive Bayes

1. Concept

The Bayesian algorithm is based on Bayes' theorem and rests on a rigorous mathematical foundation. When the attributes of a sample are additionally assumed to be independent of one another given the class, the resulting algorithm is called the Naive Bayes algorithm.

2. Algorithm process

(1) Let the dataset be D, where each tuple has n attributes; one such tuple is X = {x1, x2, ..., xn}.
(2) Suppose there are m classes in total, {C1, C2, ..., Cm}. Given a tuple X, compute the probability that X belongs to each class: X is assigned to the class Ci for which P(Ci | X) is largest. By Bayes' formula, P(Ci | X) = P(X | Ci) * P(Ci) / P(X), where P(X) can be obtained from the total probability formula.
(3) P(X) is the same for every class Ci, so it suffices to maximize the numerator P(X | Ci) * P(Ci), where P(Ci) = |Ci| / |D| (the proportion of tuples in D that belong to class Ci).
(4) For P(X | Ci), the attributes are assumed to be mutually independent given the class,
so P(X | Ci) = P(x1 | Ci) * P(x2 | Ci) * ... * P(xn | Ci)

If the j-th attribute is discrete, then P(xj | Ci) is the number of tuples in class Ci whose j-th attribute equals xj, divided by |Ci|.
If the attribute is continuous, it is usually modeled with a Gaussian density:

P(xj | Ci) = (1 / (sqrt(2π) * σ)) * exp(-(xj - μ)^2 / (2σ^2))

where μ and σ are the mean and standard deviation of that attribute over the tuples of class Ci. A code sketch of the whole process is given below.
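A minimal sketch of steps (1)-(4) in Python (the function names and data layout are ours, not from the original post). It counts attribute values for the discrete case and includes the Gaussian density for the continuous case:

```python
import math
from collections import Counter, defaultdict

def train(data, labels):
    """data: list of attribute tuples X = (x1, ..., xn); labels: class of each tuple."""
    n_total = len(labels)
    class_counts = Counter(labels)        # |Ci| for each class Ci
    attr_counts = defaultdict(int)        # (Ci, attribute index j, value v) -> count
    for x, ci in zip(data, labels):
        for j, v in enumerate(x):
            attr_counts[(ci, j, v)] += 1
    return n_total, class_counts, attr_counts

def classify(x, n_total, class_counts, attr_counts):
    """Return the class Ci that maximizes P(X | Ci) * P(Ci)."""
    best_class, best_score = None, -1.0
    for ci, ci_count in class_counts.items():
        score = ci_count / n_total                       # P(Ci) = |Ci| / |D|
        for j, v in enumerate(x):
            score *= attr_counts[(ci, j, v)] / ci_count  # P(xj | Ci), unsmoothed
        if score > best_score:
            best_class, best_score = ci, score
    return best_class

def gaussian(x, mu, sigma):
    """Density for a continuous attribute; mu/sigma come from the tuples of class Ci."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```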

3. Laplace smoothing

If some P(xj | Ci) works out to 0, the entire product P(X | Ci) becomes 0. This is especially likely when the data sample is small: a single attribute value never seen in class Ci wipes out the contribution of all the other attributes. To avoid this, Laplace smoothing is applied.
The smoothed estimates are:

P(Ci) = (|Ci| + λ) / (|D| + K * λ)
P(xj | Ci) = (Nij + λ) / (|Ci| + Sj * λ)

where K is the number of classes, Sj is the number of distinct values of the j-th feature, Nij is the number of tuples in Ci whose j-th attribute equals xj, and λ is generally 1.
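The same two estimates with smoothing, as a sketch (parameter names are ours; λ defaults to 1):

```python
def smoothed_prior(ci_count, n_total, k, lam=1.0):
    # P(Ci) = (|Ci| + lambda) / (|D| + K * lambda)
    return (ci_count + lam) / (n_total + k * lam)

def smoothed_likelihood(match_count, ci_count, s_j, lam=1.0):
    # P(xj | Ci) = (Nij + lambda) / (|Ci| + Sj * lambda)
    return (match_count + lam) / (ci_count + s_j * lam)
```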


4. An example

[The original post shows the worked example as images: a small training table of outdoor-activity records and the calculation of P(X | Ci) * P(Ci) for two test tuples X1 and X2.]
The calculation concludes that both X1 and X2 are classified as suitable for outdoor sports.
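Since the images are not recoverable, here is a hypothetical toy run of the sketch above; the (outlook, wind) data is invented for illustration and is not the post's actual table:

```python
# Hypothetical training data: (outlook, wind) -> suitable for outdoor sports?
data = [("sunny", "weak"), ("sunny", "strong"), ("rainy", "weak"), ("rainy", "strong")]
labels = ["yes", "yes", "yes", "no"]

n_total, class_counts, attr_counts = train(data, labels)
print(classify(("sunny", "weak"), n_total, class_counts, attr_counts))  # -> "yes"
```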


Source: blog.csdn.net/weixin_47250738/article/details/125459026