1. Total probability formula
1. An introductory example
Suppose you ask a neighbor to water your flower while you are away. The neighbor remembers to water it with probability 0.5 (and forgets with probability 0.5). If the flower is watered, it stays alive with probability 0.8; if it is not watered, it stays alive with probability 0.3. Then

p(alive) = 0.5 * 0.8 + 0.5 * 0.3 = 0.55

The event that the flower is alive splits into two cases: either the neighbor remembers to water and the flower survives, or the neighbor forgets to water and the flower still survives.
2. Total probability formula
In general, if events B1, B2, ..., Bn form a partition of the sample space (they are mutually exclusive and their union is the whole space), then for any event A:

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... + P(A|Bn)P(Bn)

In the example above, B1 is "the neighbor remembers to water" and B2 is "the neighbor forgets".
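The flower example can be checked numerically. A minimal sketch, using the probabilities from the example above:

```python
# Total probability for the flower example: partition on whether the
# neighbor remembers to water.
p_remember = 0.5            # P(neighbor remembers to water)
p_forget = 1 - p_remember   # P(neighbor forgets)
p_alive_if_watered = 0.8    # P(alive | watered)
p_alive_if_not = 0.3        # P(alive | not watered)

# P(alive) = sum over the partition of P(alive | case) * P(case)
p_alive = p_alive_if_watered * p_remember + p_alive_if_not * p_forget
print(round(p_alive, 2))  # 0.55
```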
2. Bayesian formula
1. An introductory example
Continuing the flower example: suppose we return and find the flower alive. What is the probability that the neighbor remembered to water it?

p(neighbor remembered | flower alive)
= p(flower alive | remembered) * p(remembered) / (p(flower alive | remembered) * p(remembered) + p(flower alive | forgot) * p(forgot))
= (0.8 * 0.5) / (0.8 * 0.5 + 0.3 * 0.5)
≈ 0.727
2. Bayesian formula
In general, with B1, B2, ..., Bn a partition of the sample space as above, Bayes' formula states, for any event A with P(A) > 0:

P(Bi|A) = P(A|Bi)P(Bi) / (P(A|B1)P(B1) + ... + P(A|Bn)P(Bn))

where the denominator is just P(A) by the total probability formula.
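The posterior from the flower example can likewise be computed directly; a minimal sketch:

```python
# Bayes' formula: probability the neighbor remembered to water,
# given that the flower is found alive (numbers from the example above).
p_remember, p_forget = 0.5, 0.5
p_alive_given_water = 0.8
p_alive_given_no_water = 0.3

# Denominator is P(alive), obtained by the total probability formula.
p_alive = p_alive_given_water * p_remember + p_alive_given_no_water * p_forget
posterior = p_alive_given_water * p_remember / p_alive
print(round(posterior, 4))  # 0.7273
```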
3. Naive Bayes
1. Concept
The Bayesian algorithm is based on Bayes' theorem and rests on rigorous mathematical theory. When every attribute is assumed to be conditionally independent of the others given the class, the resulting classifier is called the Naive Bayes algorithm.
2. Algorithm process
(1) Let the data set be D, where each tuple has n attributes; one such tuple is X = {x1, x2, ..., xn}.
(2) Suppose there are m classes {C1, C2, ..., Cm}. Given a tuple X, compute for each class the probability that X belongs to it: X is assigned to the class Ci for which P(Ci|X) is largest.
According to the Bayesian formula, P(Ci | X) = (P(X | Ci) * P(Ci)) / P(X), where P(X) can be obtained by the total probability formula.
(3) For every class Ci, P(X) is the same, so it suffices to maximize the numerator: choose the class that maximizes P(X | Ci) * P(Ci), where P(Ci) = |Ci| / |D| (the fraction of tuples in the dataset that belong to class Ci).
(4) For P(X | Ci), the attributes are assumed to be conditionally independent of one another given the class,
so P(X | Ci) = P(x1 | Ci) * P(x2 | Ci) * ... * P(xn | Ci)
If the attribute is discrete, then P(xk | Ci) is the number of tuples of class Ci in D whose k-th attribute equals xk, divided by |Ci| (the number of tuples of class Ci in D).
If the attribute is continuous, it is typically assumed to follow a Gaussian distribution, so that

P(xk | Ci) = g(xk, μCi, σCi) = 1 / (sqrt(2π) σCi) * exp(-(xk - μCi)² / (2 σCi²))

where μCi and σCi are the mean and standard deviation of the k-th attribute over the tuples of class Ci.
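The two likelihood estimates in step (4) can be sketched as follows, assuming the common Gaussian model for continuous attributes:

```python
import math

def discrete_likelihood(values, x):
    """P(x | Ci) for a discrete attribute: fraction of class-Ci values equal to x."""
    return values.count(x) / len(values)

def gaussian_likelihood(values, x):
    """P(x | Ci) for a continuous attribute: Gaussian density using the
    class's sample mean and standard deviation."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    sigma = math.sqrt(var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / (math.sqrt(2 * math.pi) * sigma)

# Example: among class-Ci values [1, 1, 2, 3], the value 1 appears half the time.
print(discrete_likelihood([1, 1, 2, 3], 1))        # 0.5
# Gaussian density at the mean of [0.0, 2.0] (mu = 1, sigma = 1).
print(round(gaussian_likelihood([0.0, 2.0], 1.0), 4))  # 0.3989
```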
3. Laplace Calibration
If some computed P(xj | Ci) = 0, the entire product P(X | Ci) becomes 0 no matter how large the other factors are; this happens easily when the data sample is small. To prevent a single zero-count attribute from wiping out the contribution of all the other attributes, Laplace smoothing is used.
The smoothed estimates are:

P(Ci) = (|Ci| + λ) / (|D| + K λ)
P(xj | Ci) = (number of tuples in Ci whose j-th attribute equals xj + λ) / (|Ci| + Sj λ)

where K is the number of classes, Sj is the number of distinct values of the j-th feature, and λ is the smoothing parameter, generally taken to be 1.
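A minimal sketch of the smoothed likelihood, with Sj passed in as `n_values` (names are illustrative, not from the article):

```python
def smoothed_likelihood(values, x, n_values, lam=1):
    """Laplace-smoothed P(x | Ci): (count + lam) / (|Ci| + Sj * lam),
    where n_values (Sj) is the number of distinct values of the feature."""
    return (values.count(x) + lam) / (len(values) + n_values * lam)

# Without smoothing, a value never seen in this class would get probability 0;
# with smoothing it gets (0 + 1) / (3 + 3) = 1/6.
print(round(smoothed_likelihood(["sunny", "sunny", "rain"], "overcast", 3), 3))  # 0.167
```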
4. An example
So both X1 and X2 are classified as suitable for outdoor sports.
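Steps (1)-(4) plus Laplace smoothing can be sketched end-to-end. The dataset below is hypothetical (the article's own example table is not reproduced here); it assumes two discrete attributes and a yes/no "suitable for outdoor sports" class:

```python
from collections import Counter

# Hypothetical tiny weather table: (attributes) -> class.
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "yes"),
    (("rain", "yes"), "no"),
    (("rain", "no"), "yes"),
    (("overcast", "yes"), "no"),
]

def classify(x, data, lam=1):
    """Naive Bayes with Laplace smoothing, following steps (1)-(4)."""
    classes = Counter(c for _, c in data)
    n_attrs = len(x)
    # Sj: number of distinct values of each attribute, used for smoothing.
    s = [len({row[0][j] for row in data}) for j in range(n_attrs)]
    best_class, best_score = None, -1.0
    for c, n_c in classes.items():
        rows = [attrs for attrs, cls in data if cls == c]
        # Smoothed prior P(Ci) = (|Ci| + lam) / (|D| + K * lam)
        score = (n_c + lam) / (len(data) + len(classes) * lam)
        # Multiply in the smoothed likelihood of each attribute value.
        for j in range(n_attrs):
            count = sum(1 for attrs in rows if attrs[j] == x[j])
            score *= (count + lam) / (n_c + s[j] * lam)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(classify(("sunny", "no"), data))  # yes
```

Because P(X) is constant across classes (step 3), the code compares only the numerators P(X | Ci) * P(Ci) and returns the class with the largest one.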