Machine Learning Algorithm (3) Classification Method Based on Probability Theory: Naive Bayes

Related links:
Link: Graphical Machine Learning | The Naive Bayes Algorithm in Detail
Link: An Introduction to the Naive Bayes Classification Algorithm
Link: Understanding Laplace Smoothing for Naive Bayes Classification

Introduction

Among the many machine learning classification algorithms, the Naive Bayes model introduced in this article stands apart from most of the others, and it is also one of the most important models.

In machine learning, models such as KNN, logistic regression, and decision trees are discriminative methods: they directly learn the relationship between the features X and the output Y, either as a decision function Y = f(X) or as a conditional distribution P(Y | X). Naive Bayes, by contrast, is a generative method: it first learns the joint distribution P(X, Y) of the features and the output, and then obtains the prediction by computing

P(Y | X) = P(X, Y) / P(X)

Expanding the joint distribution makes this even clearer:

P(Y | X) = P(X | Y) P(Y) / P(X)
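As a quick illustration of this generative route, here is a minimal sketch (not from the original article; the toy data, labels, and function name are made up) that estimates the joint distribution from counts and then recovers P(Y | X) via Bayes' rule:

```python
# Minimal sketch: estimate the joint distribution P(X, Y) from a tiny toy
# dataset, then recover P(Y | X) = P(X, Y) / P(X) as a generative model does.
# (Toy data and names are illustrative only.)
from collections import Counter

# X is a single binary feature, Y is the class label.
data = [(1, "spam"), (1, "spam"), (0, "spam"),
        (1, "ham"), (0, "ham"), (0, "ham"), (0, "ham")]

joint = Counter(data)        # counts approximate P(X = x, Y = y)
n = len(data)
labels = ("spam", "ham")

def p_y_given_x(y, x):
    p_xy = joint[(x, y)] / n                          # P(X = x, Y = y)
    p_x = sum(joint[(x, lab)] for lab in labels) / n  # P(X = x)
    return p_xy / p_x                                 # Bayes' rule

print(p_y_given_x("spam", 1))   # 2/3 on this toy data
```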

Naive Bayes is a very intuitive model that has been widely used in many fields; in early text classification, for example, it often served as a baseline model. In this article, we introduce the principles of the Naive Bayes algorithm.

1. The core idea of the Naive Bayes algorithm

Compared with most other classification algorithms, which output a definite class, the Naive Bayes algorithm outputs a probability. For example, given a photo and asked what animal it shows, KNN or a decision tree will conclude "it is a puppy", while Naive Bayes will conclude "it is a puppy with probability 80%".
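To make the contrast concrete, here is a minimal sketch. It assumes scikit-learn is available (the library and the iris dataset are not mentioned in the original article); the point is only that Naive Bayes naturally exposes class probabilities through predict_proba:

```python
# Minimal sketch (assumes scikit-learn is installed): a decision tree answers
# with a class label, while Naive Bayes also reports how probable each class is.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
nb = GaussianNB().fit(X, y)

sample = X[:1]                    # one flower to classify
print(tree.predict(sample))       # a hard label, e.g. [0]
print(nb.predict(sample))         # Naive Bayes can give a hard label too...
print(nb.predict_proba(sample))   # ...but its core output is P(class | features)
```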

2. Laplace smoothing and its rationale

To solve the zero-probability problem, the French mathematician Laplace first proposed adding 1 to every count when estimating the probability of events that had never been observed, which is why additive smoothing is also called Laplace smoothing.
When the training set is large, the change in the estimated probabilities caused by adding 1 to each count is negligible, yet it conveniently and effectively avoids the zero-probability problem.
In text classification with multinomial Naive Bayes, the feature x_i is assumed to be the number of times word i appears in a sample (it can also be represented with TF-IDF). After Laplace smoothing, the conditional probability is estimated as

P(x_i | y) = (N_{y,i} + 1) / (N_y + n)

where N_{y,i} is the number of occurrences of word i in samples of class y, N_y is the total number of words in class y, and n is the size of the vocabulary.
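As a numerical illustration of the smoothed estimate above, here is a minimal sketch with made-up word counts (the vocabulary and counts are hypothetical, not from the article):

```python
# Minimal sketch of Laplace smoothing for multinomial Naive Bayes:
# P(word_i | class) = (count of word_i in class + 1) / (total words in class + vocabulary size)
vocab = ["free", "money", "meeting", "report"]                    # n = 4
spam_counts = {"free": 3, "money": 2, "meeting": 0, "report": 0}  # N_{y,i}
total_spam_words = sum(spam_counts.values())                      # N_y = 5

def smoothed_prob(word):
    return (spam_counts[word] + 1) / (total_spam_words + len(vocab))

for w in vocab:
    print(w, smoothed_prob(w))
# "meeting" and "report" never occur in spam, yet their smoothed probability is
# 1/9 instead of 0, so the product of conditional probabilities never collapses to zero.
```

In scikit-learn, MultinomialNB applies the same idea through its alpha parameter; alpha=1.0 corresponds to Laplace smoothing.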

3. Advantages and disadvantages

Advantages: still effective when data is scarce, and can handle multi-class problems.
Disadvantages: sensitive to how the input data is prepared.
Applicable data types: nominal data.

Source: blog.csdn.net/qq_28976599/article/details/131176352