Naive Bayes (interview-oriented core theory, simplified)





The naive Bayes method is built on Bayes' theorem plus one strong assumption: conditional independence of the features given the class (this is hard to grasp from words alone; the formula below makes it precise).
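In standard notation, with a feature vector $x = (x^{(1)}, \dots, x^{(n)})$ and a class label $Y$ taking values $c_1, \dots, c_K$, the conditional independence assumption says that the joint conditional probability factorizes over the individual features:

$$P(X = x \mid Y = c_k) = \prod_{i=1}^{n} P(X^{(i)} = x^{(i)} \mid Y = c_k)$$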




Next we explain the naive Bayes method in a simple way. Before doing so, let's settle one question first: what does the method rely on, and what does it compute, in order to decide which class a sample instance belongs to?

The naive Bayes method combines the prior probability distribution over the classes with the conditional probability distribution of the features to obtain the posterior probability; the class with the largest posterior probability is the class assigned to the current sample instance (the formula below makes this concrete).
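Combining Bayes' theorem with the independence assumption above, the posterior probability of class $c_k$ for a sample $x$ is:

$$P(Y = c_k \mid X = x) = \frac{P(Y = c_k)\prod_{i=1}^{n} P(X^{(i)} = x^{(i)} \mid Y = c_k)}{\sum_{k'=1}^{K} P(Y = c_{k'})\prod_{i=1}^{n} P(X^{(i)} = x^{(i)} \mid Y = c_{k'})}$$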




In the end, the naive Bayes method decides which class a sample instance belongs to by maximizing this posterior probability; that is, the class with the largest posterior probability is taken as the class of the sample instance.


Therefore, the goal of the problem is as follows:
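Because the denominator of the posterior is the same for every class, it can be dropped, which gives the usual naive Bayes decision rule:

$$y = f(x) = \arg\max_{c_k} P(Y = c_k)\prod_{i=1}^{n} P(X^{(i)} = x^{(i)} \mid Y = c_k)$$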


We know that the naive Bayes method, like other machine learning methods, needs its parameters estimated, so that once the features of a sample instance are fed in, the class it belongs to comes out. So let's look at the question of parameter estimation.

Parameter estimation in naive Bayes

There are two methods of parameter estimation: one is maximum likelihood estimation, where the likelihood refers to the conditional probability (discussed in detail later); the other is Bayesian estimation. We explain each in turn.

Maximum likelihood estimation

The maximum likelihood estimate of the prior probability simply counts the proportion of each class among all samples (the formula below makes this clearer):

$$P(Y = c_k) = \frac{\sum_{j=1}^{N} I(y_j = c_k)}{N}$$

The maximum likelihood estimate of the conditional probability counts, within each class, the proportion of samples whose ith feature takes a given value (again, the formula is clearer):

$$P(X^{(i)} = a_{il} \mid Y = c_k) = \frac{\sum_{j=1}^{N} I(x_j^{(i)} = a_{il},\ y_j = c_k)}{\sum_{j=1}^{N} I(y_j = c_k)}$$

where $j$ indexes the jth sample, $i$ indexes the feature, and $a_{il}$ is the lth possible value of the ith feature.

Let us make this fully concrete with an example. Given the training data set below, find which class x = (2, S) belongs to.
















(insert dataset table)


Maximum Likelihood Estimation:

result:
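The numeric result from the source did not carry over, so here is a minimal sketch in Python of the maximum likelihood computation. It assumes the training set is the classic 15-sample table from Li Hang's Statistical Learning Methods (the same example that asks to classify x = (2, S)); if the original table differs, the counts below would change accordingly.

```python
from collections import Counter

# Assumed training set (the classic 15-sample example); each row is (x1, x2, y).
data = [
    (1, 'S', -1), (1, 'M', -1), (1, 'M', 1), (1, 'S', 1), (1, 'S', -1),
    (2, 'S', -1), (2, 'M', -1), (2, 'M', 1), (2, 'L', 1), (2, 'L', 1),
    (3, 'L', 1), (3, 'M', 1), (3, 'M', 1), (3, 'L', 1), (3, 'L', -1),
]

def mle_classify(x, data):
    """Pick the class maximizing prior * product of conditionals (maximum likelihood estimates)."""
    n = len(data)
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for c, nc in class_counts.items():
        score = nc / n  # prior P(Y = c)
        for i, xi in enumerate(x):
            # conditional P(X^(i) = xi | Y = c)
            match = sum(1 for row in data if row[i] == xi and row[-1] == c)
            score *= match / nc
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = mle_classify((2, 'S'), data)
print(scores)  # roughly {1: 0.0222, -1: 0.0667}
print(label)   # -1
```

With this assumed table, P(Y=1) = 9/15 and P(Y=-1) = 6/15, the products come out to about 0.022 for class 1 and 0.067 for class -1, and so x = (2, S) is assigned to class -1.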



Bayesian parameter estimation
Bayesian estimation is used to avoid the situation where a maximum likelihood estimate of a probability comes out as 0.

Bayesian estimate of the prior probability (when λ = 1 this is Laplace smoothing; in practice λ = 1 is usually used; K is the number of classes and N is the number of samples):

$$P_\lambda(Y = c_k) = \frac{\sum_{j=1}^{N} I(y_j = c_k) + \lambda}{N + K\lambda}$$


Bayesian estimate of the conditional probability (where $S_i$ is the number of distinct values the ith feature can take):

$$P_\lambda(X^{(i)} = a_{il} \mid Y = c_k) = \frac{\sum_{j=1}^{N} I(x_j^{(i)} = a_{il},\ y_j = c_k) + \lambda}{\sum_{j=1}^{N} I(y_j = c_k) + S_i\lambda}$$


Similarly, let's redo the calculation on the same training set, this time using Bayesian estimation (λ = 1).
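Continuing the Python sketch above (and reusing the same assumed data table), the Laplace-smoothed version only changes the counts in the numerators and denominators:

```python
def bayes_classify(x, data, lam=1.0):
    """Pick the class maximizing the Laplace-smoothed (Bayesian) estimates with parameter lam."""
    n = len(data)
    classes = sorted({row[-1] for row in data})
    K = len(classes)
    # S[i] = number of distinct values of the ith feature
    S = [len({row[i] for row in data}) for i in range(len(x))]
    scores = {}
    for c in classes:
        nc = sum(1 for row in data if row[-1] == c)
        score = (nc + lam) / (n + K * lam)  # smoothed prior
        for i, xi in enumerate(x):
            match = sum(1 for row in data if row[i] == xi and row[-1] == c)
            score *= (match + lam) / (nc + S[i] * lam)  # smoothed conditional
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = bayes_classify((2, 'S'), data)  # data from the sketch above
print(scores)  # roughly {-1: 0.0610, 1: 0.0327}
print(label)   # -1
```

With λ = 1 and the assumed table, the smoothed priors are 10/17 for class 1 and 7/17 for class -1, the products come out to about 0.033 versus 0.061, and so x = (2, S) is again assigned to class -1.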


Both estimation methods lead to the same classification result.

That covers the core theory of naive Bayes. My interests and research directions are deep learning, machine learning, and computer vision; you are welcome to drop by and exchange ideas.

