Must-Know Classification Algorithms in Data Mining: the Naive Bayes Algorithm

Understanding the Naive Bayes classification algorithm

Bayesian classification is the general name for a family of classification algorithms, all of which are based on Bayes' theorem, which is why they are collectively referred to as Bayesian classification. Naive Bayes is the simplest and most common method in this family. In this article, I will try to summarize the Naive Bayes classification algorithm discussed at our study meeting as plainly as possible, and I hope it helps others understand it.


1   Overview of Classification Problems


In fact, no one is a stranger to classification problems; we classify things every day. For example, when you see a person, your mind subconsciously judges whether he is a student or a working adult; or you may walk down the street and say to a friend, "that person looks rich at first glance." These are all classification operations.


Since this is a Bayesian classification algorithm, what is the mathematical description of classification?


From a mathematical point of view, the classification problem can be defined as follows: given the sets C = {y1, y2, ..., yn} and I = {x1, x2, ..., xm}, determine a mapping rule y = f(x) such that for every xi ∈ I there is one and only one yj ∈ C with yj = f(xi).


Here C is called the category set, each element of which is a category; I is called the item set (feature set), each element of which is an item to be classified; and f is called the classifier. The task of a classification algorithm is to construct the classifier f.
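To make this concrete, here is a minimal Python sketch (the feature tuple and the rule inside f are invented purely for illustration; the point is only that a classifier is a mapping from items to categories):

```python
# A classifier is just a mapping f from items (feature tuples) to categories.
# The feature tuple (age, wears_school_uniform) and the rule are hypothetical.
C = {"student", "working adult"}  # category set

def f(x):
    """A toy hand-written classifier standing in for a learned one."""
    age, wears_school_uniform = x
    return "student" if age < 23 or wears_school_uniform else "working adult"

print(f((20, True)))   # student
print(f((35, False)))  # working adult
```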


What a classification algorithm does is: given the features, derive the category. This is the crux of every classification problem, and how we get from the given features to a final category is exactly what we will discuss below. Each classification algorithm embodies a different core idea.


In this article, I will use one concrete example to explain almost all of the important ideas of the Naive Bayes algorithm.


2   Naive Bayes Classification


So what exactly is the core of the Naive Bayes classification algorithm?

It is the following Bayes formula:

P(B|A) = P(A|B) P(B) / P(A)
It becomes much clearer if we change the form of the expression, as follows:

p(category | features) = p(features | category) p(category) / p(features)

What we ultimately want is p(category | features)! Once we have it, our task is complete.
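As a quick numeric illustration (the numbers here are invented, not taken from the example below), the formula in code:

```python
# Bayes' rule: p(category | features) = p(features | category) * p(category) / p(features)
p_features_given_category = 0.3  # likelihood (assumed for illustration)
p_category = 0.5                 # prior
p_features = 0.4                 # evidence

posterior = p_features_given_category * p_category / p_features
print(posterior)  # 0.375
```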


3   Case Analysis


Below I give an example problem.


The given data are as follows: 12 training samples, each described by four features (handsome?, personality, height, motivated?) and labeled marry or not marry. All of the counts used below come from this table.
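For readers who want to follow along in code, here is one way to encode the training table as a Python list. The original table image is not reproduced here, so this row-by-row encoding is a reconstruction consistent with every count used later in the article, not the original image itself:

```python
# Each sample: (handsome, personality, height, motivated, label)
data = [
    ("handsome",     "not good", "short",  "not motivated", "not marry"),
    ("not handsome", "good",     "short",  "motivated",     "not marry"),
    ("handsome",     "good",     "short",  "motivated",     "marry"),
    ("not handsome", "great",    "tall",   "motivated",     "marry"),
    ("handsome",     "not good", "short",  "motivated",     "not marry"),
    ("handsome",     "not good", "short",  "motivated",     "not marry"),
    ("handsome",     "good",     "tall",   "not motivated", "marry"),
    ("not handsome", "good",     "medium", "motivated",     "marry"),
    ("handsome",     "great",    "medium", "motivated",     "marry"),
    ("not handsome", "not good", "tall",   "motivated",     "marry"),
    ("handsome",     "good",     "short",  "not motivated", "not marry"),
    ("handsome",     "good",     "short",  "not motivated", "not marry"),
]
```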
The question for us now is: a boy wants to propose to his girlfriend. The boy's four features are: not handsome, bad personality, short, and not motivated. Should the girl marry him or not?


This is a typical classification problem. Turned into a mathematical problem, it is to compare p(marry | not handsome, bad personality, short, not motivated) with p(not marry | not handsome, bad personality, short, not motivated); whichever probability is larger gives us the answer, marry or not marry!

Here we bring in the Naive Bayes formula:

p(marry | not handsome, bad personality, short, not motivated) = p(not handsome, bad personality, short, not motivated | marry) p(marry) / p(not handsome, bad personality, short, not motivated)
We need p(marry | not handsome, bad personality, short, not motivated), which we do not know directly, but the Naive Bayes formula converts it into three quantities that are easy to find:


p(not handsome, bad personality, short, not motivated | marry), p(not handsome, bad personality, short, not motivated), and p(marry). (As for why these can be found, I will explain later. This is great: transforming the quantity we need into other quantities that can actually be computed effectively solves our problem!)


4   Explanation of the Term "Naive" in the Naive Bayes Algorithm


So how are these three quantities obtained?


They are obtained by counting over the known training data. The detailed solution of this example is given below.

Recall that the formula we need is:

p(marry | not handsome, bad personality, short, not motivated) = p(not handsome, bad personality, short, not motivated | marry) p(marry) / p(not handsome, bad personality, short, not motivated)
So I only need to find p(not handsome, bad personality, short, not motivated | marry), p(not handsome, bad personality, short, not motivated), and p(marry). Below I compute these probabilities one by one, compare them at the end, and obtain the final result.


p(not handsome, bad personality, short, not motivated | marry) = p(not handsome | marry) * p(bad personality | marry) * p(short | marry) * p(not motivated | marry). So I only need to count each of the probabilities on the right separately to obtain the probability on the left!


Wait, why does this equation hold? Anyone who has studied probability theory will notice that this equation requires the features to be mutually independent!


Correct! And this is exactly where the word "naive" in Naive Bayes comes from: the Naive Bayes algorithm assumes that the features are mutually independent, and under that assumption the equation holds!
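Here is a small sketch of what the assumption buys us (the four per-feature conditionals are the values derived later in this article; the point here is only the product form):

```python
from math import prod

# Under the naive independence assumption, the joint conditional
# factorizes into a product of per-feature conditionals.
p_given_marry = {
    "not handsome":    1/2,
    "bad personality": 1/6,
    "short":           1/6,
    "not motivated":   1/6,
}
print(prod(p_given_marry.values()))  # 1/432, about 0.00231
```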


But why do we need to assume that the features are mutually independent?



1. Think about it this way: without this assumption, estimating the probabilities on the right side would be practically impossible. Our example has 4 features: handsome takes values in {handsome, not handsome}, personality in {not good, good, great}, height in {tall, short, medium}, and motivated in {not motivated, motivated}. The joint distribution of the four features therefore lives in a 4-dimensional space with 2 * 3 * 3 * 2 = 36 cells.

36 cells are easy for a computer to count over, but in real life there are often many features, each taking many values, so estimating these probabilities by counting becomes almost impossible (see the sketch below). This is one reason why we need to assume independence between features.
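A back-of-the-envelope computation of that blow-up (the 20-feature comparison is a made-up illustration):

```python
from math import prod

# Joint feature cells for this example: 2 * 3 * 3 * 2 = 36
domain_sizes = {"handsome": 2, "personality": 3, "height": 3, "motivated": 2}
print(prod(domain_sizes.values()))  # 36

# With, say, 20 binary features the joint table explodes:
print(2 ** 20)  # 1048576 cells, hopeless to estimate by counting
```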


2. Even if we did not assume that the features are independent, when counting we would have to search the entire feature space. For example, to estimate p(not handsome, bad personality, short, not motivated | marry), we would need to count, among the married samples, the people who are simultaneously not handsome, with a bad personality, short, and not motivated. Because of the sparsity of the data, such a count can easily come out as 0, which is inappropriate.


For the above two reasons, the Naive Bayes method makes the conditional-independence assumption on the conditional probability distribution. Since this is a strong assumption, Naive Bayes takes its name from it! The assumption makes Naive Bayes simple, but it sometimes sacrifices some classification accuracy.


OK, above I explained why the probability can be split into a product of factors. Now let's get started!


We organize the formula above as follows:

p(marry | not handsome, bad personality, short, not motivated)
= p(not handsome | marry) * p(bad personality | marry) * p(short | marry) * p(not motivated | marry) * p(marry) / [ p(not handsome) * p(bad personality) * p(short) * p(not motivated) ]

(Under the same independence assumption, the denominator also splits into a product of the four single-feature probabilities.)
Below I will carry out the counts one by one. (When the amount of data is large, by the law of large numbers the observed frequency approaches the probability; this is only a small example, so simple counting will do.)


p(marry) = ?

First, count the marry samples in the training data: 6 of the 12 samples are labeled marry.

So p(marry) = 6/12 (out of 12 samples in total) = 1/2.
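In code (a sketch continuing with the `data` list defined earlier):

```python
# Prior: fraction of training samples labeled "marry"
labels = [row[-1] for row in data]
print(labels.count("marry") / len(labels))  # 0.5
```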


p(not handsome | marry) = ?

Counting the matching samples: among the 6 marry samples, 3 are not handsome.

So p(not handsome | marry) = 3/6 = 1/2. (Conditioned on marry, see how many are not handsome.)


p(bad personality | marry) = ?

Counting the matching samples: among the 6 marry samples, 1 has a bad personality.

So p(bad personality | marry) = 1/6.


p(short | marry) = ?

Counting the matching samples: among the 6 marry samples, 1 is short.

So p(short | marry) = 1/6.


p(not motivated | marry) = ?

Counting the matching samples: among the 6 marry samples, 1 is not motivated.

So p(not motivated | marry) = 1/6.
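The four conditionals, computed by counting (a sketch that again reuses the `data` list; the feature indices follow the tuple order defined there):

```python
def p_cond(feature_index, feature_value, label):
    """Estimate p(feature = value | label) by counting."""
    rows = [row for row in data if row[-1] == label]
    return sum(row[feature_index] == feature_value for row in rows) / len(rows)

print(p_cond(0, "not handsome",  "marry"))  # 3/6 = 0.5
print(p_cond(1, "not good",      "marry"))  # 1/6
print(p_cond(2, "short",         "marry"))  # 1/6
print(p_cond(3, "not motivated", "marry"))  # 1/6
```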


Now for the denominators: p(not handsome), p(bad personality), p(short), p(not motivated).

Counting over all 12 samples:

Not handsome: 4 samples, so p(not handsome) = 4/12 = 1/3.

Bad personality: 4 samples, so p(bad personality) = 4/12 = 1/3.

Short: 7 samples, so p(short) = 7/12.

Not motivated: 4 samples, so p(not motivated) = 4/12 = 1/3.
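And the marginals in code (same `data` list):

```python
def p_marginal(feature_index, feature_value):
    """Estimate p(feature = value) over all samples."""
    return sum(row[feature_index] == feature_value for row in data) / len(data)

print(p_marginal(0, "not handsome"))   # 4/12
print(p_marginal(1, "not good"))       # 4/12
print(p_marginal(2, "short"))          # 7/12
print(p_marginal(3, "not motivated"))  # 4/12
```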


At this point, every term needed for p(marry | not handsome, bad personality, short, not motivated) has been found. Substituting them in:

p(marry | not handsome, bad personality, short, not motivated)
= (1/2 * 1/6 * 1/6 * 1/6 * 1/2) / (1/3 * 1/3 * 7/12 * 1/3)
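Evaluating the expression exactly (a quick check using Python's fractions module):

```python
from fractions import Fraction as F

numerator = F(1, 2) * F(1, 6) * F(1, 6) * F(1, 6) * F(1, 2)  # conditionals * prior
denominator = F(1, 3) * F(1, 3) * F(7, 12) * F(1, 3)         # product of marginals
print(numerator)                # 1/864
print(numerator / denominator)  # 3/56, about 0.054
```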


Next we find p(not marry | not handsome, bad personality, short, not motivated) by exactly the same method. To make it easy to follow, I will walk through it as well. First, the formula:

p(not marry | not handsome, bad personality, short, not motivated)
= p(not handsome | not marry) * p(bad personality | not marry) * p(short | not marry) * p(not motivated | not marry) * p(not marry) / [ p(not handsome) * p(bad personality) * p(short) * p(not motivated) ]

Below I again count term by term. The denominator is the same as in the formula above, so it does not need to be counted again!

p(not marry) = ? Counting: 6 of the 12 training samples are labeled not marry.

So p(not marry) = 6/12 = 1/2.


p(not handsome | not marry) = ? Counting the matching samples: among the 6 not-marry samples, 1 is not handsome.

So p(not handsome | not marry) = 1/6.


p(bad personality | not marry) = ? Counting: among the 6 not-marry samples, 3 have a bad personality.

So p(bad personality | not marry) = 3/6 = 1/2.


p(short | not marry) = ? Counting: all 6 not-marry samples are short.

So p(short | not marry) = 6/6 = 1.


p(not motivated | not marry) = ? Counting: among the 6 not-marry samples, 3 are not motivated.

So p(not motivated | not marry) = 3/6 = 1/2.


Then, according to the formula:

p(not marry | not handsome, bad personality, short, not motivated)
= (1/6 * 1/2 * 1 * 1/2 * 1/2) / (1/3 * 1/3 * 7/12 * 1/3)

Clearly (1/6 * 1/2 * 1 * 1/2 * 1/2) > (1/2 * 1/6 * 1/6 * 1/6 * 1/2),

so p(not marry | not handsome, bad personality, short, not motivated) > p(marry | not handsome, bad personality, short, not motivated).

Therefore, by the Naive Bayes algorithm, we can give the girl her answer: do not marry!
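To wrap up, here is a self-contained sketch that reproduces the whole computation end to end (the `data` list is the reconstructed training table from Section 3; since both classes share the same denominator, comparing the unnormalized scores is enough):

```python
from fractions import Fraction as F

# (handsome, personality, height, motivated, label), reconstructed training table
data = [
    ("handsome", "not good", "short", "not motivated", "not marry"),
    ("not handsome", "good", "short", "motivated", "not marry"),
    ("handsome", "good", "short", "motivated", "marry"),
    ("not handsome", "great", "tall", "motivated", "marry"),
    ("handsome", "not good", "short", "motivated", "not marry"),
    ("handsome", "not good", "short", "motivated", "not marry"),
    ("handsome", "good", "tall", "not motivated", "marry"),
    ("not handsome", "good", "medium", "motivated", "marry"),
    ("handsome", "great", "medium", "motivated", "marry"),
    ("not handsome", "not good", "tall", "motivated", "marry"),
    ("handsome", "good", "short", "not motivated", "not marry"),
    ("handsome", "good", "short", "not motivated", "not marry"),
]

def score(features, label):
    """Unnormalized Naive Bayes score: prior times per-feature conditionals."""
    rows = [r for r in data if r[-1] == label]
    s = F(len(rows), len(data))  # prior p(label)
    for i, value in enumerate(features):
        s *= F(sum(r[i] == value for r in rows), len(rows))
    return s

x = ("not handsome", "not good", "short", "not motivated")
print(score(x, "marry"))      # 1/864
print(score(x, "not marry"))  # 1/48
print(max(["marry", "not marry"], key=lambda label: score(x, label)))  # not marry
```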


5   Advantages and Disadvantages of Naive Bayes Classification


Advantages:

(1) The algorithm's logic is simple and easy to implement (the idea is straightforward: just use the Bayes formula to transform the problem!).

(2) The time and space overhead of classification is small (with the features assumed mutually independent, only a two-dimensional table of per-feature, per-class statistics needs to be stored).


Disadvantages:

In theory, the Naive Bayes model has the smallest error rate compared with other classification methods. In practice this is not always the case, because the model assumes the attributes are mutually independent, an assumption that often does not hold in real applications. When the number of attributes is large or the correlations between attributes are strong, classification performance suffers.


When attribute correlations are small, Naive Bayes performs best. To address this, algorithms such as semi-naive Bayes improve on it modestly by taking partial dependencies between attributes into account.


This example walks through the classification process of the Naive Bayes algorithm in detail, and I hope it helps everyone's understanding~


References:

Li Hang, Statistical Learning Methods.

"Algorithm Grocery Store: Naive Bayesian Classification" (classification algorithms series).

Acknowledgments: Tokugawa, Haoyu, Jihao, Shi Qi


Original Address: https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247483819&idx=1&sn=7f1859c0a00248a4c658fa65f846f341&chksm=ebb4397fdcc3b06933816770b928355eb9119c4c80a1148b92a42dc3c08de5098fd6f278e61e#rd
