Machine Learning Algorithms - Naive Bayes Summary

I. The mathematical principles of the Naive Bayes algorithm

Naive Bayes is a supervised learning algorithm that learns a generative model. It is simple, requires no iteration, and has a solid mathematical theory (namely Bayes' theorem) behind it.

(1) Algorithm idea: learn the joint probability distribution from the training data by learning the prior probability distribution and the conditional probability distributions; then, using the conditional-independence assumption on the features and Bayes' theorem, compute the posterior probability and assign the input x to the class with the maximum posterior probability. Maximizing the posterior probability corresponds to minimizing the expected risk under the 0-1 loss function.

(2) Two characteristics:

Naive: the features are assumed to be mutually independent. (This is a strong assumption that works better with a large number of samples; in the real world, input feature vectors whose components are completely uncorrelated are rare, so the assumption seldom holds, but it greatly simplifies the computation, and studies have shown it has little effect on classification accuracy.)
Bayesian: based on Bayes' theorem.

The Bayes formula is derived as follows (the original figures, "Bayesian formula derivation" and "Bayesian classifier", are summarized in standard notation below):
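A reconstruction of the figures' content: Bayes' theorem follows from factoring the joint probability in two ways,

P(X, Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X) \;\Rightarrow\; P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}

and the naive Bayes classifier applies the conditional-independence assumption and drops the denominator P(X), which is the same for every class:

y = \arg\max_{c_k} P(Y = c_k) \prod_{j} P\big(X^{(j)} = x^{(j)} \mid Y = c_k\big)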
(3) Parameter estimation methods: maximum likelihood estimation / Bayesian estimation
Maximum likelihood estimation is used to estimate the conditional probability distributions and the prior probabilities, but it can produce probability estimates equal to 0. Bayesian estimation avoids this: the idea is to add a positive number a to the count of each value of the random variable; when a = 0, it reduces to maximum likelihood estimation.

In particular, when a = 1 this is called Laplace smoothing (Bayesian estimation with parameter 1): for the class priors, add 1 to the numerator (the class count) and add the number of classes to the denominator; for the conditional probabilities, add 1 to the numerator and add the number of possible values of the corresponding feature to the denominator. This removes the zero-probability problem while ensuring the probabilities still sum to 1.
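Written out (the standard Bayesian-estimation formulas, with \lambda for the positive number a above, N training samples, K classes, and S_j possible values of the j-th feature):

P_\lambda(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k) + \lambda}{N + K\lambda}

P_\lambda\big(X^{(j)} = a_{jl} \mid Y = c_k\big) = \frac{\sum_{i=1}^{N} I\big(x_i^{(j)} = a_{jl},\, y_i = c_k\big) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j \lambda}

Setting \lambda = 1 gives Laplace smoothing; for any \lambda > 0 both estimates remain valid probability distributions.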

(4) Applications:
text classification (Internet news categorization)
spam e-mail filtering (a minimal sketch follows)
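A minimal sketch of the spam-filtering application (the toy messages and labels are invented for illustration; CountVectorizer builds the bag-of-words counts that MultinomialNB, covered below, expects):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy corpus: 1 = spam, 0 = ham (labels invented for illustration)
messages = ["win a free prize now", "free money click now",
            "meeting at noon tomorrow", "lunch with the team"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()                 # bag-of-words: word counts as feature values
X = vec.fit_transform(messages)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free prize meeting"])))  # expected: [1] (spam)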

(5) Advantages and disadvantages:
Advantages:
The algorithm logic is simple and easy to implement (the whole idea comes down to applying the Bayes formula!)
Still effective when the data set is small, and can handle multi-class classification
Small time and space overhead during classification (since the features are assumed mutually independent, only two-dimensional storage is involved)
Disadvantages:
Naive Bayes assumes the features are independent, an assumption that often does not hold in practice. The stronger the correlation between the attributes, the larger the classification error.

(6) Summary:
Bayesian probability and the Bayes decision rule provide an effective method for estimating unknown probabilities from known values.
The underflow problem needs to be considered; it can be solved by taking logarithms of the probabilities (see the sketch below).
For document classification, the bag-of-words model is more effective than the set-of-words model, and stop words can be removed.
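A small illustration of the underflow point: multiplying many small conditional probabilities underflows to 0.0 in double precision, while summing their logarithms stays well-behaved (the numbers are arbitrary):

import numpy as np

p = np.full(1000, 1e-5)   # 1000 small conditional probabilities
print(np.prod(p))         # 0.0 -- the product underflows
print(np.sum(np.log(p)))  # about -11512.9 -- stable in log space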

II. Naive Bayes code implementation

There are two implementations:
sklearn (calling the library)
python3 (source code from Machine Learning in Action, for text classification and spam filtering)
The code is on GitHub.

sklearn implementation

sklearn provides three different types of Naive Bayes:

  • Gaussian (GaussianNB): for classification problems where the features are assumed to follow a normal distribution.
  • Multinomial (MultinomialNB): for models with discrete values, e.g. text classification where word occurrence counts are used as the feature values; the feature values must be non-negative.
  • Bernoulli (BernoulliNB): for discrete features where each feature takes only the values 0 (did not appear) and 1 (appeared). BernoulliNB binarizes features by a threshold into Boolean attributes; in particular, every feature value is Boolean.

1. Gaussian Naive Bayes: sklearn.naive_bayes.GaussianNB(priors=None)

Handles continuous feature values by modeling each of them with a Gaussian distribution.

import numpy as np
from sklearn.naive_bayes import GaussianNB
X = np.array([[-1, -1], [-2, -2], [-3, -3], [-4, -4], [-5, -5], [1, 1], [2, 2], [3, 3]])
y = np.array([1, 1, 1, 1, 1, 2, 2, 2])
clf = GaussianNB(priors=[0.625, 0.375])  # default: priors=None
clf.fit(X, y, sample_weight=None)  # train: X is the feature vectors, y the class labels, sample_weight an array of per-sample weights
print(clf.class_prior_)  # class_prior_ attribute: the prior probability of each class label
print(clf.priors)  # priors attribute: same as class_prior_ (the value passed to the constructor)
print(clf.class_count_)  # class_count_ attribute: the number of training samples for each class label
print(clf.theta_)  # theta_ attribute: the mean of each feature for each class label
print(clf.sigma_)  # sigma_ attribute: the variance of each feature for each class label (renamed var_ in newer sklearn versions)
print(clf.get_params(deep=True))  # get_params(deep=True): returns a dict of priors and its value
clf.set_params(priors=[0.6, 0.4])  # set_params(**params): sets the estimator's priors parameter
print(clf.get_params(deep=True))
print(clf.predict([[-6, -6], [4, 5]]))  # predict the class of new samples
print(clf.predict_proba([[-6, -6], [4, 5]]))  # predict_proba(X): the predicted probability of each class for the test samples
print(clf.predict_log_proba([[-6, -6], [4, 5]]))  # predict_log_proba(X): the log of the predicted probability of each class
print(clf.score([[-6, -6], [-4, -2], [-3, -4], [4, 5]], [1, 1, 2, 2]))  # score(X, y, sample_weight=None): mean accuracy on the test samples

# output:
# [0.625 0.375]
# [0.625, 0.375]
# [5. 3.]
# [[-3. -3.]
#  [ 2.  2.]]
# [[2.00000001 2.00000001]
#  [0.66666667 0.66666667]]
# {'priors': [0.625, 0.375]}
# {'priors': [0.6, 0.4]}
# [1 2]
# [[1.00000000e+00 3.29099984e-40]
#  [5.13191647e-09 9.99999995e-01]]
# [[ 0.00000000e+00 -9.09122123e+01]
#  [-1.90877867e+01 -5.13191623e-09]]
# 0.75

2. Multinomial Naive Bayes: sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)

Mainly used for classification with discrete features, e.g. text classification by word counts, where occurrence counts are the feature values and must be non-negative.
Parameters:

  • alpha: float, optional, default 1.0 (Laplace smoothing)
  • fit_prior: Boolean, optional, default True; whether to learn the class prior probabilities. False means every class uses the same prior probability (the priors are not learned).
  • class_prior: array-like, of size (n_classes,), default None; the prior probabilities of the classes.

import numpy as np
from sklearn.naive_bayes import MultinomialNB
X = np.array([[1,2,3,4],[1,3,4,4],[2,4,5,5],[2,5,6,5],[3,4,5,6],[3,5,6,6]])
y = np.array([1,1,4,2,3,3])
clf = MultinomialNB(alpha=1, class_prior=None, fit_prior=False)
clf.fit(X, y, sample_weight=None)  # train: X is the feature vectors, y the class labels, sample_weight an array of per-sample weights
print(clf.class_log_prior_)
# class_log_prior_: the smoothed log prior probability of each class; its value depends on fit_prior and class_prior. Three cases:
# - if class_prior is given, class_log_prior_ is log(class_prior), whether fit_prior is True or False
# - if fit_prior is False and class_prior is None, every class gets the same prior, 1/N for N classes
# - if fit_prior is True and class_prior is None, each class's prior is its sample count divided by the total number of samples
print(clf.class_count_)  # class_count_ attribute: the number of training samples for each class label
print(clf.feature_count_)  # feature_count_: occurrence counts of each feature per class; array of shape (n_classes, n_features)
print(clf.get_params(deep=True))  # get_params(deep=True): returns a dict of the estimator's parameters and their values
print(clf.predict_log_proba([[3,4,5,4],[1,3,5,6]]))  # predict_log_proba(X): the log of the predicted probability of each class
print(clf.predict_proba([[3,4,5,4],[1,3,5,6]]))  # predict_proba(X): the predicted probability of each class for the test samples
print(clf.score([[3,4,5,4],[1,3,5,6]],[1,1]))  # score(X, y, sample_weight=None): mean prediction accuracy on the test samples
clf.set_params(alpha=2.0)  # set_params(**params): sets the estimator's parameters
print(clf.get_params(deep=True))

# output:
# [-1.38629436 -1.38629436 -1.38629436 -1.38629436]
# [2. 1. 2. 1.]
# [[ 2.  5.  7.  8.]
#  [ 2.  5.  6.  5.]
#  [ 6.  9. 11. 12.]
#  [ 2.  4.  5.  5.]]
# {'fit_prior': False, 'class_prior': None, 'alpha': 1}
# [[-1.70084964 -1.31750168 -1.29059819 -1.29257843]
#  [-1.00382273 -1.59845908 -1.58396998 -1.48652445]]
# [[0.18252837 0.26780353 0.27510617 0.27456193]
#  [0.36647582 0.20220787 0.205159   0.22615731]]
# 0.5
# {'fit_prior': False, 'class_prior': None, 'alpha': 2.0}

3. Bernoulli Naive Bayes: sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

Similar to multinomial Naive Bayes; mainly used for classification with discrete features.
The difference from MultinomialNB: MultinomialNB uses occurrence counts as feature values, while BernoulliNB uses binary or Boolean attributes; in particular, each feature value is Boolean, i.e. true/false, or 1 and 0. In text classification, the feature is whether a word appears in a document, not how often.
Parameters:
binarize: the threshold for binarizing the feature data; values greater than the threshold become 1, the rest become 0. (This is the one extra parameter; the other parameters are the same as for multinomial Naive Bayes.)
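The original gives no example for this class, so here is a minimal sketch in the same style as the two above (the data are invented for illustration):

import numpy as np
from sklearn.naive_bayes import BernoulliNB
X = np.array([[0.2, 1.5, 0.0], [1.1, 0.3, 2.0], [0.0, 2.2, 1.8]])
y = np.array([1, 2, 2])
# binarize=1.0: values greater than 1.0 become 1, the rest become 0, before fitting
clf = BernoulliNB(alpha=1.0, binarize=1.0, fit_prior=True, class_prior=None)
clf.fit(X, y)
print(clf.class_count_)    # number of training samples per class
print(clf.feature_count_)  # per-class count of samples in which each (binarized) feature is 1
print(clf.predict(np.array([[1.2, 0.1, 1.9]])))  # binarizes to [1, 0, 1]; expected: [2]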

Source: blog.csdn.net/qq_39751437/article/details/86521044