1. Bayesian principle
Official scikit-learn documentation for Naive Bayes:
https://scikit-learn.org/stable/modules/naive_bayes.html
Bayesian classification is a family of classification algorithms based on Bayes' theorem. The main idea is: prior probability + new data = posterior probability.
Bayes' theorem answers the following question: given a known conditional probability, how do we obtain the conditional probability with the two events swapped? That is, how do we find P(A|B) when P(B|A) is known? The conditional probability P(B|A) is the probability that event B occurs given that event A has already occurred. By definition: P(B|A) = P(AB)/P(A).
Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
For example:
There are three ways to get from home to work: taxi, subway, and bus, chosen with probabilities P(A1)=0.5, P(A2)=0.3, and P(A3)=0.2. The probability of being late with each mode is: taxi, P(B|A1)=0.2; subway, P(B|A2)=0.4; bus, P(B|A3)=0.7. Given that you are late for work, what is the probability that you took a taxi? That is, find P(A1|B).
Here:
- P(A1), P(A2), P(A3) are the prior probabilities
- Being late for work (event B) is the new information
- P(A1|B) is the posterior probability
Total probability formula:
If the random events A1, A2, ..., An form a complete set of events (they are mutually exclusive and at least one of them occurs), and the random event B occurs together with one of them, then
$$P(B) = \sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)$$
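Therefore:
Probability of being late:
$$P(B) = 0.5 \times 0.2 + 0.3 \times 0.4 + 0.2 \times 0.7 = 0.36$$
Probability of having taken a taxi, given that you are late:
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(B)} = \frac{0.5 \times 0.2}{0.36} \approx 0.278$$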
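The arithmetic for the commuting example can be checked with a few lines of Python. This is only a sketch of the calculation; the dictionary names are chosen here for illustration and the numbers are the ones given in the example.

```python
# Prior probability of each mode of transport, P(A_i).
priors = {"taxi": 0.5, "subway": 0.3, "bus": 0.2}
# Probability of being late given each mode, P(B | A_i).
late_given = {"taxi": 0.2, "subway": 0.4, "bus": 0.7}

# Total probability of being late: P(B) = sum_i P(A_i) * P(B | A_i).
p_late = sum(priors[m] * late_given[m] for m in priors)

# Posterior for every mode: P(A_i | B) = P(A_i) * P(B | A_i) / P(B).
posteriors = {m: priors[m] * late_given[m] / p_late for m in priors}

print(f"P(late) = {p_late:.2f}")
for mode, p in posteriors.items():
    print(f"P({mode} | late) = {p:.3f}")
```

The three posteriors sum to 1, because being late must be explained by exactly one of the three modes.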
2. Bayesian Classifier
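Naive Bayes methods are a set of supervised learning algorithms based on Bayes' theorem with the "naive" assumption that every pair of features is conditionally independent given the class. Given a class variable y and a dependent feature vector x_1 through x_n, Bayes' theorem states the following relationship:
$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\,P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$
Using the naive conditional independence assumption:
$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$
for all i, this relationship simplifies to
$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$
Since P(x_1, \dots, x_n) is constant given the input, we can use the following classification rule:
$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \quad\Rightarrow\quad \hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$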
We can use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(x_i | y); the former is then the relative frequency of class y in the training set.
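To make the classification rule concrete, here is a minimal from-scratch sketch for categorical features. The toy weather data, the function names fit and predict, and the add-one smoothing parameter alpha are all assumptions made for this illustration; they are not part of scikit-learn or of the text above.

```python
import math
from collections import Counter

# Hypothetical toy data: each row is (outlook, windy), with a label per row.
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
y = ["play", "play", "stay", "play", "play", "stay"]


def fit(X, y, alpha=1.0):
    """Estimate P(y) and P(x_i | y) from relative frequencies.

    alpha is an assumed add-one (Laplace) smoothing constant that keeps
    unseen feature values from producing a zero probability.
    """
    class_counts = Counter(y)
    n_features = len(X[0])
    # feature_counts[c][i][v] = number of class-c rows with feature i equal to v
    feature_counts = {c: [Counter() for _ in range(n_features)]
                      for c in class_counts}
    values = [set() for _ in range(n_features)]
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            feature_counts[label][i][v] += 1
            values[i].add(v)

    priors = {c: n / len(y) for c, n in class_counts.items()}

    def likelihood(c, i, v):
        return ((feature_counts[c][i][v] + alpha)
                / (class_counts[c] + alpha * len(values[i])))

    return priors, likelihood


def predict(row, priors, likelihood):
    """Return argmax_y of log P(y) + sum_i log P(x_i | y)."""
    scores = {c: math.log(p) + sum(math.log(likelihood(c, i, v))
                                   for i, v in enumerate(row))
              for c, p in priors.items()}
    return max(scores, key=scores.get)


priors, likelihood = fit(X, y)
print(predict(("rainy", "yes"), priors, likelihood))
print(predict(("sunny", "no"), priors, likelihood))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; it does not change which class attains the maximum.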
The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_i | y).
Despite their apparently over-simplified assumptions, naive Bayes classifiers work well in many real-world situations, notably document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters.
Naive Bayes learners and classifiers can be very fast compared to more sophisticated methods. Decoupling the class-conditional feature distributions means that each distribution can be estimated independently as a one-dimensional distribution. This in turn helps alleviate problems stemming from the curse of dimensionality.
On the other hand, although naive Bayes is known as a decent classifier, it is a poor estimator, so the probabilities output by predict_proba should not be taken too seriously.
Naive Bayes classifiers are supervised learning methods. There are five common variants, each suited to a different kind of data, so the choice depends on the type of the feature variables. The following sections outline their main differences.
2.1. Gaussian Naive Bayes
GaussianNB implements the Gaussian naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian, which makes it suitable for continuous variables.
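As a small illustration of the Gaussian assumption, the sketch below fits GaussianNB to a made-up one-dimensional dataset and inspects the per-class feature means it learned. The data values are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical one-feature data: class 0 is centred near 1, class 1 near 3.
X = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)

# theta_ holds the estimated mean of each feature for each class.
print(clf.theta_)
pred = clf.predict(np.array([[1.1], [2.9]]))
print(pred)
```

Each new point is assigned to the class whose fitted Gaussian gives it the higher posterior probability.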
2.2. Multinomial Naive Bayes
MultinomialNB implements the naive Bayes algorithm for multinomially distributed data and is suitable for discrete variables such as counts. The prior and conditional probabilities are estimated with a smoothed version of maximum likelihood; the smoothing accounts for features that do not appear in the training samples and prevents zero probabilities in later calculations.
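The following sketch shows MultinomialNB on a tiny, made-up word-count matrix. The vocabulary and counts are hypothetical; alpha=1.0 is the smoothing parameter (add-one smoothing), which keeps words that never occur in a class from getting probability zero.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts; columns are ["free", "win", "meeting", "report"].
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam

clf = MultinomialNB(alpha=1.0)
clf.fit(X, y)

new_docs = np.array([[4, 1, 0, 0], [0, 0, 1, 4]])
pred = clf.predict(new_docs)
print(pred)

# Thanks to smoothing, every log probability is finite, even for the word
# "meeting", which never appears in a spam document.
print(np.isfinite(clf.feature_log_prob_).all())
```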
2.3. Complement Naive Bayes
ComplementNB implements the Complement Naive Bayes (CNB) algorithm. CNB is an adaptation of the standard Multinomial Naive Bayes (MNB) algorithm that is particularly suited to imbalanced datasets. Specifically, CNB uses statistics from the complement of each class to compute the model's weights. The inventors of CNB show empirically that its parameter estimates are more stable than those of MNB. Furthermore, CNB regularly outperforms MNB (often by a considerable margin) on text classification tasks.
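A minimal sketch of ComplementNB on a small, deliberately imbalanced count matrix (one spam document against three non-spam documents). The data are invented for illustration and say nothing about how CNB compares with MNB in practice.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical word counts; columns are ["free", "win", "meeting", "report"].
X = np.array([
    [3, 2, 0, 0],   # the only spam document
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [1, 0, 4, 2],
])
y = np.array([1, 0, 0, 0])  # imbalanced: 1 spam, 3 not spam

clf = ComplementNB(alpha=1.0)
clf.fit(X, y)

pred = clf.predict(np.array([[4, 1, 0, 0], [0, 0, 1, 4]]))
print(pred)
```

The weights for each class are computed from the counts of all the other classes, which is why a class with very few documents still gets estimates backed by plenty of data.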
2.4. Bernoulli Naive Bayes
BernoulliNB implements naive Bayes for data distributed according to multivariate Bernoulli distributions, so samples must be represented as binary-valued feature vectors. If given any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).
The decision rule of Bernoulli naive Bayes is based on:
$$P(x_i \mid y) = P(x_i = 1 \mid y)\,x_i + (1 - P(x_i = 1 \mid y))(1 - x_i)$$
This differs from the multinomial naive Bayes rule: Bernoulli naive Bayes explicitly penalizes the non-occurrence of a feature i that is an indicator for class y, whereas multinomial naive Bayes simply ignores a non-occurring feature.
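The sketch below fits BernoulliNB on made-up count data, lets it binarize the input, and then recomputes the class posterior by hand from the Bernoulli likelihood, so the penalty for absent features is visible. The data are hypothetical, and the manual calculation relies on the two classes having equal priors in this toy set.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical counts; binarize=0.0 turns every value > 0 into 1.
X = np.array([[3, 0, 0], [1, 2, 0], [0, 0, 4], [0, 1, 2]])
y = np.array([1, 1, 0, 0])

clf = BernoulliNB(alpha=1.0, binarize=0.0)
clf.fit(X, y)

# P(x_i = 1 | y) for each class (rows follow clf.classes_, i.e. [0, 1]).
p = np.exp(clf.feature_log_prob_)
print(p)

# Manual likelihood for a sample where only feature 0 is present:
# present features contribute p, absent features contribute (1 - p).
x = np.array([1, 0, 0])
manual = np.prod(p * x + (1 - p) * (1 - x), axis=1)
manual_posterior = manual / manual.sum()  # valid here because priors are equal

proba = clf.predict_proba(np.array([[5, 0, 0]]))[0]
print(manual_posterior, proba)
```

The (1 - p) factors are what distinguish this from the multinomial model: a feature that is absent from the sample still changes the score.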
2.5. Out-of-core naive Bayes model fitting
Naive Bayes models can be used to tackle large-scale classification problems for which the full training set does not fit in memory. To handle this case, MultinomialNB, BernoulliNB, and GaussianNB expose a partial_fit method that can be used incrementally, as with other classifiers. See "Out-of-core classification of text documents" in the scikit-learn examples for a usage example. All naive Bayes classifiers support sample weighting.
Unlike the fit method, the first call to partial_fit must be passed the list of all expected class labels.
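A minimal sketch of incremental fitting with partial_fit. The three small in-memory batches are a hypothetical stand-in for chunks that would normally be read from disk; the comparison with a model trained on all the data at once is included only to show that the two approaches arrive at the same parameters.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical batches of (count features, labels).
batches = [
    (np.array([[5, 0], [4, 1], [0, 6]]), np.array([1, 1, 0])),
    (np.array([[1, 4], [6, 1]]), np.array([0, 1])),
    (np.array([[0, 5], [1, 7]]), np.array([0, 0])),  # contains only class 0
]
all_classes = np.array([0, 1])

clf = MultinomialNB()
for i, (X_batch, y_batch) in enumerate(batches):
    if i == 0:
        # The first call must list every class that will ever appear.
        clf.partial_fit(X_batch, y_batch, classes=all_classes)
    else:
        clf.partial_fit(X_batch, y_batch)

# For comparison: fit once on the concatenated data.
X_all = np.vstack([X for X, _ in batches])
y_all = np.concatenate([y for _, y in batches])
full = MultinomialNB().fit(X_all, y_all)

print(clf.class_count_)
print(np.allclose(clf.feature_log_prob_, full.feature_log_prob_))
```

Declaring the classes up front is what lets a later batch contain only a subset of the labels, as the third batch does here.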
3. Code example
In this example, a Gaussian naive Bayes classifier is used to classify the iris dataset. The code is as follows:
# -*- coding: utf-8 -*-
import pandas as pd
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


class bayes_model():
    def __init__(self):
        pass

    def load_data(self):
        # Load the iris dataset and split it 70/30 into training and test sets.
        data = datasets.load_iris()
        iris_target = data.target
        iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)
        train_x, test_x, train_y, test_y = train_test_split(
            iris_features, iris_target, test_size=0.3, random_state=123)
        return train_x, test_x, train_y, test_y

    def train_model(self, train_x, train_y):
        # Fit a Gaussian naive Bayes classifier.
        clf = GaussianNB()
        clf.fit(train_x, train_y)
        return clf

    def proba_data(self, clf, test_x, test_y):
        # Predict labels and class probabilities, then compute the accuracy.
        y_predict = clf.predict(test_x)
        y_proba = clf.predict_proba(test_x)
        accuracy = metrics.accuracy_score(test_y, y_predict) * 100
        # One row per test sample: true label, predicted label, class probabilities.
        tot1 = pd.DataFrame({'y_true': test_y, 'y_predict': y_predict})
        tot2 = pd.DataFrame(
            y_proba, columns=['predict_0', 'predict_1', 'predict_2']).round(2)
        tot = pd.concat([tot1, tot2], axis=1)
        print('The accuracy of Testset is: %.2f%%' % accuracy)
        print('The result of predict is: \n', tot.head())
        return accuracy, tot

    def exc_p(self):
        # Run the full pipeline: load data, train, evaluate.
        train_x, test_x, train_y, test_y = self.load_data()
        clf = self.train_model(train_x, train_y)
        res = self.proba_data(clf, test_x, test_y)
        return res


if __name__ == '__main__':
    bayes_model().exc_p()
[Screenshot of sample output]