Machine Learning - Bayesian Classifier (with Python code)

1. Bayesian principle

Naive Bayes documentation (scikit-learn):
https://scikit-learn.org/stable/modules/naive_bayes.html

Bayesian classification is a family of classification algorithms based on Bayes' theorem. Its core idea is: prior probability + new data = posterior probability.
Given one conditional probability, how do we obtain the probability with the events swapped? That is, how do we find P(A|B) when P(B|A) is known? The conditional probability P(B|A) is the probability that event B occurs given that event A has already occurred, and its basic formula is P(B|A) = P(AB) / P(A).
Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

For example:
There are three ways to get to work from home: taxi, subway, and bus, with probabilities P(A1) = 0.5, P(A2) = 0.3, and P(A3) = 0.2 respectively. The probability of being late with each of them is: by taxi P(B|A1) = 0.2, by subway P(B|A2) = 0.4, and by bus P(B|A3) = 0.7. Question: given that you are late for work, what is the probability that you took a taxi? That is, find P(A1|B).

P(A1|B) = P(B|A1) P(A1) / P(B)

where:

  • P(A1), P(A2), P(A3) are the prior probabilities
  • P(B), the event of being late for work, is the new evidence (condition)
  • P(A1|B) is the posterior probability

Total probability formula:

P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ... + P(An)P(B|An)

Random events A1, A2, ..., An form a complete set of events (they are mutually exclusive and at least one of them must occur), and event B can only occur together with one of them, so

P(B) = P(BA1) + P(BA2) + ... + P(BAn) = P(A1)P(B|A1) + P(A2)P(B|A2) + ... + P(An)P(B|An)

Therefore, the probability of being late is:

P(B) = 0.5 × 0.2 + 0.3 × 0.4 + 0.2 × 0.7 = 0.36

And the probability of having taken a taxi, given that you are late, is:

P(A1|B) = P(B|A1) P(A1) / P(B) = (0.2 × 0.5) / 0.36 ≈ 0.28
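
The same computation can be reproduced with a few lines of plain Python, as a minimal sketch of the formulas above (the dictionary names are chosen here only for illustration):

# Prior probabilities P(Ai) and conditional probabilities of being late P(B|Ai)
priors = {'taxi': 0.5, 'subway': 0.3, 'bus': 0.2}
p_late_given = {'taxi': 0.2, 'subway': 0.4, 'bus': 0.7}

# Total probability of being late: P(B) = sum_i P(Ai) * P(B|Ai)
p_late = sum(priors[k] * p_late_given[k] for k in priors)

# Posterior: P(taxi | late) = P(B|A1) * P(A1) / P(B)
p_taxi_given_late = p_late_given['taxi'] * priors['taxi'] / p_late

print(p_late)                       # 0.36
print(round(p_taxi_given_late, 3))  # 0.278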

2. Bayesian Classifier

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption that each pair of features is conditionally independent given the class. Given a class variable y and a dependent feature vector x_1 through x_n, Bayes' theorem states the following relationship:

P(y | x_1, ..., x_n) = P(y) P(x_1, ..., x_n | y) / P(x_1, ..., x_n)

Using the naive assumption that each pair of features is conditionally independent given the class,

P(x_i | y, x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) = P(x_i | y)

for all i, this relationship simplifies to

P(y | x_1, ..., x_n) = P(y) ∏_{i=1}^{n} P(x_i | y) / P(x_1, ..., x_n)

Since P(x_1, ..., x_n) is constant for a given input, we can use the following classification rule:

P(y | x_1, ..., x_n) ∝ P(y) ∏_{i=1}^{n} P(x_i | y)

ŷ = arg max_y P(y) ∏_{i=1}^{n} P(x_i | y)

We can then use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(x_i | y); the former is simply the relative frequency of class y in the training set.
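
As a quick illustration of the prior-as-relative-frequency point, here is a minimal sketch using GaussianNB (introduced below), whose class_prior_ attribute stores the estimated priors:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)

# P(y) estimated by the model vs. the relative class frequencies in y
print(clf.class_prior_)
print(np.bincount(y) / len(y))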

The different Naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_i | y).

Despite its simplistic assumptions, Naive Bayes works well in many practical situations, notably document classification and spam filtering. These tasks require only a small amount of training data to estimate the necessary parameters.

Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class-conditional feature distributions means that each feature's distribution can be estimated independently as a one-dimensional distribution, which in turn helps to alleviate problems stemming from the curse of dimensionality.

On the other hand, although Naive Bayes is considered a decent classifier, it is known to be a poor estimator, so the probabilities output by predict_proba should not be taken too seriously.

Naive Bayes classifiers are supervised learners. Several variants are in common use, each suited to a different kind of feature variable, so the algorithm should be chosen according to the features of the data. Their main differences are outlined below.

2.1. Gaussian Naive Bayes

GaussianNB implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian, which makes it suitable for continuous variables:

P(x_i | y) = (1 / √(2π σ_y²)) exp( −(x_i − μ_y)² / (2 σ_y²) )

where the parameters σ_y and μ_y are estimated using maximum likelihood.
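
A minimal sketch of GaussianNB on synthetic continuous data; the two-blob data here is purely illustrative, and the theta_ / var_ attribute names assume a recent scikit-learn version (older versions expose sigma_ instead of var_):

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)
# Two classes of 2-D points drawn from Gaussians centred at 0 and 3
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(3, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

gnb = GaussianNB().fit(X, y)
print(gnb.theta_)   # per-class feature means (mu_y)
print(gnb.var_)     # per-class feature variances (sigma_y^2)
print(gnb.predict([[0.0, 0.0], [3.0, 3.0]]))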

2.2. Multinomial Naive Bayes

MultinomialNB implements the Naive Bayes algorithm for multinomially distributed data and is suitable for discrete features (such as word counts). When estimating the prior and conditional probabilities, it uses a smoothed version of maximum likelihood estimation; the smoothing accounts for features not present in the learning samples and prevents zero probabilities in further computations.
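
A minimal sketch on synthetic count data (standing in for word counts); the alpha parameter controls the additive smoothing described above:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(1)
X = rng.randint(0, 5, size=(6, 10))   # 6 "documents" x 10 "words", integer counts
y = np.array([0, 0, 0, 1, 1, 1])

# alpha=1.0 is Laplace smoothing; 0 < alpha < 1 is Lidstone smoothing
mnb = MultinomialNB(alpha=1.0).fit(X, y)
print(mnb.predict(X[:2]))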

2.3. Complement Naive Bayes

ComplementNB implements the Complement Naive Bayes (CNB) algorithm. CNB is an adaptation of the standard Multinomial Naive Bayes (MNB) algorithm that is particularly suited for imbalanced datasets. Specifically, CNB uses statistics from the complement of each class to compute the model's weights. The inventors of CNB show empirically that its parameter estimates are more stable than those of MNB, and that CNB regularly outperforms MNB (often by a considerable margin) on text classification tasks.
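
ComplementNB is used in the same way as MultinomialNB; below is a minimal sketch on deliberately imbalanced count data (the synthetic data is for illustration only):

import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(2)
X = rng.randint(0, 5, size=(8, 10))      # count features, as in the MultinomialNB sketch
y = np.array([0, 0, 0, 0, 0, 0, 0, 1])   # heavily imbalanced classes

cnb = ComplementNB(alpha=1.0).fit(X, y)
print(cnb.predict(X))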

2.4. Bernoulli Naive Bayes

BernoulliNB implements Naive Bayes for data distributed according to multivariate Bernoulli distributions, which requires samples to be represented as binary-valued feature vectors. If handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).

The decision rule of Bernoulli Naive Bayes is based on:

P(x_i | y) = P(i | y) x_i + (1 − P(i | y)) (1 − x_i)

This differs from the multinomial Naive Bayes rule in that Bernoulli Naive Bayes explicitly penalizes the non-occurrence of a feature i that is an indicator for class y, whereas the multinomial variant would simply ignore a non-occurring feature.
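
A minimal sketch of the binarize parameter: continuous inputs are thresholded into 0/1 features before the Bernoulli model is fitted (the synthetic data is for illustration only):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(3)
X = rng.rand(6, 10)                   # continuous values in [0, 1]
y = np.array([0, 0, 0, 1, 1, 1])

# Values above 0.5 become 1, the rest become 0, before fitting
bnb = BernoulliNB(binarize=0.5).fit(X, y)
print(bnb.predict(X[:2]))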

2.5. Out-of-core Naive Bayes model fitting

Naive Bayes models can be used to tackle large-scale classification problems for which the full training set may not fit in memory. To handle this case, MultinomialNB, BernoulliNB, and GaussianNB expose a partial_fit method that can be used to add data incrementally, in the same way as with other classifiers (see Out-of-core classification of text documents for a usage example). All Naive Bayes classifiers support sample weights.

Unlike the fit method, the first call to partial_fit needs to be passed the list of all expected class labels (via the classes parameter).
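
A minimal sketch of incremental fitting with partial_fit; the three-batch split of the iris data here is only illustrative:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB()

# The first call must be told every class that will ever appear
clf.partial_fit(X[:50], y[:50], classes=np.unique(y))
# Later batches only need the data itself
clf.partial_fit(X[50:100], y[50:100])
clf.partial_fit(X[100:], y[100:])

print(clf.predict(X[:5]))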

3. Code example

In this example, a Gaussian Naive Bayes classifier is used to classify the iris dataset. The full code is as follows:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn import datasets

class bayes_model():
    def __init__(self):
        pass
    def load_data(self):
        # Load the iris dataset and split it into train/test sets (70% / 30%)
        data = datasets.load_iris()
        iris_target = data.target
        iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)
        train_x, test_x, train_y, test_y = train_test_split(iris_features, iris_target, test_size=0.3, random_state=123)
        return train_x, test_x, train_y, test_y
    def train_model(self, train_x, train_y):
        # Fit a Gaussian Naive Bayes classifier on the training data
        clf = GaussianNB()
        clf.fit(train_x, train_y)
        return clf
    def proba_data(self, clf, test_x, test_y):
        # Predict labels and class probabilities, then report the test accuracy
        y_predict = clf.predict(test_x)
        y_proba = clf.predict_proba(test_x)
        accuracy = metrics.accuracy_score(test_y, y_predict) * 100
        tot1 = pd.DataFrame([test_y, y_predict]).T
        tot2 = pd.DataFrame(y_proba).applymap(lambda x: '%.2f' % x)
        tot = pd.merge(tot1, tot2, left_index=True, right_index=True)
        tot.columns=['y_true', 'y_predict', 'predict_0', 'predict_1', 'predict_2']
        print('The accuracy of Testset is: %d%%' % (accuracy))
        print('The result of predict is: \n', tot.head())
        return accuracy, tot
    def exc_p(self):
        # End-to-end run: load the data, train the model, and evaluate it
        train_x, test_x, train_y, test_y = self.load_data()
        clf = self.train_model(train_x, train_y)
        res = self.proba_data(clf, test_x, test_y)
        return res

if __name__ == '__main__':
    bayes_model().exc_p()

Running the script prints the test-set accuracy and the first few rows of the prediction table (screenshot of the original output omitted).

Origin: blog.csdn.net/weixin_41233157/article/details/126485001