Naive Bayes code implementation in Python

  • P(B) is the "prior probability": our estimate of the probability of event B before event A is taken into account.
  • P(B|A) is the "posterior probability": the re-evaluated probability of event B after event A has occurred.
  • P(A|B)/P(A) is the "adjustment factor" (a standardized likelihood): it pulls the prior estimate toward the true probability.
  • Posterior probability = prior probability × adjustment factor, i.e. P(B|A) = P(B) · P(A|B)/P(A).
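
As a quick numeric check of the last bullet, here is a minimal Python sketch; the probabilities are made up purely for illustration:

p_b = 0.01          # prior P(B)
p_a_given_b = 0.9   # likelihood P(A|B)
p_a = 0.05          # evidence P(A)
adjustment = p_a_given_b / p_a     # adjustment factor P(A|B)/P(A)
p_b_given_a = p_b * adjustment     # posterior P(B|A) = P(B) * P(A|B)/P(A)
print(p_b_given_a)                 # 0.18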

Conditional Probability

(worked example on counting coins: figure not reproduced)


Reference: A detailed explanation of the Naive Bayes algorithm for machine learning (Ping Yuan's blog, CSDN)

1. Naive Bayes
1. Basic knowledge of probability:
Conditional probability is the probability that event A occurs given that another event B has already occurred. It is written P(A|B), read "the probability of A given B".
If there are only two events A and B, then:
P(A|B) = P(AB) / P(B)

Total probability formula: if events A1, A2, ..., An form a complete event group (mutually exclusive, with union equal to the whole sample space) and each has positive probability, then for any event B:
P(B) = P(A1)·P(B|A1) + P(A2)·P(B|A2) + ... + P(An)·P(B|An)
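
A tiny numeric sketch of the two formulas above (the probabilities are invented purely for illustration):

p_a = [0.3, 0.7]                  # P(A1), P(A2): a complete event group
p_b_given_a = [0.2, 0.6]          # P(B|A1), P(B|A2)
p_b = sum(pa * pb for pa, pb in zip(p_a, p_b_given_a))   # total probability: P(B) = sum of P(Ai)*P(B|Ai)
p_a1b = p_a[0] * p_b_given_a[0]                          # joint probability P(A1B) = P(A1)*P(B|A1)
p_a1_given_b = p_a1b / p_b                               # conditional probability P(A1|B) = P(A1B)/P(B)
print(p_b, p_a1_given_b)          # 0.48 0.125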

Python exercise (Naive Bayes)

 

 

The first stage, preparation: determine the feature attributes according to the specific problem, partition each feature attribute appropriately, and manually classify a portion of the items to form a training sample set. The input of this stage is all the data to be classified; the output is the feature attributes and the training samples. This is the only stage of Naive Bayes classification that must be done manually, and its quality has a major impact on the whole process: the quality of the classifier is largely determined by the feature attributes, how they are partitioned, and the quality of the training samples.

The second stage, classifier training: the task is to generate the classifier. The main work is to compute the frequency of each category in the training samples and the conditional probability estimate of each feature-attribute partition for each category, and to record the results. The input is the feature attributes and training samples; the output is the classifier. This stage is mechanical and can be computed automatically by a program from the formulas discussed above.

The third stage, application: the task is to use the classifier to classify new items. The input is the classifier and the items to be classified; the output is the mapping from items to categories. This stage is also mechanical and is completed by the program.
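
A rough mapping of the three stages onto the implementation below (comments only, as a reading guide):

# Stage 1 (preparation): naivebayes_data.csv is the manually prepared training sample set,
#                        with feature columns x1, x2 and a label column Y.
# Stage 2 (training):    classify() counts the training samples to estimate the prior P(y)
#                        and the conditional probabilities P(x|y).
# Stage 3 (application): classify() multiplies those estimates for the item to be classified
#                        and returns the class with the largest score.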

#coding:utf-8
# Naive Bayes with maximum likelihood estimation
import pandas as pd
import numpy as np

class NaiveBayes(object):
    def getTrainSet(self):
        dataSet = pd.read_csv('naivebayes_data.csv')
        dataSetNP = np.array(dataSet)  # convert from DataFrame to numpy array
        trainData = dataSetNP[:,0:dataSetNP.shape[1]-1]   # training features x1, x2
        labels = dataSetNP[:,dataSetNP.shape[1]-1]        # class labels Y of the training data
        return trainData, labels

    def classify(self, trainData, labels, features):
        # prior probability of each label in labels
        labels = list(labels)    # convert to a Python list
        P_y = {}       # stores the prior probability of each label
        for label in labels:
            P_y[label] = labels.count(label)/float(len(labels))   # p = count(y) / count(Y)

        # joint probability of each (feature value, label) pair
        P_xy = {}
        for y in P_y.keys():
            y_index = [i for i, label in enumerate(labels) if label == y]  # indices in labels where the label equals y
            for j in range(len(features)):
                # indices in trainData[:,j] where the value equals features[j]
                x_index = [i for i, feature in enumerate(trainData[:,j]) if feature == features[j]]
                xy_count = len(set(x_index) & set(y_index))   # number of samples where this feature value and y occur together
                pkey = str(features[j]) + '*' + str(y)
                P_xy[pkey] = xy_count / float(len(labels))

        # conditional probabilities
        P = {}
        for y in P_y.keys():
            for x in features:
                pkey = str(x) + '|' + str(y)
                P[pkey] = P_xy[str(x)+'*'+str(y)] / float(P_y[y])    # P(x|y) = P(x,y) / P(y)

        # class that [2,'S'] belongs to
        F = {}   # probability of [2,'S'] belonging to each class
        for y in P_y:
            F[y] = P_y[y]
            for x in features:
                F[y] = F[y]*P[str(x)+'|'+str(y)]     # P(y|X) = P(X|y)P(y)/P(X); the denominator is the same for every class, so comparing F = P(x1|y)*P(x2|y)*P(y) is enough

        features_label = max(F, key=F.get)  # class with the largest probability
        return features_label


if __name__ == '__main__':
    nb = NaiveBayes()
    # training data
    trainData, labels = nb.getTrainSet()
    # x1, x2
    features = [2,'S']
    # which class this feature vector belongs to
    result = nb.classify(trainData, labels, features)
    print (features,'belongs to',result)
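
The post does not include naivebayes_data.csv itself. Assuming the classic textbook example this code is usually paired with (x1 ∈ {1,2,3}, x2 ∈ {S,M,L}, label Y ∈ {-1,1}), the file could plausibly start like the sketch below; treat the rows as hypothetical:

x1,x2,Y
1,S,-1
1,M,-1
2,S,-1
2,M,-1
2,L,1
3,M,1
3,L,1
...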


Reference: Naive Bayes Algorithm -- Python Implementation (Yiye Zhouming, cnblogs / Blog Garden)

Bayesian estimation with Laplace smoothing: λ = 1, K = 2 (number of classes), S = 3 (number of values each feature can take).

What should you do if an estimated conditional probability P(X|Y) comes out as 0?
In short: introduce a constant λ into the counts; when λ = 1 this is called Laplace smoothing.
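
Concretely, the smoothed (Bayesian-estimate) forms that the code below implements are:

    prior:        P(Y = ck)          = (count(Y = ck) + λ) / (N + K·λ),                        K = number of classes
    conditional:  P(Xj = a | Y = ck) = (count(Xj = a, Y = ck) + λ) / (count(Y = ck) + S·λ),    S = number of values feature j can take

where N is the number of training samples. With λ = 1 (Laplace smoothing) every probability stays strictly positive.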

#coding:utf-8
# Naive Bayes with Bayesian estimation: λ=1, K=2, S=3 (λ=1 is Laplace smoothing)
import pandas as pd
import numpy as np

class NavieBayesB(object):
    def __init__(self):
        self.A = 1    # λ = 1
        self.K = 2    # number of classes
        self.S = 3    # number of values each feature can take

    def getTrainSet(self):
        trainSet = pd.read_csv('naivebayes_data.csv')
        trainSetNP = np.array(trainSet)     # convert from DataFrame to numpy array
        trainData = trainSetNP[:,0:trainSetNP.shape[1]-1]     # training features x1, x2
        labels = trainSetNP[:,trainSetNP.shape[1]-1]          # class labels Y of the training data
        return trainData, labels

    def classify(self, trainData, labels, features):
        labels = list(labels)    # convert to a Python list
        # smoothed prior probabilities
        P_y = {}
        for label in labels:
            P_y[label] = (labels.count(label) + self.A) / float(len(labels) + self.K*self.A)

        # smoothed conditional probabilities
        P = {}
        for y in P_y.keys():
            y_index = [i for i, label in enumerate(labels) if label == y]   # indices of y in labels
            y_count = labels.count(y)     # number of times y appears in labels
            for j in range(len(features)):
                pkey = str(features[j]) + '|' + str(y)
                x_index = [i for i, x in enumerate(trainData[:,j]) if x == features[j]]   # indices of this feature value in trainData[:,j]
                xy_count = len(set(x_index) & set(y_index))   # number of samples where x and y occur together
                P[pkey] = (xy_count + self.A) / float(y_count + self.S*self.A)   # conditional probability with Laplace smoothing

        # class that the feature vector belongs to
        F = {}
        for y in P_y.keys():
            F[y] = P_y[y]
            for x in features:
                F[y] = F[y] * P[str(x)+'|'+str(y)]

        features_y = max(F, key=F.get)   # class with the largest probability
        return features_y


if __name__ == '__main__':
    nb = NavieBayesB()
    # training data
    trainData, labels = nb.getTrainSet()
    # x1, x2
    features = [2,' U']
    # which class this feature vector belongs to
    result = nb.classify(trainData, labels, features)
    print (features,'belongs to',result)
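
One practical effect of the smoothing, in miniature (the counts below are illustrative and not taken from the post's data file): a feature value that never co-occurs with a class in the training set still receives a small nonzero conditional probability instead of 0, so the product over features does not collapse.

p_unsmoothed = 0 / 9.0                 # value never seen with this class: the estimate collapses to 0
p_smoothed   = (0 + 1) / (9 + 3*1.0)   # with λ=1 and S=3 possible values: 1/12, small but nonzero
print(p_unsmoothed, p_smoothed)        # 0.0 0.0833...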


['Naive': the features are assumed to be conditionally independent; 'Bayes': the method is based on Bayes' theorem.]

1 The concept of Naive Bayes: [joint probability distribution, prior probability, conditional probability, total probability formula] [conditional independence assumption] maximum likelihood estimation
2 Advantages and disadvantages
[Advantages: stable classification efficiency; insensitive to missing data; the algorithm is relatively simple and is often used for text classification; it performs best when the correlation between attributes is small. Disadvantages: it assumes the attributes are mutually independent; the prior probability depends largely on the assumed distribution; it is very sensitive to the representation of the input data.]
3 Prior probability and posterior probability
The prior probability is relatively simple to compute and does not use Bayes' formula; the posterior probability is computed with Bayes' formula, and estimating probabilities from sample data also requires assuming a theoretical probability distribution, which demands more knowledge of mathematical statistics.
4 Naive Bayes parameter estimation:
① Maximum likelihood estimation (some probabilities may come out as 0)  ② Bayesian estimation (add a constant λ; Laplace smoothing); a scikit-learn sketch follows below.
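
For comparison, here is a minimal sketch of the same idea with scikit-learn's CategoricalNB, whose alpha parameter plays the role of λ (alpha=1.0 is Laplace smoothing). The toy data and the ordinal encoding below are illustrative assumptions, not part of the original post:

import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# toy (x1, x2) -> Y data in the same shape as the code above; the values are made up
X = np.array([['1', 'S'], ['1', 'M'], ['2', 'M'], ['2', 'S'], ['3', 'L'], ['3', 'M']])
Y = np.array([-1, -1, 1, 1, 1, -1])

enc = OrdinalEncoder()                 # map the categorical feature values to integer codes
X_enc = enc.fit_transform(X)
clf = CategoricalNB(alpha=1.0)         # alpha = λ = 1 -> Laplace smoothing
clf.fit(X_enc, Y)
print(clf.predict(enc.transform([['2', 'S']])))   # predicted class for x = (2, 'S')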


4. Advantages and disadvantages of Naive Bayes
Advantages: it performs well on small-scale data, handles multi-class tasks, and is suitable for incremental training.
Disadvantages: it is very sensitive to the representation of the input data (discrete vs. continuous values, extremely large or small values, etc.).

Key point:
How should you explain Naive Bayes in an interview?
First of all, Naive Bayes is a generative model (very important); it learns the joint probability from known samples and then derives the conditional probability from it.
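
In formula form, the point about learning the joint probability first is:

    P(Y|X) = P(X, Y) / P(X) ∝ P(Y) · P(X1|Y) · P(X2|Y) · ... · P(Xn|Y)

The model estimates the prior P(Y) and the class-conditional probabilities P(Xi|Y) (together they give the joint probability); prediction picks the class Y that maximizes the product, since the denominator P(X) is the same for every class and can be ignored.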

The difference between generative models and discriminative models

Generative models: learn the joint probability distribution from the data and obtain the conditional probability distribution P(Y|X) as the prediction model.
Common generative models: Naive Bayes, Hidden Markov Model, Gaussian Mixture Model, latent Dirichlet allocation (the LDA topic model), Restricted Boltzmann Machine.
Discriminative models: learn a decision function or the conditional probability distribution directly from the data as the prediction model.
Common discriminative models: k-nearest neighbors, SVM, decision tree, perceptron, linear discriminant analysis (LDA), linear regression, traditional neural networks, logistic regression, boosting, conditional random field.
