【Machine Learning Algorithm】 Naive Bayes Classifier

Ctrip's 2019 written test asked for a hand-written Naive Bayes classifier. Although the principle was clear to me, I had never actually practiced implementing it, so my attempt in the exam room was a mess.

The Naive Bayes principle is relatively simple. Given the training data, compute the prior probabilities and the conditional probabilities (the probability that a feature takes a specific value given the class) directly from counts, then assign a new data point to the class with the largest posterior probability.
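Written out, this is the standard decision rule: for a new point with features $x^{(1)}, \dots, x^{(n)}$ and candidate classes $c_k$,

$$\hat{y} = \arg\max_{c_k} \; P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k),$$

where the product over features is justified by the "naive" assumption that features are conditionally independent given the class.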

When counting for the conditional probabilities, Python's defaultdict makes it easy to tally how often each feature takes each value: with defaultdict(int), any missing key starts at 0, so you can increment counts without initializing them first.

Let's take a look at the difference between defaultdict and dict ~

>>> from collections import defaultdict
>>> a = defaultdict(int)
>>> a
defaultdict(<class 'int'>, {})

>>> b = dict()
>>> b
{}

# The same operation: a works fine, but b raises a KeyError
>>> a[1] += 1
>>> a
defaultdict(<class 'int'>, {1: 1})

>>> b[1] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
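
For comparison, a plain dict can do the same counting, but you have to supply the default yourself, for example with dict.get (a minimal sketch):

>>> b = dict()
>>> b[1] = b.get(1, 0) + 1   # get() returns 0 when the key is missing
>>> b
{1: 1}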


A worked example can be found in Li Hang's Statistical Learning Methods, pages 50-51.

Anyway, this is just a basic framework, and you can keep optimizing it, e.g. discretizing continuous features or supporting more than two classes (a sketch of the latter follows).
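For instance, here is a minimal sketch of a prior estimate that works for any number of classes (estimate_priors is a hypothetical helper, not part of the classifier below):

from collections import defaultdict

def estimate_priors(y):
    # Count each label, then normalize; works for any number of classes
    counts = defaultdict(int)
    for label in y:
        counts[label] += 1
    return {label: c / len(y) for label, c in counts.items()}

With the training labels y defined below, this returns {-1: 0.4, 1: 0.6}.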

import numpy as np
from collections import defaultdict
# Training data: 15 samples
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = [1, 2, 2, 1, 1, 1, 2, 2, 3, 3, 3, 2, 2, 3, 3]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
input = np.column_stack([x1, x2, y])
# Predict the class of the sample point (2, 1)
new_data = [2, 1]

class Bayes_Classifier(object):
    '''

    Build a Naive Bayes classifier

    '''
    def __init__(self, input, new_data):
        self.x = input[:, :-1]
        self.y = input[:, -1]
        self.prior_prob = self.compute_prior_prob()
        self.x1_d0, self.x1_d1 = self.compute_cond_prob(0)
        self.x2_d0, self.x2_d1 = self.compute_cond_prob(1)
        self.test_data = new_data

        
    def compute_prior_prob(self):
        '''

        Maximum likelihood estimate of the prior probabilities

        '''
        prior_prob = defaultdict(int)
        for i in range(len(self.y)):
            if self.y[i] == -1:
                prior_prob[-1] += 1
            else:
                prior_prob[1] += 1
        for key in prior_prob.keys():
            prior_prob[key] /= len(self.y)
    
        return prior_prob


    def compute_cond_prob(self, j):
        '''

        Maximum likelihood estimate of the conditional probabilities
        (the probability that feature j takes a given value, given the class)

        '''
        d0 = defaultdict(int)
        y0 = 0
        d1 = defaultdict(int)
        y1 = 0
        for n in range(self.x.shape[0]):
            if self.y[n] == -1:
                d0[self.x[n, j]] += 1
                y0 += 1
            else:
                d1[self.x[n, j]] += 1
                y1 += 1
        for key, value in d0.items():
            d0[key] = value / y0
        for key, value in d1.items():
            d1[key] = value / y1
        return d0, d1
    
    def max_pos_prob(self):
        '''

        Maximize the posterior probability: return the (unnormalized) posterior for each class
        
        '''
        p0 = self.x1_d0[self.test_data[0]] * self.x2_d0[self.test_data[1]] * self.prior_prob[-1]
        p1 = self.x1_d1[self.test_data[0]] * self.x2_d1[self.test_data[1]] * self.prior_prob[1]
        return p0, p1



bayes = Bayes_Classifier(input, new_data)
# print(bayes.x1_d0, bayes.x2_d0)
# print(bayes.x1_d1, bayes.x2_d1)
# print(bayes.prior_prob)
print(bayes.max_pos_prob())
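
For the data above, this prints roughly (0.0667, 0.0222): p0 = 6/15 · 2/6 · 3/6 = 1/15 and p1 = 9/15 · 3/9 · 1/9 = 1/45, so the sample point (2, 1) is assigned to class -1, matching the worked example in the book.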
