"Pattern Recognition and Intelligent Computing" Implementation of Bayesian Classification Based on Binary Data

Algorithm flow
  1. Binarize the data
  2. Calculate the prior probability of each digit class
  3. Calculate the class-conditional probability of each binary feature
  4. Calculate the posterior probability of each class and pick the largest
    (see page 77 of the book for the specific calculation process)
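
The four steps can be written out as formulas. This is a reconstruction from the code in this post, not a quotation of the book's page 77, and the symbols (N for the number of training samples, N_i for the number in class ω_i, x_j for the j-th binarized feature) are chosen here only for illustration.

```latex
% x_{kj} \in \{0,1\}: j-th binarized feature of training sample k
% x_j   \in \{0,1\}: j-th binarized feature of the sample X to classify
\begin{align}
P(\omega_i) &= \frac{N_i}{N} \\
P_j(\omega_i) &= P(x_j = 1 \mid \omega_i) \approx \frac{\sum_{k \in \omega_i} x_{kj} + 1}{N_i + 2} \\
P(X \mid \omega_i) &= \prod_{j=1}^{n} P_j(\omega_i)^{x_j}\,\bigl(1 - P_j(\omega_i)\bigr)^{1 - x_j} \\
P(\omega_i \mid X) &= \frac{P(X \mid \omega_i)\,P(\omega_i)}{\sum_{c} P(X \mid \omega_c)\,P(\omega_c)}
\end{align}
```

The +1 and +2 in the second line are the smoothing terms used in the implementation; they keep every estimated probability strictly between 0 and 1.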
Algorithm implementation

Bayesian Algorithm

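import numpy as np

def bayeserzhi(x_train,y_train,sample):
    """
    :function Bayesian classifier based on binary data
    :param x_train: training set, M*N (M samples, N features)
    :param y_train: training labels, 1*M
    :param sample: sample to classify
    :return: the predicted class
    """
    # posterior probabilities, one per class
    pwx = []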

    target = np.unique(y_train)

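    # binarization threshold: midpoint of the data range
    # (equal to the original 0.5 * (max - min) whenever min is 0, as in load_digits)
    spit = 0.5 * (np.max(x_train) + np.min(x_train))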
    train = np.where(x_train > spit, 1, 0)
    sample = np.where(sample > spit, 1, 0)

    for i in target:
        trainIndex = (([j for j, y in enumerate(y_train) if y == i]))
        trainNum = len(trainIndex)
        # prior probability of class i
        pw = trainNum/x_train.shape[0]
        # class-conditional probability P(x_j = 1 | class i), with +1/+2 smoothing
        p = (np.sum(train[trainIndex],axis=0)+1)/(trainNum+2)
        pxw = 1
        for j in range(train.shape[1]):
            if sample[j]:
                pxw *= p[j]
            else:
                pxw *= (1-p[j])
        # unnormalized posterior: pxw * pw
        pwx.append(pxw*pw)
    # normalize so that the posteriors sum to 1
    pwx = np.array(pwx)/np.sum(pwx)
    maxId = np.argmax(pwx)
    label = target[maxId]
    return label
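
With many features, the product of per-feature probabilities can become very small, so a common alternative is to sum logarithms instead. The following is a minimal sketch of that idea, not the author's code: the name `bayes_binary_log` and the toy data are made up for illustration, and it assumes the same thresholding and +1/+2 smoothing as `bayeserzhi`.

```python
import numpy as np

def bayes_binary_log(x_train, y_train, sample):
    """Log-space variant (hypothetical name) of the binary Bayes classifier."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    # Threshold halfway between the smallest and largest training value
    threshold = 0.5 * (x_train.max() + x_train.min())
    train = (x_train > threshold).astype(int)
    s = (np.asarray(sample) > threshold).astype(int)

    classes = np.unique(y_train)
    log_scores = []
    for c in classes:
        rows = train[y_train == c]
        n_c = rows.shape[0]
        log_prior = np.log(n_c / train.shape[0])
        # Smoothed estimate of P(x_j = 1 | class c); always strictly in (0, 1)
        p = (rows.sum(axis=0) + 1) / (n_c + 2)
        # Sum of log-probabilities replaces the product of probabilities
        log_like = np.sum(s * np.log(p) + (1 - s) * np.log(1 - p))
        log_scores.append(log_prior + log_like)
    return classes[int(np.argmax(log_scores))]

# Toy data (assumption): class 0 is bright on the left, class 1 on the right
x_toy = np.array([[9, 8, 0, 1],
                  [8, 9, 1, 0],
                  [0, 1, 9, 8],
                  [1, 0, 8, 9]])
y_toy = np.array([0, 0, 1, 1])
pred = bayes_binary_log(x_toy, y_toy, np.array([9, 9, 0, 0]))
print(pred)
```

Because the logarithm is monotonic, the class with the largest log score is also the class with the largest posterior, so the normalization step is not needed to pick a label.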

Partition data set

import random

def train_test_split(x,y,ratio = 3):
    """
    :function: split the data set into a training set and a test set
    :param x: m*n array (m samples, n features)
    :param y: labels
    :param ratio: split ratio, train:test = ratio:1 (default 3:1)
    :return: x_train y_train  x_test y_test
    """
    n_samples , n_train = x.shape[0] , int(x.shape[0]*(ratio)/(1+ratio))
    train_id = random.sample(range(0,n_samples),n_train)
    x_train = x[train_id,:]
    y_train = y[train_id]
    x_test = np.delete(x,train_id,axis = 0)
    y_test = np.delete(y,train_id,axis = 0)
    return x_train,y_train,x_test,y_test
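
Because the split above draws indices with `random.sample`, each run produces a different partition. A seeded variant makes results repeatable. This is a sketch under assumptions: `split_with_seed` and its `seed` parameter are hypothetical and not part of the original module.

```python
import numpy as np

def split_with_seed(x, y, ratio=3, seed=0):
    """Hypothetical seeded variant of train_test_split (train:test = ratio:1)."""
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffled row indices
    train_id, test_id = order[:n_train], order[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]

# Small demonstration array: 20 samples, 2 features
x_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)
x_tr, y_tr, x_te, y_te = split_with_seed(x_demo, y_demo)
print(x_tr.shape, x_te.shape)  # (15, 2) (5, 2)
```

Calling it twice with the same `seed` returns the same partition, which makes a reported result easier to reproduce.
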

Test code

from sklearn import datasets
from Include.chapter4 import function
import numpy as np

# load the data
digits = datasets.load_digits()
x , y = digits.data,digits.target

# split the data set
x_train, y_train, x_test, y_test = function.train_test_split(x,y)
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]

# Bayesian classification
ans = function.bayeserzhi(x_train,y_train,sample)
print("Predicted digit:",ans)
print("Actual digit:",y_test[testId])

Algorithm result

Predicted digit: 0
Actual digit: 0
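
A single random sample says little about overall quality, so it can help to measure accuracy over the whole test set. Here is a minimal sketch: `evaluate_accuracy` is a hypothetical helper, and `nearest_mean` is only a stand-in classifier with the same call signature as `bayeserzhi`, so that the example runs on its own.

```python
import numpy as np

def evaluate_accuracy(classify, x_train, y_train, x_test, y_test):
    """Fraction of test rows for which classify(...) returns the true label."""
    predictions = np.array([classify(x_train, y_train, row) for row in x_test])
    return float(np.mean(predictions == np.asarray(y_test)))

# Stand-in classifier for demonstration only: nearest class mean
def nearest_mean(x_train, y_train, sample):
    classes = np.unique(y_train)
    means = np.array([x_train[y_train == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(means - sample, axis=1))]

x_tr = np.array([[0.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 8.0]])
y_tr = np.array([0, 0, 1, 1])
x_te = np.array([[1.0, 0.0], [8.0, 9.0], [0.0, 2.0]])
y_te = np.array([0, 1, 1])  # last label deliberately disagrees with the classifier
acc = evaluate_accuracy(nearest_mean, x_tr, y_tr, x_te, y_te)
print(acc)  # 2 of 3 correct
```

With the functions from this post, the call would presumably be `evaluate_accuracy(function.bayeserzhi, x_train, y_train, x_test, y_test)`.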

Origin blog.csdn.net/kiwi_berrys/article/details/103962363