Bayesian Classification (Pattern Recognition in MATLAB: A Concrete Implementation with Code)

I recently studied the principle of Bayesian classification in my "Pattern Recognition" course, where the teacher also explained how to implement the method and provided MATLAB code for it. I would like to thank the teacher, Zhaozong Ze Zhao. This short article records my own understanding. I hope it helps anyone who needs it, and if I have misunderstood something or left something out, please point it out so that I can correct it promptly.
The entire classification process:
The method first uses maximum likelihood estimation to obtain estimates of the distribution parameters, and then performs Bayesian classification.
1. Maximum likelihood estimation first requires training samples.
Here is the MATLAB code that generates the training samples:

randn('seed', 0);
[X_train, Y_train] = generate_gauss_classes(Mu, S, P, N);

figure();
hold on;
class1_data = X_train(:, Y_train==1);
class2_data = X_train(:, Y_train==2);
plot(class1_data(1, :), class1_data(2, :), 'r.');
plot(class2_data(1, :), class2_data(2, :), 'g.');
grid on;
title('Train');
xlabel('N=500');

Generate the test samples in the same way:

randn('seed', 100);
[X_test, Y_test] = generate_gauss_classes(Mu, S, P, N);
figure();
hold on;
test1_data = X_test(:, Y_test==1);
test2_data = X_test(:, Y_test==2);
plot(test1_data(1, :), test1_data(2, :), 'r.');
plot(test2_data(1, :), test2_data(2, :), 'g.');
grid on;
title('Test');
xlabel('N=500');
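
A side note on reproducibility: `randn('seed', ...)` is MATLAB's legacy seeding syntax, and using different seeds (0 and 100) is what makes the training and test sets differ. A minimal sketch of the same idea with `rng`, assuming a MATLAB release that provides it (the variable names `a` and `b` are only for illustration):

```matlab
% Seed the global random stream, draw, then reseed and draw again.
rng(0);
a = randn(2, 3);
rng(0);
b = randn(2, 3);

% Same seed, same stream state, so both draws are identical.
same_draws = isequal(a, b);
disp(same_draws);
```

Functions that draw from the global stream, such as `mvnrnd`, are affected by the seed in the same way.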

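2. Next, compute the maximum likelihood estimates of the parameters from the training samples:

% The samples of each class carry information only about that class's distribution,
% i.e. the parameters of the different classes are functionally independent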
[mu1_hat, s1_hat] = gaussian_ML_estimate(class1_data);
[mu2_hat, s2_hat] = gaussian_ML_estimate(class2_data);
mu_hat = [mu1_hat, mu2_hat];
s_hat = (1/2) * (s1_hat + s2_hat);
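
For reference, the quantities computed here are the standard maximum likelihood estimates for a Gaussian. With $x_k$ denoting the $N_i$ training samples of class $i$ (notation introduced here for illustration), they can be written as:

```latex
\hat{\mu}_i = \frac{1}{N_i}\sum_{k=1}^{N_i} x_k, \qquad
\hat{\Sigma}_i = \frac{1}{N_i}\sum_{k=1}^{N_i} (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^{T}, \qquad
\hat{\Sigma} = \tfrac{1}{2}\big(\hat{\Sigma}_1 + \hat{\Sigma}_2\big)
```

The last expression corresponds to `s_hat`, the average of the two per-class covariance estimates.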

Once the maximum likelihood estimates are available, Bayesian classification can be carried out. First, here is what a Bayesian classifier is and how the classification process works.
Bayesian classification process:
[Figure: the Bayesian classification process]
The three stages of the Bayesian classification process in detail:
The first stage - the preparation stage. Its task is to make the necessary preparations for naive Bayesian classification. The main work is to choose the feature attributes according to the specific problem, partition each feature attribute appropriately, and then manually classify a portion of the items to form the training sample set. The input of this stage is all the data to be classified, and the output is the feature attributes and the training samples. This is the only stage of naive Bayesian classification that has to be done by hand. Its quality has an important impact on the whole process: the quality of the classifier is largely determined by the choice of feature attributes, by how they are partitioned, and by the quality of the training samples.
The second stage - the classifier training stage. Its task is to generate the classifier. The main work is to compute the frequency of each class in the training samples and, for each class, the conditional probability estimate of each feature-attribute partition, and to record the results. The input is the feature attributes and the training samples, and the output is the classifier. This stage is mechanical: the program computes everything automatically from the formulas discussed earlier.
The third stage - the application stage. Its task is to classify the items to be classified using the classifier. The input is the classifier and the items to be classified, and the output is the mapping between the items and the classes. This stage is also mechanical and is carried out by the program.
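
In formula form, the minimum-error-rate decision applied in the third stage can be written as follows, where $\omega_j$ are the classes, $P(\omega_j)$ the priors and $p(x \mid \omega_j)$ the class-conditional densities (the symbols are chosen here for illustration):

```latex
P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{\sum_{k=1}^{c} p(x \mid \omega_k)\,P(\omega_k)},
\qquad
\text{assign } x \text{ to } \omega_i \text{ with } i = \arg\max_{j}\; p(x \mid \omega_j)\,P(\omega_j)
```

The denominator is the same for every class, so comparing the numerators is enough.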
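3. Next, classify the test samples using the estimated parameters:

% Classify using the Euclidean distance (with the estimated means mu_hat)
z_euclidean = euclidean_classifier(mu_hat, X_test);
% Classify using the Bayesian method
% (note: this call passes the true parameters Mu, S, P rather than the estimates)
z_bayesian = bayes_classifier(Mu, S, P, X_test);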

4. Then compute the classification error rate of each method:

err_euclidean = ( 1-length(find(Y_test == z_euclidean')) / length(Y_test) );
err_bayesian = ( 1-length(find(Y_test == z_bayesian')) /length(Y_test) );
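
The error rate is simply the fraction of test labels that differ from the predictions. A small sketch on made-up labels (the vectors `y_true` and `y_pred` are hypothetical, not data from this experiment) suggests that the `find`-based expression and a shorter `mean`-based one agree:

```matlab
% Hypothetical labels for illustration only
y_true = [1 1 2 2 2 1 2 2];        % row vector, like Y_test
y_pred = [1 2 2 2 1 1 2 2]';       % column vector, like the classifier output

% Form used in this article: 1 - (number correct) / (number of samples)
err_find = 1 - length(find(y_true == y_pred')) / length(y_true);

% Equivalent form: mean of the "is wrong" indicator
err_mean = mean(y_true ~= y_pred');

fprintf('err_find = %.3f, err_mean = %.3f\n', err_find, err_mean);
```

Two of the eight hypothetical labels are wrong, so both expressions give 0.25.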

5. Finally, display the classification results and the error rates:

[Figure: classification results and error rates]
The following is the complete implementation code:

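% Two-class problem with two-dimensional normal distributions (ML estimation)

clc;
clear;

% Mean vectors of the two classes
Mu = [0 0; 3 3]';
% Covariance matrices
S1 = 0.8 * eye(2);
S(:, :, 1) = S1;
S(:, :, 2) = S1;
% Prior probabilities (class distribution)
P = [1/3 2/3]';
% Sample size
% Convergence: the estimators are unbiased or asymptotically unbiased,
% and they converge better as the number of samples grows
N = 500;


% 1. Generate the training and test data
%{
    Generate the training samples
    N = 500,  c = 2, d = 2
    μ1=[0, 0]'   μ2=[3, 3]'
    S1=S2=[0.8, 0; 0, 0.8]
    p(w1)=1/3   p(w2)=2/3
%}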
randn('seed', 0);
[X_train, Y_train] = generate_gauss_classes(Mu, S, P, N);

figure();
hold on;
class1_data = X_train(:, Y_train==1);
class2_data = X_train(:, Y_train==2);
plot(class1_data(1, :), class1_data(2, :), 'r.');
plot(class2_data(1, :), class2_data(2, :), 'g.');
grid on;
title('Train');
xlabel('N=500');

%{
    Generate the test samples in the same way
    N = 500,  c = 2, d = 2
    μ1=[0, 0]'   μ2=[3, 3]'
    S1=S2=[0.8, 0; 0, 0.8]
    p(w1)=1/3   p(w2)=2/3
%}
randn('seed', 100);
[X_test, Y_test] = generate_gauss_classes(Mu, S, P, N);
figure();
hold on;
test1_data = X_test(:, Y_test==1);
test2_data = X_test(:, Y_test==2);
plot(test1_data(1, :), test1_data(2, :), 'r.');
plot(test2_data(1, :), test2_data(2, :), 'g.');
grid on;
title('Test');
xlabel('N=500');


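% 2. Estimate the parameters from the training samples using the ML method
% The samples of each class carry information only about that class's distribution,
% i.e. the parameters of the different classes are functionally independent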
[mu1_hat, s1_hat] = gaussian_ML_estimate(class1_data);
[mu2_hat, s2_hat] = gaussian_ML_estimate(class2_data);
mu_hat = [mu1_hat, mu2_hat];
s_hat = (1/2) * (s1_hat + s2_hat);


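% 3. Classify the test samples using the estimated parameters
% Classify using the Euclidean distance (with the estimated means mu_hat)
z_euclidean = euclidean_classifier(mu_hat, X_test);
% Classify using the Bayesian method
% (note: this call passes the true parameters Mu, S, P rather than the estimates)
z_bayesian = bayes_classifier(Mu, S, P, X_test);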


% 4. Compute the classification error rate of each method
err_euclidean = ( 1-length(find(Y_test == z_euclidean')) /  length(Y_test) );
err_bayesian = ( 1-length(find(Y_test == z_bayesian')) /  length(Y_test) );
% Print the results
disp(['Error rate based on Euclidean distance classification:', num2str(err_euclidean)]);
disp(['The error rate of bayesian classification based on the minimum error rate:', num2str(err_bayesian)]);

% Plot the results: misclassified points are blue circles,
% correctly classified points are red (class 1) or green (class 2)
figure();
hold on;
z_euclidean = transpose(z_euclidean);
o = 1;
q = 1;
for i = 1:size(X_test, 2)
    if Y_test(i) ~= z_euclidean(i)
        plot(X_test(1,i), X_test(2,i), 'bo');
    elseif z_euclidean(i)==1
        euclidean_classifier_results1(:, o) = X_test(:, i);
        o = o+1;
    elseif z_euclidean(i)==2
        euclidean_classifier_results2(:, q) = X_test(:, i);
        q = q+1;
    end
end
plot(euclidean_classifier_results1(1, :), euclidean_classifier_results1(2, :), 'r.');
plot(euclidean_classifier_results2(1, :), euclidean_classifier_results2(2, :), 'g.');
title(['Error rate based on Euclidean distance classification:', num2str(err_euclidean)]);
grid on;

figure();
hold on;
z_bayesian = transpose(z_bayesian);
o = 1;
q = 1;
for i = 1:size(X_test, 2)
    if Y_test(i) ~= z_bayesian(i)
        plot(X_test(1,i), X_test(2,i), 'bo');
    elseif z_bayesian(i)==1
        bayesian_classifier_results1(:, o) = X_test(:, i);
        o = o+1;
    elseif z_bayesian(i)==2
        bayesian_classifier_results2(:, q) = X_test(:, i);
        q = q+1;
    end
end
plot(bayesian_classifier_results1(1, :), bayesian_classifier_results1(2, :), 'r.');
plot(bayesian_classifier_results2(1, :), bayesian_classifier_results2(2, :), 'g.');
title(['The error rate of bayesian classification based on the minimum error rate:', num2str(err_bayesian)]);
grid on;

**Bayesian classification:**

function [ z ] = bayes_classifier( m, S, P, X )
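%{
    Purpose:
        Classify the test data using the minimum-error-rate Bayesian rule

    Parameters:
        m: mean vectors of the classes (one column per class)
        S: covariance matrices of the classes
        P: prior probabilities of the classes
        X: the data to be classified (one column per sample)

    Returns:
        z: the class assigned to each sample
%}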

[~, c] = size(m);
[~, n] = size(X);

z = zeros(n, 1);
t = zeros(c, 1);
for i = 1:n
    for j = 1:c
        t(j) = P(j) * comp_gauss_dens_val( m(:,j), S(:,:,j), X(:,i) );
    end
    [~, z(i)] = max(t);
end

end
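
Because both classes in this experiment share the covariance $\Sigma = \sigma^2 I$ with $\sigma^2 = 0.8$ in two dimensions, taking the logarithm of $P(\omega_j)\,p(x \mid \omega_j)$ and dropping the terms common to both classes gives an equivalent rule. This is a worked simplification added here for illustration, not part of the teacher's code:

```latex
\ln\big(P(\omega_j)\,p(x \mid \omega_j)\big)
  = \ln P(\omega_j) - \frac{\lVert x - \mu_j \rVert^{2}}{2\sigma^{2}} - \ln\!\big(2\pi\sigma^{2}\big)
\;\;\Longrightarrow\;\;
\text{decide } \omega_1 \text{ if }\;
\lVert x - \mu_1 \rVert^{2} - 2\sigma^{2}\ln P(\omega_1)
< \lVert x - \mu_2 \rVert^{2} - 2\sigma^{2}\ln P(\omega_2)
```

With equal priors the logarithm terms cancel and the rule reduces to the Euclidean nearest-mean classifier. Here the priors are 1/3 and 2/3, so the Bayesian boundary is shifted toward the mean of class 1.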

Compute the value of the Gaussian density N(m, s) at a given point:

function [ z ] = comp_gauss_dens_val( m, s, x )
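%{
    Parameters:
        m: mean vector
        s: covariance matrix
        x: the point at which the density is evaluated

    Returns:
        z: value of the Gaussian density at x
%}

% Dimension of the data; the normalizing constant is (2*pi)^(l/2) * det(s)^0.5
l = length(m);
z = ( 1/( (2*pi)^(l/2)*det(s)^0.5 ) ) * exp( -0.5*(x-m)'*inv(s)*(x-m) );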

end
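
For a d-dimensional Gaussian the normalizing constant is (2π)^(d/2) times the square root of the determinant of the covariance. A quick sanity check of that constant, assuming the same parameters as class 1 of this experiment (the anonymous function `gauss_pdf` is a hypothetical stand-in written only so that the snippet runs on its own):

```matlab
% Hypothetical helper; numel(m) is the dimension d of the data.
gauss_pdf = @(m, s, x) exp(-0.5 * (x - m)' * (s \ (x - m))) / sqrt((2*pi)^numel(m) * det(s));

m = [0; 0];
s = 0.8 * eye(2);

% At the mean the exponent is zero, so the density equals 1/(2*pi*sqrt(det(s))).
p_at_mean = gauss_pdf(m, s, m);
expected  = 1 / (2*pi*0.8);
fprintf('density at the mean: %.6f (expected %.6f)\n', p_at_mean, expected);
```

A constant factor shared by all classes does not change which class wins the comparison, but it does change the density values themselves.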

Classify the test data using the Euclidean distance:

function [ z ] = euclidean_classifier( m, X )
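%{
    Purpose:
        Assign each test sample to the class whose mean is nearest in Euclidean distance

    Parameters:
        m: class means, obtained by ML estimation from the training data
        X: the data to be classified (one column per sample)

    Returns:
        z: the class assigned to each sample
%}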

[~, c] = size(m);
[~, n] = size(X);

z = zeros(n, 1);
de = zeros(c, 1);
for i = 1:n
    for j = 1:c
        de(j) = sqrt( (X(:,i)-m(:,j))' * (X(:,i)-m(:,j)) );
    end
    [~, z(i)] = min(de);
end

end
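
The same nearest-mean idea can be sketched with one loop over the classes instead of a double loop. The means and test points below are hypothetical values chosen for illustration, not data from this experiment:

```matlab
% Hypothetical means and test points (one column per class / per sample)
m = [0 3; 0 3];
X = [0.2 2.9 1.4 1.6; -0.1 3.2 1.4 1.6];

% Squared distance from every sample to every mean, one row per class.
c = size(m, 2);
n = size(X, 2);
d2 = zeros(c, n);
for j = 1:c
    delta = X - repmat(m(:, j), 1, n);
    d2(j, :) = sum(delta.^2, 1);
end

% The nearest mean gives the class label; the square root is not needed
% because it does not change which distance is smallest.
[~, z] = min(d2, [], 1);
disp(z);
```

Note that this sketch returns the labels as a row vector, whereas `euclidean_classifier` returns a column vector.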

Maximum likelihood estimation:
Maximum likelihood estimates of the parameters of a normally distributed sample

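function [ m_hat, s_hat ] = gaussian_ML_estimate( X )
%{
    Purpose:
        Maximum likelihood estimation for a normally distributed sample

    Parameters:
        X: training samples (one column per sample)

    Returns:
        m_hat: ML estimate of the mean of the normal distribution
        s_hat: ML estimate of the covariance matrix of the normal distribution
%}

% Dimension l and number of samples N
[l, N] = size(X);
% The ML estimate of the unknown mean μ of a normal population
% is the arithmetic mean of the training samples
m_hat = (1/N) * sum(X, 2);

% The ML estimate of the covariance matrix Σ is the arithmetic mean
% of the N matrices (x_k - m_hat)*(x_k - m_hat)'
s_hat = zeros(l);
for k = 1:N
    s_hat = s_hat + (X(:, k)-m_hat) * (X(:, k)-m_hat)';
end
s_hat = (1/N)*s_hat;
end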
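
One way to check an estimate of this kind is to compare it with MATLAB's built-in `cov`, which normalizes by N when its second argument is 1. The sketch below uses a small hypothetical sample matrix and repeats the same two steps as the function above (sample mean, then the average of the outer products):

```matlab
% Hypothetical 2-by-N sample matrix (one column per sample)
X = [1 2 4 7; 3 1 5 3];
N = size(X, 2);

% ML estimates: sample mean and average outer product of the centered samples
m_hat = sum(X, 2) / N;
s_hat = zeros(size(X, 1));
for k = 1:N
    s_hat = s_hat + (X(:, k) - m_hat) * (X(:, k) - m_hat)';
end
s_hat = s_hat / N;

% cov normalizes by N-1 by default; the second argument 1 selects normalization by N.
% cov expects one row per observation, hence the transpose.
s_builtin = cov(X', 1);
disp(max(abs(s_hat(:) - s_builtin(:))));
```

The default `cov(X')` would divide by N-1 instead, which gives the unbiased estimate rather than the ML one.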

Generate sample data:

function [ data, C ] = generate_gauss_classes( M, S, P, N )
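%{
    Purpose:
        Generate sample data that follows normal distributions

    Parameters:
        M: mean vectors of the classes (one column per class)
        S: covariance matrices of the classes
        P: prior probabilities of the classes, i.e. the class distribution
        N: sample size

    Returns:
        data: sample data (2-by-N matrix)
        C: class labels of the samples
%}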

[~, c] = size(M);
data = [];
C = [];

for j = 1:c
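    % z = mvnrnd(mu,sigma,n);
    % Draws multivariate normal random numbers; mu is the mean vector,
    % sigma is the covariance matrix and n is the number of samples.
    % fix rounds toward zero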
    t = mvnrnd(M(:,j), S(:,:,j), fix(P(j)*N))';
    
    data = [data t];
    C = [C ones(1, fix(P(j) * N)) * j];
end

end
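
Two details of this function are worth knowing. First, `fix(P(j)*N)` truncates, so with `N = 500` and priors 1/3 and 2/3 the function returns 166 + 333 = 499 samples rather than 500. Second, `mvnrnd` belongs to the Statistics and Machine Learning Toolbox. The sketch below checks the counts and shows one common toolbox-free way to draw Gaussian samples with a Cholesky factor. It is an alternative added for illustration, not the method used in the teacher's code, and the variable names are assumptions:

```matlab
P = [1/3 2/3]';
N = 500;
counts = fix(P * N);              % samples actually generated per class
fprintf('per-class counts: %d and %d, total %d\n', counts(1), counts(2), sum(counts));

% Toolbox-free sampling: if z ~ N(0, I) and S = L*L', then m + L*z ~ N(m, S).
m = [3; 3];
S = 0.8 * eye(2);
L = chol(S, 'lower');
n = counts(2);
samples = repmat(m, 1, n) + L * randn(2, n);   % 2-by-n, one column per sample
disp(size(samples));
```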

Code instructions:
Before running, add the folder containing the files to the MATLAB path, then run ML_classification_test.m. The other files are functions.


Origin blog.csdn.net/qq_37554556/article/details/89329376