PyTorch Deep Learning in Practice | How Gaussian Mixture Model Clustering Works

01. Problem description

To illustrate how a Gaussian mixture model (GMM) solves a clustering problem, this example builds a data set from three univariate Gaussian distributions and then clusters it with a GMM.

1) Data

Each of the three univariate Gaussian components is described by its mean and covariance (for a single variable, the variance), as listed in Table 1:

▍Table 1 Mean and covariance of the three univariate Gaussian components

Component    Mean    Covariance
1            -1      2.25
2             0      1
3             3      0.25

Each component is given a different weight: 30% for component 1, 50% for component 2, and 20% for component 3. A total of 1000 samples are generated at random, so the components contribute 300, 500, and 200 samples respectively.

2) Visualization

To see how the three Gaussian components mix, each univariate Gaussian is first plotted in two-dimensional coordinates, giving the bell curves of the three components. The components are then blended according to their weights, and the curve of the resulting mixture is plotted.
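
The weighted blend described above can be written as a mixture density. The symbols $w_k$, $\mu_k$, and $\sigma_k^2$ (weight, mean, and variance of component $k$) are notation introduced here for illustration; the weights take the values 0.3, 0.5, and 0.2 from the problem statement.

```latex
p(x) = \sum_{k=1}^{3} w_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2),
\qquad
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\qquad
\sum_{k=1}^{3} w_k = 1 .
```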

3) Clustering

To find out which component each sample of the mixed data belongs to, a clustering method can be used to classify the data. After clustering, every sample carries a label of 1, 2, or 3. Recall the order in which the three Gaussian components were stacked: of the 1000 samples, the first 300 belong to component 1 and should be labeled 1, the middle 500 belong to component 2 and should be labeled 2, and the last 200 belong to component 3 and should be labeled 3. Comparing the clustering output with these expected labels gives the accuracy of the classification. Note that the label numbers returned by a clustering method are arbitrary, so cluster 1 need not correspond to component 1; the labels may have to be matched up before they are compared.

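02. Reference solution

MATLAB/Octave reference code for the component parameters:

% Means of the three components
mu1=[-1];
mu2=[0];
mu3=[3];
% Covariances (variances in 1-D) of the three components, as passed to mvnrnd
sigma1=[2.25];
sigma2=[1];
sigma3=[.25];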

The weights (30%, 50%, 20%) are realized through the number of samples drawn from each component. The MATLAB code is as follows:

% Mixing weights (kept for reference; the sample counts below encode them)
weight1=[.3];
weight2=[.5];
weight3=[.2];
% Draw 300, 500 and 200 samples, i.e. 30%, 50% and 20% of 1000
component_1=mvnrnd(mu1,sigma1,300);
component_2=mvnrnd(mu2,sigma2,500);
component_3=mvnrnd(mu3,sigma3,200);
% Stack the samples in component order into one 1000-by-1 column
X=[component_1;component_2;component_3];
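
mvnrnd belongs to the Statistics toolbox (the statistics package in Octave). Where it is not available, the same kind of data can be drawn with the built-in randn. The following is a minimal sketch, assuming that the sigma values are variances, so the standard deviation is their square root; the variable names n, labels, start, and stop are introduced here for illustration.

```matlab
% Sketch: draw 300/500/200 samples with base randn only.
% Assumption: sigma holds variances, so the standard deviation is sqrt(sigma).
mu    = [-1 0 3];          % component means
sigma = [2.25 1 0.25];     % component variances
n     = [300 500 200];     % samples per component (30%, 50%, 20% of 1000)

X      = zeros(sum(n), 1); % stacked samples, in component order
labels = zeros(sum(n), 1); % true component of each sample
stop   = cumsum(n);        % last row of each component block
start  = stop - n + 1;     % first row of each component block
for k = 1:3
    rows = start(k):stop(k);
    X(rows) = mu(k) + sqrt(sigma(k)) * randn(n(k), 1);
    labels(rows) = k;
end
```

Keeping the labels vector alongside X records the ground truth that the clustering result is later compared against.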

The three univariate Gaussian functions are plotted in two-dimensional coordinates with the following MATLAB code:

% sigma1..sigma3 are variances (as passed to mvnrnd); the density formula needs standard deviations
s1=sqrt(sigma1);s2=sqrt(sigma2);s3=sqrt(sigma3);
% Gaussian density of each component, evaluated at its own samples
gd1=exp(-0.5*((component_1-mu1)/s1).^2)/(s1*sqrt(2*pi));
gd2=exp(-0.5*((component_2-mu2)/s2).^2)/(s2*sqrt(2*pi));
gd3=exp(-0.5*((component_3-mu3)/s3).^2)/(s3*sqrt(2*pi));
figure;
plot(component_1,gd1,'.');hold on;
plot(component_2,gd2,'.');hold on;
plot(component_3,gd3,'.');
title('Bell curves of three components');
xlabel('Randomly produced numbers');ylabel('Gauss distribution');

Running the above code produces the bell curves of the three components, shown in Figure 1.

▍Figure 1 Bell curves of the three univariate Gaussian functions

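To show the mixture, a three-component Gaussian mixture model is fitted to the stacked data X, and the fitted mixture density is evaluated at every sample. The MATLAB code is as follows:

% Fit a 3-component GMM (older releases use gmdistribution.fit(X,3) instead)
gm1=fitgmdist(X,3);
% Mixture density at each sample
a=pdf(gm1,X);
figure;plot(X,a,'.');
title('Curve of Gaussian mixture distribution');
xlabel('Randomly produced numbers');
ylabel('Gauss distribution');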

Running the above code produces the mixture curve of the three components, shown in Figure 2.

▍Figure 2 Mixture of the three univariate Gaussian functions
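
Gaussian mixtures are commonly fitted with the expectation-maximization (EM) algorithm. The following is a minimal 1-D EM sketch in base MATLAB/Octave, intended only to illustrate the principle; it is not the toolbox implementation. The initial guess, the fixed iteration count, and all variable names (w, mu, v, R, loglik) are assumptions of this sketch, and no safeguard against collapsing variances is included.

```matlab
% Toy data with the same layout as the example (variances 2.25, 1, 0.25).
X = [-1 + 1.5*randn(300,1); randn(500,1); 3 + 0.5*randn(200,1)];
N = numel(X);
K = 3;

% Deterministic initial guess (an assumption of this sketch).
Xs = sort(X);
mu = [Xs(round(N/6)); Xs(round(N/2)); Xs(round(5*N/6))]; % spread over the data
v  = var(X) * ones(K,1);   % component variances
w  = ones(K,1) / K;        % mixing weights

nIter  = 100;
loglik = zeros(nIter,1);   % log-likelihood before each update
for it = 1:nIter
    % E-step: weighted density of every sample under every component.
    P = zeros(N,K);
    for k = 1:K
        P(:,k) = w(k) * exp(-0.5*(X-mu(k)).^2/v(k)) / sqrt(2*pi*v(k));
    end
    total = sum(P,2);              % mixture density at each sample
    loglik(it) = sum(log(total));
    R = zeros(N,K);                % responsibilities (posterior probabilities)
    for k = 1:K
        R(:,k) = P(:,k) ./ total;
    end

    % M-step: re-estimate weight, mean and variance from the responsibilities.
    for k = 1:K
        Nk    = sum(R(:,k));
        w(k)  = Nk / N;
        mu(k) = sum(R(:,k) .* X) / Nk;
        v(k)  = sum(R(:,k) .* (X-mu(k)).^2) / Nk;
    end
end
```

Each iteration can only keep or raise the log-likelihood, so plotting loglik against the iteration number is a quick way to check that the loop behaves as expected.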

To determine which component each sample belongs to, the fitted model is used to cluster the data. The MATLAB code is as follows:

idx=cluster(gm1,X);
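
Clustering with a mixture model amounts to assigning each sample to the component with the largest posterior probability. The sketch below works this out by hand for three sample points. As an assumption, it uses the true generating parameters of this example in place of fitted ones; the names w, mu, v, x, and post are introduced here for illustration.

```matlab
% Hard assignment by posterior probability (illustrative sketch).
% Assumption: true generating parameters stand in for fitted ones.
w  = [0.3 0.5 0.2];      % mixing weights
mu = [-1 0 3];           % component means
v  = [2.25 1 0.25];      % component variances
x  = [-4; 0; 3];         % three sample points to classify

% Weighted density of each point under each component.
P = zeros(numel(x), 3);
for k = 1:3
    P(:,k) = w(k) * exp(-0.5*(x-mu(k)).^2/v(k)) / sqrt(2*pi*v(k));
end

% Posterior probability: normalize each row so it sums to 1.
total = sum(P, 2);
post  = zeros(size(P));
for k = 1:3
    post(:,k) = P(:,k) ./ total;
end

% Label = component with the largest posterior.
[~, idx] = max(post, [], 2);
disp(idx');              % 1 2 3
```

The point x = 0 lies near the means of both component 1 and component 2; it goes to component 2 because that component has the larger weight and the narrower spread around 0. Points in this overlap region are where misclassifications are to be expected.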

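After clustering, each sample is assigned one of the labels 1, 2, or 3. Recall the order in which the three Gaussian components were stacked: of the 1000 samples, the first 300 belong to component 1 and should be labeled 1, the middle 500 belong to component 2 and should be labeled 2, and the last 200 belong to component 3 and should be labeled 3. The cluster assignments, and with them the accuracy of the labels, can be inspected with the following code:

figure;
hold on;
% Plot every sample on the x-axis, with a marker that depends on its cluster
for i=1:1000
    if idx(i)==1
        plot(X(i),0,'r*');
    elseif idx(i)==2
        plot(X(i),0,'b+');
    else
        plot(X(i),0,'go');
    end
end
title('Plot illustrating the cluster assignment');
xlabel('Randomly produced numbers');
ylim([-0.1 0.1]);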
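
The plot shows the assignments visually but does not give a number. A minimal sketch for computing the accuracy follows. Because cluster numbers are arbitrary, it tries every mapping from cluster label to component and keeps the best one. So that the sketch runs on its own, idx is a made-up stand-in (a relabelled copy of the true labels with ten entries changed), not an actual clustering output; the names truth, relabel, map, and best are hypothetical.

```matlab
% Expected labels from the generation order: 300 / 500 / 200.
truth = [ones(300,1); 2*ones(500,1); 3*ones(200,1)];

% Stand-in for idx = cluster(gm1,X): clusters numbered differently from the
% components (1->2, 2->3, 3->1), with the first 10 samples put in a wrong cluster.
relabel = [2; 3; 1];
idx = relabel(truth);
idx(1:10) = 3;

% Try all 6 mappings from cluster label to component and keep the best accuracy.
P = perms(1:3);
best = 0;
for r = 1:size(P,1)
    map = P(r,:)';                  % map(c) = component matched to cluster c
    acc = mean(map(idx) == truth);  % fraction of samples labeled correctly
    best = max(best, acc);
end
fprintf('Accuracy: %.1f%%\n', 100*best);   % prints "Accuracy: 99.0%"
```

With a real clustering result, the stand-in lines are dropped and idx is taken directly from the cluster call.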

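03. Results

Running the code gives the clustering result shown in Figure 3. Most of the data are assigned the correct label, although a small number of samples are misclassified.

▍Figure 3 Analysis of the Gaussian mixture model clustering result

04. Code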

https://www.jianguoyun.com/p/Ddr2dTYQ9of0Chiko_4EIAA


Origin blog.csdn.net/qq_41640218/article/details/130281854