Comparing the strengths and weaknesses of k-means and GMM

1. Code

import numpy as np, matplotlib.pyplot as mp
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn import mixture

np.random.seed(8)  # fix the random seed for reproducibility
# Build a synthetic sample: stretch one blob into an ellipse,
# then shear half of it into a second, elongated cluster
X, _ = datasets.make_blobs(centers=[[0, 0]])
X1 = np.dot(X, [[4, 1], [1, 1]])
X2 = np.dot(X[:50], [[1, 1], [1, -5]]) - 2
X = np.concatenate((X1, X2))
y = [0] * 100 + [1] * 50
# KMeans
kmeans = KMeans(n_clusters=2)
y_kmeans = kmeans.fit(X).predict(X)
# Plot true labels and KMeans labels side by side
for e, labels in enumerate([y, y_kmeans], 1):
    mp.subplot(1, 2, e)
    mp.scatter(X[:, 0], X[:, 1], c=labels, s=40, alpha=0.6)
    mp.xticks(())
    mp.yticks(())
mp.show()
# GMM
gmm = mixture.GaussianMixture(n_components=2, covariance_type='full')
y_gmm = gmm.fit(X).predict(X)
# Plot true labels and GMM labels side by side
for e, labels in enumerate([y, y_gmm], 1):
    mp.subplot(1, 2, e)
    mp.scatter(X[:, 0], X[:, 1], c=labels, s=40, alpha=0.6)
    mp.xticks(())
    mp.yticks(())
mp.show()
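Beyond eyeballing the plots, the agreement between each clustering and the true labels can be quantified. A minimal sketch using scikit-learn's adjusted Rand index (a score of 1 means perfect agreement, near 0 means chance-level; the metric choice is an assumption, not part of the original post), on the same synthetic data as above:

```python
import numpy as np
from sklearn import datasets, mixture
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

np.random.seed(8)
# Same synthetic data as the main script
X, _ = datasets.make_blobs(centers=[[0, 0]])
X1 = np.dot(X, [[4, 1], [1, 1]])
X2 = np.dot(X[:50], [[1, 1], [1, -5]]) - 2
X = np.concatenate((X1, X2))
y = [0] * 100 + [1] * 50

y_kmeans = KMeans(n_clusters=2, n_init=10).fit_predict(X)
y_gmm = mixture.GaussianMixture(n_components=2,
                                covariance_type='full').fit(X).predict(X)

# Adjusted Rand index: label-permutation-invariant agreement score
ari_kmeans = adjusted_rand_score(y, y_kmeans)
ari_gmm = adjusted_rand_score(y, y_gmm)
print('KMeans ARI:', ari_kmeans)
print('GMM    ARI:', ari_gmm)
```

ARI is invariant to how the cluster labels are numbered, so it compares the partition itself rather than the arbitrary 0/1 label assignment.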

2. Results

GMM is more flexible than K-Means in the cluster shapes it can handle (clusters can be any ellipsoid, not just spheres), so, as the figures show, GMM clusters this data correctly. In addition, GMM works with probabilities: each data point can be given partial membership in several clusters, which matters most when a point lies between two overlapping clusters.

Figure: K-Means clustering result

Figure: GMM clustering result


Reprinted from blog.csdn.net/m0_57491181/article/details/129777763