Comparison of the advantages and disadvantages of k-means and GMM

1. Code

import numpy as np
import matplotlib.pyplot as mp
from sklearn import datasets, mixture
from sklearn.cluster import KMeans

np.random.seed(8)  # fix the random seed
# create random samples: stretch one blob two different ways to get elongated clusters
X, _ = datasets.make_blobs(centers=[[0, 0]])
X1 = np.dot(X, [[4, 1], [1, 1]])
X2 = np.dot(X[:50], [[1, 1], [1, -5]]) - 2
X = np.concatenate((X1, X2))
y = [0] * 100 + [1] * 50  # true labels: 100 points in cluster 0, 50 in cluster 1
# KMeans
kmeans = KMeans(n_clusters=2)
y_kmeans = kmeans.fit(X).predict(X)
# plot: true labels (left) vs. K-Means labels (right)
for e, labels in enumerate([y, y_kmeans], 1):
    mp.subplot(1, 2, e)
    mp.scatter(X[:, 0], X[:, 1], c=labels, s=40, alpha=0.6)
    mp.xticks(())
    mp.yticks(())
mp.show()
# GMM
gmm = mixture.GaussianMixture(n_components=2, covariance_type='full')
y_gmm = gmm.fit(X).predict(X)
# plot: true labels (left) vs. GMM labels (right)
for e, labels in enumerate([y, y_gmm], 1):
    mp.subplot(1, 2, e)
    mp.scatter(X[:, 0], X[:, 1], c=labels, s=40, alpha=0.6)
    mp.xticks(())
    mp.yticks(())
mp.show()
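To quantify what the plots show, one option (not in the original post) is the adjusted Rand index, which scores predicted labels against the true labels `y` and is invariant to label permutation. A sketch reusing the same synthetic data:

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

np.random.seed(8)
# same construction as above: two elongated clusters
X, _ = datasets.make_blobs(centers=[[0, 0]])
X1 = np.dot(X, [[4, 1], [1, 1]])
X2 = np.dot(X[:50], [[1, 1], [1, -5]]) - 2
X = np.concatenate((X1, X2))
y = [0] * 100 + [1] * 50

y_kmeans = KMeans(n_clusters=2, n_init=10, random_state=8).fit_predict(X)
y_gmm = GaussianMixture(n_components=2, covariance_type='full',
                        random_state=8).fit(X).predict(X)

# ARI = 1 means perfect agreement with the true labels, ~0 means random
print(adjusted_rand_score(y, y_kmeans))
print(adjusted_rand_score(y, y_gmm))
```

The index makes the visual comparison concrete: a higher score means the clustering matches the ground-truth partition more closely.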

2. Effect

GMM is more flexible than K-Means in the cluster shapes it can fit: each cluster can be an arbitrary ellipsoid rather than only a sphere, so, as the figures show, GMM recovers the two elongated clusters correctly. GMM is also probabilistic: every data point receives a membership probability for each cluster, which is especially informative when a point lies between two overlapping clusters.
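To make the soft-assignment point concrete, `GaussianMixture.predict_proba` returns each point's membership probability for every component. A minimal sketch on synthetic 1-D data (illustrative only, not the dataset above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two overlapping 1-D Gaussian clusters
X = np.concatenate([rng.normal(-2.0, 1.0, (200, 1)),
                    rng.normal(2.0, 1.0, (200, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
hard = gmm.predict(X)        # one hard label per point, as K-Means would give
soft = gmm.predict_proba(X)  # shape (400, 2); each row sums to 1
print(soft[:3])
```

Points deep inside one cluster get probabilities near 0 or 1, while points in the overlap region get intermediate values; K-Means has no analogue of this.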

[Figure: K-Means — left: true labels; right: K-Means assignments]

[Figure: GMM — left: true labels; right: GMM assignments]

Origin blog.csdn.net/m0_57491181/article/details/129777763