KMeans Clustering with sklearn

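KMeans is one of the simplest clustering algorithms. It partitions the feature matrix of N samples into K disjoint clusters. Intuitively, a cluster is a group of data points that sit close together, and points in the same cluster are treated as belonging to the same class.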
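To make the partitioning idea concrete, here is a minimal pure-Python sketch of the classic assign-then-update iteration (Lloyd's algorithm). It is a teaching illustration only, not sklearn's implementation: the function name `kmeans_sketch`, the random initialization, and the toy points are all assumptions made for this example.

```python
import random


def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration on a list of (x, y) tuples (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct samples
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # nothing moved, so the iteration has converged
            break
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

Every point receives exactly one label, which is why the resulting clusters never overlap.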

n_clusters

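This is the k in KMeans: the number of clusters the model divides the data into. It is an optional parameter with a default value of 8, although in practice the appropriate number of clusters is often smaller than 8.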
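As a quick check of the default, the sketch below fits one model without `n_clusters` and one with `n_clusters=4` on the same synthetic data used in this post. Passing `n_init=10` explicitly is an assumption made here so that the behavior does not depend on the installed sklearn version.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# n_clusters omitted: the estimator falls back to its default of 8.
default_model = KMeans(random_state=0, n_init=10).fit(X)
# n_clusters given explicitly.
explicit_model = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)

print(default_model.n_clusters)              # number of clusters actually used
print(default_model.cluster_centers_.shape)  # one row per centroid, one column per feature
print(explicit_model.cluster_centers_.shape)
```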
Code (inspecting the distribution of the dataset):

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate 500 two-dimensional samples drawn around 4 centers.
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
fig, ax1 = plt.subplots(1)

# Plot each true group in its own color.
color = ["red", "pink", "orange", "gray"]
for i in range(4):
    ax1.scatter(X[y == i, 0], X[y == i, 1], marker='o', s=8, c=color[i])

plt.show()

Code for clustering the data, including the inertia for 1 to 9 clusters:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
# fig, ax1 = plt.subplots(1)
# 
# color = ["red", "pink", "orange", "gray"]
# for i in range(4):
#     ax1.scatter(X[y == i, 0], X[y == i, 1]
#                 , marker='o'
#                 , s=8
#                 , c=color[i]
#                 )
# 
# plt.show()

cluster = KMeans(n_clusters=3, random_state=0).fit(X)
y_pred = cluster.labels_
print(y_pred)
print("---------------------------------------------------------------------")

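# fit_predict refits the model on X and returns the labels in one call;
# with the same random_state they should match labels_ from the fit above.
pre = cluster.fit_predict(X)
print(pre == y_pred)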
print("---------------------------------------------------------------------")

centroid=cluster.cluster_centers_
print(centroid)
print(centroid.shape)
print("---------------------------------------------------------------------")

inertia=cluster.inertia_
print(inertia)  # inertia: sum of squared distances from each sample to its nearest centroid, the quantity KMeans minimizes
print("---------------------------------------------------------------------")

# Optional plot of the predicted clusters and their centroids:
# color=["red","pink","orange","gray"]
# fig,ax1=plt.subplots(1)
#
# for i in range(3):
#     ax1.scatter(X[y_pred == i,0],X[y_pred == i,1]  # select only the samples assigned to cluster i
#                 ,marker='o'
#                 ,s=8
#                 ,c=color[i]
#                 )
#
# ax1.scatter(centroid[:,0],centroid[:,1]
#             ,marker='x'
#             ,s=15
#             ,c="black"
#            )
#
# plt.show()

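# Refit with 4 clusters and compare the inertia with the 3-cluster result above.
cluster = KMeans(n_clusters=4, random_state=0).fit(X)
inertia_ = cluster.inertia_
print(inertia_)  # sum of squared distances from each sample to its nearest centroid

# Print the inertia for k = 1 through 9.
for i in range(1, 10):
    cluster = KMeans(n_clusters=i, random_state=0).fit(X)
    inertia_ = cluster.inertia_
    print(inertia_)  # sum of squared distances from each sample to its nearest centroid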
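To see what `inertia_` measures, the sketch below recomputes it by hand as the sum of squared Euclidean distances from each sample to the centroid of its assigned cluster, then collects the values for k = 1 through 9. The variable names `manual_inertia` and `inertias` are introduced only for this illustration, and `n_init=10` is set explicitly as an assumption to keep the behavior independent of the sklearn version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

model = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)

# Look up the centroid assigned to each sample, then sum the squared distances.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_inertia = float(np.sum((X - assigned_centroids) ** 2))

print(manual_inertia)
print(model.inertia_)  # should agree with the manual value up to floating-point error

# Inertia for k = 1..9, the same range as the loop above.
inertias = [
    KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_
    for k in range(1, 10)
]
for k, value in zip(range(1, 10), inertias):
    print(k, value)
```

Because inertia generally keeps shrinking as k grows, the smallest value alone does not identify a good k. A common heuristic is to look for the point where the decrease levels off.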
