Clustering Algorithm
Learning objectives
- Master the implementation process of clustering algorithms
- Know the principle of the K-means algorithm
- Know how to evaluate a clustering model
- Know the advantages and disadvantages of K-means
- Understand how clustering algorithms can be optimized
- Apply KMeans to accomplish a clustering task
6.2 First use of the clustering algorithm API
1 API introduction
- sklearn.cluster.KMeans(n_clusters=8)
  - Parameters:
    - n_clusters: the number of cluster centers
      - Integer, default = 8; the number of clusters to form, i.e. the number of centroids to generate.
  - Methods:
    - estimator.fit(x)
    - estimator.predict(x)
    - estimator.fit_predict(x)
      - Computes the cluster centers and predicts which cluster each sample belongs to; equivalent to calling fit(x) and then predict(x).
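The three methods above can be sketched on a tiny hand-made data set (the four points and the random seed below are illustrative assumptions, not part of the original):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of two points each
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])

estimator = KMeans(n_clusters=2, n_init=10, random_state=0)
estimator.fit(X)                    # compute the 2 cluster centers
labels = estimator.predict(X)       # assign each sample to a center
labels2 = estimator.fit_predict(X)  # fit(X) followed by predict(X) in one call
print(estimator.cluster_centers_.shape)
```

The fitted centers live in `cluster_centers_`, one row per cluster, one column per feature.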
2 Case study
Create a random two-dimensional data set as a training set and apply the k-means clustering algorithm to it. Try different numbers of clusters and observe the clustering effect:
By varying the n_clusters parameter, different clustering results are obtained.
2.1 Process analysis
2.2 Code implementation
1. Create the data set
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Create the data set
# X holds the sample features, y the sample's cluster label: 1000 samples,
# 2 features each, 4 clusters centered at [-1,-1], [0,0], [1,1], [2,2],
# with cluster standard deviations [0.4, 0.2, 0.2, 0.2]
X, y = make_blobs(n_samples=1000, n_features=2, centers=[[-1, -1], [0, 0], [1, 1], [2, 2]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2],
                  random_state=9)
# Visualize the data set
plt.scatter(X[:, 0], X[:, 1], marker='o')
plt.show()
2. Run k-means clustering and evaluate it with the Calinski-Harabasz (CH) score
y_pred = KMeans(n_clusters=2, random_state=9).fit_predict(X)
# Try n_clusters = 2, 3, 4 in turn and inspect the clustering effect
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()
# Clustering score under the Calinski-Harabasz index
print(calinski_harabasz_score(X, y_pred))
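Instead of editing n_clusters by hand, the comparison suggested above can be sketched as a loop over the candidate values, collecting the CH score for each (the loop, the `scores` dict, and `n_init=10` are illustrative choices, not from the original):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Same data set as above: 1000 samples, 2 features, 4 true clusters
X, _ = make_blobs(n_samples=1000, n_features=2,
                  centers=[[-1, -1], [0, 0], [1, 1], [2, 2]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2],
                  random_state=9)

scores = {}
for k in (2, 3, 4):
    y_pred = KMeans(n_clusters=k, n_init=10, random_state=9).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, y_pred)
    print(f"n_clusters={k}: CH score = {scores[k]:.1f}")
```

A higher CH score means denser, better-separated clusters, so on this data the score should peak at the true cluster count of 4.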