1. Read data
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('X.csv')
df
Result:
2. Elbow method
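data = np.array(df)  # convert the DataFrame to a NumPy array
sse = []  # within-cluster sum of squared errors (SSE) for each k
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(data)
    sse.append(kmeans.inertia_)  # inertia_ is the SSE of the fitted model
# plt.rcParams["font.sans-serif"] = "SimHei"  # only needed for Chinese-language labels
plt.plot(range(1, 11), sse)
plt.xlabel('Number of clusters')
plt.ylabel('SSE')
plt.title("Elbow method")
plt.show()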
Result:
The elbow plot shows a clear inflection point at k = 4. Beyond that point, adding more clusters reduces the SSE only slightly, so 4 is a reasonable estimate of the optimal number of clusters.
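Reading an elbow off a plot is somewhat subjective, so it can help to cross-check the chosen k with a second criterion. The sketch below uses the silhouette score for this purpose, which is not part of the original analysis. It runs on synthetic data from `make_blobs` as a stand-in, because the contents of X.csv are not shown here. The names `X_demo`, `scores`, and `best_k` are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with four well-separated groups
# (assumption: replace X_demo with the real feature array, e.g. np.array(df)).
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=42,
)

# The silhouette score is only defined for 2 or more clusters, so k starts at 2.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(f"k with the highest silhouette score: {best_k}")
```

If the k with the highest silhouette score matches the elbow, that supports the choice. If the two disagree, both candidate values are worth inspecting more closely.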