1). Work through the k-means clustering process by hand with playing cards: 30 cards, 3 clusters.
Figure 1 Statistical table
Figure 2 Actual situation of the first round
Figure 3 Actual situation of the second round
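The manual rounds can be mirrored in a few lines of plain Python. This is only a sketch: the card values below are hypothetical (the actual 30 cards appear only in the figures), the starting centers are picked by hand, and the function name kmeans_1d is made up for illustration. Each round puts every card on the pile of its nearest center and then moves each center to the mean of its pile.

```python
# Hypothetical card values (1 = ace ... 13 = king); not the cards from the figures.
cards = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6,
         6, 7, 7, 7, 8, 8, 8, 9, 9, 9,
         10, 10, 10, 11, 11, 12, 12, 13, 13, 13]

def kmeans_1d(values, centers, max_rounds=100):
    """Cluster numbers in one dimension; returns (centers, piles, rounds)."""
    centers = list(centers)
    for round_no in range(1, max_rounds + 1):
        # Assignment step: put each card on the pile of its nearest center.
        piles = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            piles[nearest].append(v)
        # Update step: each new center is the mean of its pile
        # (an empty pile keeps its old center).
        new_centers = [sum(p) / len(p) if p else c
                       for p, c in zip(piles, centers)]
        if new_centers == centers:   # no center moved, so the piles are stable
            return centers, piles, round_no
        centers = new_centers
    return centers, piles, max_rounds

centers, piles, rounds = kmeans_1d(cards, centers=[1, 2, 3])
print("rounds:", rounds)
for c, p in zip(centers, piles):
    print(f"center {c:.2f}: {p}")
```

Starting from different centers may need a different number of rounds and can end in different piles, which is also what happens when the cards are sorted by hand.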
2). * Write the k-means algorithm independently, cluster the iris petal length data, and display the result with a scatter plot. (Bonus points)
PS: Our artificial intelligence teacher taught this algorithm before, so the code is basically the same as the one from that class.
Source code:
# Import the data set
from sklearn.datasets import load_iris
import numpy as np

data = load_iris().data
n = len(data)        # Total number of samples
m = data.shape[1]    # Number of attributes per sample
k = 3                # Number of cluster centers
dist = np.zeros([n, k + 1])   # Distance matrix; the last column stores each sample's cluster
center = data[:k, :].copy()   # Use the first k samples as the initial centers
center_new = np.zeros([k, m])
while True:
    for i in range(n):
        for j in range(k):
            dist[i, j] = np.sqrt(np.sum((data[i, :] - center[j, :]) ** 2))
        dist[i, k] = np.argmin(dist[i, :k])   # Index of the nearest center
    for i in range(k):
        index = dist[:, k] == i
        center_new[i, :] = data[index, :].mean(axis=0)   # New center = mean of the cluster
    if np.all(center == center_new):
        break
    else:
        center = center_new.copy()   # Copy, so the next update does not overwrite center
print("Classification of 150 samples\n", dist[:, k])
Figure 4 Clustering results
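The two nested loops that fill the distance matrix in the source code above can also be written with NumPy broadcasting. The following is a sketch of that idea, not the code used for Figure 4; the helper name assign_clusters and the tiny sample array are made up for illustration.

```python
import numpy as np

def assign_clusters(data, center):
    """Return the index of the nearest center for every sample (hypothetical helper)."""
    # data has shape (n, m) and center has shape (k, m); broadcasting gives (n, k, m).
    diff = data[:, np.newaxis, :] - center[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # Euclidean distances, shape (n, k)
    return dist.argmin(axis=1)

# Tiny made-up sample: two points near (0, 0) and two near (5, 5).
points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_clusters(points, centers)
print(labels)
```

For 150 samples the speed difference hardly matters, but the vectorized form is shorter and avoids indexing mistakes in the inner loop.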
3). Use sklearn.cluster.KMeans to cluster the iris petal length data and display the result with a scatter plot.
Source code:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

iris = load_iris()
data = iris.data[:, 2]      # Column 2 of the iris data is the petal length
x = data.reshape(-1, 1)     # Reshape into a single-column 2-D array, as KMeans expects
y = KMeans(n_clusters=3)    # Construct a KMeans model with 3 centroids
y.fit(x)                    # Train the model, i.e. compute the cluster centers
y_pre = y.predict(x)        # Predict the cluster index of each sample with the trained model
plt.scatter(x[:, 0], x[:, 0], c=y_pre, s=50, cmap='rainbow')
plt.show()
Figure 5 Code and scatter plot
4). Cluster the complete iris data set and display the result with a scatter plot.
Source code:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

iris = load_iris()
x = iris.data
y = KMeans(n_clusters=3)    # Construct a KMeans model with 3 centroids
y.fit(x)                    # Train the model, i.e. compute the cluster centers
y_pre = y.predict(x)        # Predict the cluster index of each sample with the trained model
print("Prediction result:\n", y_pre)
# Plot petal length (column 2) against petal width (column 3), colored by cluster
plt.scatter(x[:, 2], x[:, 3], c=y_pre, s=100, cmap='rainbow', alpha=0.5)
plt.show()
Figure 6 Predicted results and their scatter plots
5). Think about it: what can the k-means algorithm be used for?
Answer: k-means is one of the best-known clustering algorithms and, because it is simple and efficient, one of the most widely used. Given a set of data points and a number of clusters k specified by the user, the algorithm repeatedly assigns the data to k clusters according to a distance function until the clusters stop changing. In everyday life it can be used to group things by their characteristics, for example assessing the level of the Chinese football team from previous years' data, predicting this year's seed quality from previous years' seed quality, and data classification in general.
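To make the role of the distance function concrete, here is a small pure-Python sketch of the assignment step with a pluggable distance. The function names, the sample points and the hand-picked centers are all illustrative assumptions. Standard k-means uses Euclidean distance; swapping in another metric, as done here with Manhattan distance, gives a variant rather than k-means proper.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def assign(points, centers, distance):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points]

points = [(0, 0), (3, 3), (0, 5)]
centers = [(0, 0), (4, 3)]   # hypothetical, hand-picked centers

labels_euclidean = assign(points, centers, euclidean)
labels_manhattan = assign(points, centers, manhattan)
print(labels_euclidean)   # [0, 1, 1]
print(labels_manhattan)   # [0, 1, 0]
```

The point (0, 5) changes cluster between the two runs, which shows that the grouping depends on the chosen distance function and not only on the data.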