1). Work through the k-means clustering process by hand with playing cards: 30 cards, 3 clusters.
2). * Write the k-means algorithm yourself, cluster the iris petal length data, and display the result with a scatter plot. (Bonus points)
3). Use sklearn.cluster.KMeans to cluster the iris petal length data and display the result with a scatter plot.
4). Cluster the complete iris data set and display the result with a scatter plot.
5). Think about what the k-means algorithm is used for.
Answer:
1) The k-means clustering process worked through by hand with playing cards: 30 cards, 3 clusters
Randomly select three initial centers: (8, 3, 2);
After the first iteration, the centers become (9, 4, 2);
After the second iteration, the centers become (10, 6, 2) and no longer change, so the final cluster centers are (10, 6, 2).
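The hand procedure above can be sketched in a few lines of plain Python. The post does not list the 30 cards that were dealt, so the `cards` list below is a hypothetical hand (A=1 through K=13) and the helper names `assign`, `update`, and `kmeans_1d` are made up for illustration. With different cards the intermediate centers will differ from (9, 4, 2) and (10, 6, 2); only the starting centers (8, 3, 2) are taken from the text.

```python
# Hypothetical hand of 30 card values (A=1 ... K=13); the post does not list its cards.
cards = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4,
         5, 5, 6, 6, 6, 7, 7, 8, 8, 9,
         9, 10, 10, 10, 11, 11, 12, 12, 13, 13]


def assign(values, centers):
    """Return, for each value, the index of the nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]


def update(values, labels, centers):
    """Return new centers: the mean of each cluster (old center kept if a cluster is empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [v for v, lab in zip(values, labels) if lab == j]
        new_centers.append(sum(members) / len(members) if members else old)
    return new_centers


def kmeans_1d(values, centers, max_iter=100):
    """Alternate assign/update until the centers stop changing."""
    centers = list(centers)
    history = [centers]                 # centers after each iteration
    labels = assign(values, centers)
    for _ in range(max_iter):
        labels = assign(values, centers)
        new_centers = update(values, labels, centers)
        if new_centers == centers:      # converged: nothing moved
            break
        centers = new_centers
        history.append(centers)
    return centers, labels, history


centers, labels, history = kmeans_1d(cards, [8, 3, 2])
for step, c in enumerate(history):
    print(step, [round(x, 2) for x in c])
```

Each printed row corresponds to one "step" of the manual exercise: deal every card to the pile whose center is closest, then replace each center with the average of its pile.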
2) Write the K-means algorithm independently, cluster with the iris petal length data, and display it with a scatterplot.
# -*- coding:utf-8 -*-
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt


# Select k distinct samples as the initial centers, each representing one cluster.
def initcenter(data, k):
    # Draw k different row indices so that no sample is picked twice
    index = np.random.choice(len(data), k, replace=False)
    return data[index, :]


# Return the index of the center nearest to sample x (squared Euclidean distance).
def nearest(center, x):
    distance = []
    for j in range(len(center)):
        distance.append(sum((x - center[j, :]) ** 2))
    return np.argmin(distance)


# Assign every sample to its nearest center.
def xclassify(data, y, center):
    for index, x in enumerate(data):
        y[index] = nearest(center, x)
    return y


# Recompute each center as the mean of its cluster; the flag is False once nothing changes.
def kcmean(data, y, center, k):
    m = data.shape[1]  # number of attributes per sample
    center_new = np.zeros([k, m])
    for i in range(k):
        index = y == i
        # Keep the old center if a cluster ends up empty
        center_new[i, :] = np.mean(data[index, :], axis=0) if index.any() else center[i, :]
    if np.all(center_new == center):
        return center, False
    return center_new, True


if __name__ == '__main__':
    data = load_iris().data  # note: this uses all four iris features
    k = int(input("Please enter the number of centroids: "))
    center = initcenter(data, k)  # generate the initial centers
    y = np.zeros(len(data))
    flag = True
    while flag:
        y = xclassify(data, y, center)
        center, flag = kcmean(data, y, center, k)
    print("cluster result: ", y)
    # Plot the first two features of the samples (blue) and of the centers (red diamonds)
    plt.scatter(data[:, 0], data[:, 1], s=30, c='b', marker='.')
    plt.scatter(center[:, 0], center[:, 1], s=60, c='r', marker='D')
    plt.show()
Run output:
Visualization results:
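The assignment step in the hand-written version loops over the samples one at a time. As a minimal sketch, assuming NumPy is available, the same nearest-center assignment can be computed for all samples at once with broadcasting. The function name `assign_clusters` and the four toy points are illustrative only and are not part of the original program.

```python
import numpy as np


def assign_clusters(data, center):
    """Return the index of the nearest center for every row of data."""
    # Broadcasting gives an array of shape (n_samples, k, n_features)
    diff = data[:, np.newaxis, :] - center[np.newaxis, :, :]
    # Squared Euclidean distance from each sample to each center: (n_samples, k)
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


# Toy data: two points near the origin, two points near (5, 5)
data = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
center = np.array([[0.0, 0.0], [5.0, 5.0]])
print(assign_clusters(data, center))  # [0 0 1 1]
```

The result has one label per sample, so it could stand in for the per-sample loop when the data set is large, at the cost of building an intermediate array of size n_samples × k × n_features.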
3) Use sklearn.cluster.KMeans and iris petal length data for clustering and display with scatter plot
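# -*- coding:utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

data_iris = load_iris().data
petal_length = data_iris[:, 2]  # column 2 of the iris data is petal length
X = petal_length.reshape(-1, 1)  # KMeans expects a 2-D array of shape (n_samples, n_features)
print(X.shape)

kmeans_model = KMeans(n_clusters=3)
kmeans_model.fit(X)
print(kmeans_model.predict([[3.5]]))  # cluster assigned to a new petal length of 3.5
y_predict = kmeans_model.predict(X)
print(kmeans_model.cluster_centers_)
print(kmeans_model.labels_)

# Petal length on both axes, colored by predicted cluster
plt.scatter(X[:, 0], X[:, 0], c=y_predict, s=50, cmap="rainbow")
plt.show()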
Visualization results:
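With a single feature such as petal length, assigning a value to its nearest center is equivalent to cutting the number line at the midpoints between neighbouring centers. The sketch below illustrates this with plain Python; the three center values are hypothetical, not the centers fitted on the iris data, and `predict_1d` and `boundaries` are illustrative helper names rather than scikit-learn functions.

```python
def predict_1d(centers, x):
    """Index of the center closest to x (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def boundaries(centers):
    """Midpoints between neighbouring centers, in ascending order."""
    s = sorted(centers)
    return [(a + b) / 2 for a, b in zip(s, s[1:])]


centers = [1.5, 4.5, 6.0]        # hypothetical values, NOT the fitted iris centers
print(predict_1d(centers, 3.5))  # 1, because 3.5 is closest to 4.5
print(boundaries(centers))       # [3.0, 5.25]
```

Any value below the first boundary falls in the lowest cluster, any value above the last boundary in the highest, which is why the one-feature scatter plot shows three contiguous color bands.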
4) The complete data of iris flowers are clustered and displayed with a scatterplot.
# -*- coding:utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

data_iris = load_iris().data  # all four features

kmeans_model = KMeans(n_clusters=3)
kmeans_model.fit(data_iris)
y_predict1 = kmeans_model.predict(data_iris)
print(kmeans_model.cluster_centers_)
print(kmeans_model.labels_)

# Petal length (column 2) against petal width (column 3), colored by predicted cluster
plt.scatter(data_iris[:, 2], data_iris[:, 3], c=y_predict1, s=50, cmap="rainbow")
plt.show()
Visualization results:
5) Think about what the k-means algorithm is used for.
1. Document Classifier
Documents are divided into categories based on tags, topics, and content. This is a standard, classic application of k-means. First, the documents must be preprocessed: each document is represented as a vector, with term frequencies used to identify the terms that characterize it. The document vectors are then clustered to find groups of similar documents.
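As a rough illustration of the two steps just described (term-frequency vectors, then clustering), here is a small pure-Python sketch. The four sentences are made-up toy documents, the helpers `tf_vectors`, `sq_dist`, and `kmeans` are hypothetical names, and the initial centers are fixed to documents 0 and 2 so that the run is repeatable. A real pipeline would normally add stop-word removal and weighting, which are left out here.

```python
from collections import Counter

# Toy corpus, invented for this example
docs = [
    "the cat chased the mouse",
    "the cat and the kitten sleep",
    "stock prices fell on the market",
    "the market rallied as stock prices rose",
]


def tf_vectors(docs):
    """Turn each document into a vector of raw term counts over a shared vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=100):
    """Plain k-means; the initial centers are the vectors at init_indices."""
    centers = [[float(x) for x in vectors[i]] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:                 # leave the center alone if its cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, vectors = tf_vectors(docs)
print(kmeans(vectors, init_indices=[0, 2]))  # [0, 0, 1, 1]
```

The two animal sentences end up in one cluster and the two market sentences in the other, because documents that share terms have term-count vectors that lie close together.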
2. Customer classification
Clustering can help marketers improve their customer base (working within their target area) by further subdividing customers based on purchase history, interests, or activity monitoring.
3. Team status analysis
Analyzing player status has always been a key part of the sports world. As competition becomes more intense, machine learning plays an increasingly important role in this field. If you want to build an excellent team and need to identify similar players based on their status, the k-means algorithm is a good choice.
4. Ride data analysis
Uber's publicly available ride data set provides a great deal of valuable information on traffic, transit times, and peak pickup locations. Analyzing this data not only benefits Uber but also gives an in-depth understanding of a city's traffic patterns, which helps with planning for the city's future.
5. Criminal network analysis
Network analysis is the process of collecting data from individuals and groups to identify important relationships between them. It originated with crime files, where information provided by the investigation department is used to classify the criminals at a crime scene.
6. Call detail record analysis
A Call Detail Record (CDR) is the information that telecommunications companies collect about users' calls, text messages, and network activity. Combining call detail records with customer personal data can help telecommunications companies better predict customer needs.