2020.4.14 K-means algorithm

1) Work through the k-means clustering process by hand with playing cards: more than 30 cards, 3 clusters.

2) * Write the K-means algorithm yourself, cluster the iris petal length data, and display the result with a scatter plot. (Bonus points)

3) Use sklearn.cluster.KMeans to cluster the iris petal length data and display the result with a scatter plot.

4) Cluster the complete iris data and display the result with a scatter plot.

5) Think about what the k-means algorithm is used for.

Answer:

1) Working through the k-means clustering process by hand with playing cards: more than 30 cards, 3 clusters

Randomly select three initial centers: (8, 3, 2).

After the first step, the centers become (9, 4, 2).

After the second step, the centers become (10, 6, 2). A further pass leaves them unchanged, so the final cluster centers are (10, 6, 2).
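
The hand-worked steps above can be replayed in a few lines of Python. This is a minimal sketch, not the procedure used for the post: the card values in `cards` are hypothetical (the post does not list the cards that were dealt), so the intermediate centers will differ from those above. Only the starting centers (8, 3, 2) are taken from the text, and the helper name `kmeans_1d` is made up for illustration.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Run 1-D k-means from the given starting centers until they stop changing."""
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center)
        new_centers = [
            sum(c) / len(c) if c else old
            for c, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical hand of cards (ace = 1, jack/queen/king = 11/12/13)
cards = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7,
         7, 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13]
centers, clusters = kmeans_1d(cards, [8, 3, 2])
print("final centers:", centers)
print("cluster sizes:", [len(c) for c in clusters])
```

The loop stops as soon as an update leaves every center where it was, which is the same stopping rule used in the hand calculation.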

 

2) Write the K-means algorithm independently, cluster with the iris petal length data, and display it with a scatterplot.

 

# -*- coding:utf-8 -*-

from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Select k data objects as the initial centers, each representing one cluster
def initcenter(data, k):
    # Draw k distinct row indices so that no sample is chosen twice
    index = np.random.choice(len(data), size=k, replace=False)
    center = data[index, :]
    return center

def nearest(center, x):
    # Squared Euclidean distance from sample x to every center
    distance = []
    for j in range(len(center)):
        distance.append(sum((x - center[j, :]) ** 2))
    # Index of the closest center
    y = np.argmin(distance)
    return y

def xclassify(data, y, center):
    for index,x in enumerate(data):
        y[index] = nearest(center, x)
    return y

def kcmean(data, y, center, k):
    m = data.shape[1]
    center_new = np.zeros([k, m])
    for i in range(k):
        index = y == i
        center_new[i, :] = np.mean(data[index, :], axis=0)
    if np.all(center_new == center):
        return center,False
    else:
        center = center_new
        return center,True

# Entry point: read k, pick the initial centers, then iterate until the centers stop changing
if __name__ == '__main__':
    # All four iris features are clustered; the plot below shows the first two
    data = load_iris().data
    k = int(input("Please enter the number of centroids: "))
    center = initcenter(data, k)
    y = np.zeros(len(data))
    flag = True
    while flag:
        y = xclassify(data, y, center)
        center, flag = kcmean(data, y, center, k)
    print("cluster result:", y)

    # Samples as blue dots, final cluster centers as red diamonds
    plt.scatter(data[:, 0], data[:, 1], s=30, c='b', marker='.')
    plt.scatter(center[:, 0], center[:, 1], s=60, c='r', marker='D')
    plt.show()

operation result:

 Visualization results:
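
The assignment and update steps that `xclassify` and `kcmean` perform with Python loops can also be written with NumPy broadcasting. The following is a sketch of one possible formulation, not a drop-in replacement for the code above. The function names `assign_clusters` and `update_centers`, the tiny data set, and the choice to leave an empty cluster's center where it was are all assumptions made for illustration.

```python
import numpy as np


def assign_clusters(data, center):
    """Return, for every row of data, the index of the nearest center."""
    # diff has shape (n_samples, k, n_features) thanks to broadcasting
    diff = data[:, np.newaxis, :] - center[np.newaxis, :, :]
    sq_dist = (diff ** 2).sum(axis=2)   # squared Euclidean distances
    return sq_dist.argmin(axis=1)


def update_centers(data, labels, center):
    """Move each center to the mean of the samples assigned to it."""
    new_center = center.astype(float).copy()
    for i in range(len(center)):
        members = data[labels == i]
        if len(members) > 0:            # an empty cluster keeps its old center
            new_center[i] = members.mean(axis=0)
    return new_center


# Tiny made-up data set: two obvious groups in the plane
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
start = np.array([[0.0, 0.0], [5.0, 5.0]])

labels = assign_clusters(points, start)
centers = update_centers(points, labels, start)
print(labels)    # [0 0 1 1]
print(centers)   # rows are [0, 0.5] and [5.5, 5]
```

Repeating these two calls until `centers` stops changing gives the same loop as the `while flag:` block, with the per-sample distance computation done in a single array operation.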

 

3) Use sklearn.cluster.KMeans and iris petal length data for clustering and display with scatter plot

 

# -*- coding:utf-8 -*-

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
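data_iris = load_iris().data

# Column 2 of the iris data is petal length (cm)
petal_length = data_iris[:, 2]
X = petal_length.reshape(-1, 1)   # KMeans expects a 2-D array: (n_samples, n_features)
print(X.shape)
kmeans_model = KMeans(n_clusters=3)
kmeans_model.fit(X)
print(kmeans_model.predict([[3.5]]))   # cluster assigned to a petal length of 3.5 cm
y_predict = kmeans_model.predict(X)
print(kmeans_model.cluster_centers_)
print(kmeans_model.labels_)
# Only one feature is used, so petal length is plotted against itself
plt.scatter(X[:, 0], X[:, 0], c=y_predict, s=50, cmap="rainbow")
plt.show()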

 Visualization results:
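
One way to judge a clustering like this is the within-cluster sum of squared distances, the quantity k-means tries to reduce; scikit-learn reports it for a fitted model as `kmeans_model.inertia_`. The sketch below computes the same quantity by hand for one-dimensional data. The function name `within_cluster_ss` and the sample numbers are made up for illustration and are not iris measurements.

```python
def within_cluster_ss(values, labels, centers):
    """Sum of squared distances from each value to the center of its cluster."""
    return sum((v - centers[label]) ** 2 for v, label in zip(values, labels))


# Made-up one-dimensional values with their cluster labels and centers
values = [1.0, 2.0, 4.0, 5.0, 6.0]
labels = [0, 0, 1, 1, 1]
centers = [1.5, 5.0]

print(within_cluster_ss(values, labels, centers))   # 0.25 + 0.25 + 1 + 0 + 1 = 2.5
```

For the model fitted above, `within_cluster_ss(X[:, 0], kmeans_model.labels_, kmeans_model.cluster_centers_[:, 0])` should be close to `kmeans_model.inertia_`, up to floating-point rounding.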

 

4) The complete data of iris flowers are clustered and displayed with a scatterplot.

# -*- coding:utf-8 -*-

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
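data_iris = load_iris().data

# Cluster on all four features (sepal length/width, petal length/width)
kmeans_model = KMeans(n_clusters=3)
kmeans_model.fit(data_iris)
y_predict1 = kmeans_model.predict(data_iris)
print(kmeans_model.cluster_centers_)
print(kmeans_model.labels_)
# Show the clusters in the petal length (column 2) vs petal width (column 3) plane
plt.scatter(data_iris[:, 2], data_iris[:, 3], c=y_predict1, s=50, cmap="rainbow")
plt.show()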

 Visualization results:
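
Because the iris data set ships with species labels (`load_iris().target`), the clusters can be compared with the true species. Cluster numbers are arbitrary, so the comparison is done with a cross-tabulation rather than by checking labels for equality. This is a small sketch with made-up labels; `cross_tab` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter


def cross_tab(true_labels, cluster_labels):
    """Count how many samples of each true class fall into each cluster."""
    return Counter(zip(true_labels, cluster_labels))


# Made-up example: three classes, clusters numbered differently from the classes
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]

table = cross_tab(true_labels, cluster_labels)
for (cls, cluster), count in sorted(table.items()):
    print(f"class {cls} -> cluster {cluster}: {count}")
```

For the model above, `cross_tab(load_iris().target, y_predict1)` would give the corresponding table for the real data.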

 

5) Think about what the k-means algorithm is used for.

1. Document classification

  Divide documents into categories based on tags, topics, and document content. This is a standard, classic application of K-means. The documents first have to be preprocessed: each document is represented as a vector, with term frequencies used to identify the terms that characterize it. The document vectors are then clustered to find groups of similar documents.

2. Customer segmentation

  Clustering can help marketers understand and grow their customer base within their target market, and further subdivide customers into segments based on purchase history, interests, or monitored activity.

3. Team status analysis

  Analyzing player performance has always been a key element in the sports world. As competition becomes more intense, machine learning also plays an increasingly important role in this field. If you want to build a strong team and identify similar players based on their performance statistics, K-means is a good choice.

4. Ride data analysis

  The Uber ride data set that is open to the public provides a lot of valuable information on traffic, transit time, and peak pickup locations. Analyzing this data is useful not only to Uber; it also gives an in-depth view of a city's traffic patterns, which helps with planning for the city's future.

5. Criminal network analysis

  Network analysis is the process of collecting data on individuals and groups to identify important relationships between them. This kind of analysis originates from criminal case files, where information supplied by investigators is used to classify the suspects linked to a crime scene.

6. Call detail record analysis

  A Call Detail Record (CDR) is the information that telecommunications companies collect about users' calls, text messages, and network activity. Combining call detail records with customer profile data can help telecommunications companies better predict customer needs.
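
For the document classification case (item 1), each document has to become a numeric vector before k-means can be applied. Below is a minimal sketch of term-frequency vectors, assuming whitespace tokenization and a vocabulary built from the documents themselves. The sample sentences and the helper name `term_frequency_vectors` are invented for illustration; real pipelines typically add steps such as stop-word removal and term weighting.

```python
from collections import Counter


def term_frequency_vectors(documents):
    """Turn each document into a vector of raw term counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "the match ended in a draw",
    "the team won the match",
    "stock prices fell sharply",
]
vocabulary, vectors = term_frequency_vectors(docs)
print(vocabulary)
for vec in vectors:
    print(vec)
```

The resulting rows all have the same length, so they can be passed to `KMeans.fit` like any other numeric data.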

 

Origin www.cnblogs.com/Azan1999/p/12695951.html