K-Means clustering code in Python

The code is adapted from: https://github.com/lawlite19/MachineLearning_Python/blob/master/K-Means/K-Menas.py

1. Initialize the cluster centers: randomly select K sample points as the initial cluster centers

import numpy as np
import matplotlib.pyplot as plt
import scipy.io as spio

def kMeansInitCentroids(X, K):
    m = X.shape[0]
    m_arr = np.arange(0, m)            # generate 0 to m-1
    centroids = np.zeros((K, X.shape[1]))
    np.random.shuffle(m_arr)           # shuffle the order of m_arr
    rand_indices = m_arr[:K]           # take the first K indices
    centroids = X[rand_indices, :]
    return centroids

2. Find the nearest cluster center for each sample and return its index

def findClosestCentroids(X, inital_centroids):
    m = X.shape[0]                  # number of samples
    k = inital_centroids.shape[0]   # number of clusters
    dis = np.zeros((m, k))          # distance from each point to each of the k cluster centers
    idx = np.zeros((m, 1))          # cluster index to return for each sample

    '''Compute the distance from each point to each cluster center'''
    for i in range(m):
        for j in range(k):
            dis[i, j] = np.dot((X[i, :] - inital_centroids[j, :]).reshape(1, -1),
                               (X[i, :] - inital_centroids[j, :]).reshape(-1, 1))
    '''Return the column index of the minimum value in each row of dis, i.e. the cluster index
    - np.min(dis, axis=1) returns the minimum of each row
    - np.where(dis == np.min(dis, axis=1).reshape(-1, 1)) returns the coordinates of those minima
    - Note: a row may contain several minima; np.where finds them all, so only the first m
      results are kept (when there are several minima, any of those clusters is acceptable)
    '''
    dummy, idx = np.where(dis == np.min(dis, axis=1).reshape(-1, 1))
    return idx[0:dis.shape[0]]
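The np.where trick above can return more than one match per row when a row has ties, which is why only the first m results are kept. np.argmin gives the first minimum of each row directly. A minimal sketch on a hypothetical toy distance matrix (not from the original post):

```python
import numpy as np

# Toy distance matrix: 3 samples, 2 clusters (hypothetical values)
dis = np.array([[4.0, 1.0],
                [0.5, 2.0],
                [3.0, 3.0]])   # note the tie in the last row

idx = np.argmin(dis, axis=1)   # first minimum per row, even with ties
print(idx)                     # [1 0 0]
```

This replaces the np.where line and the `idx[0:dis.shape[0]]` slice in one step.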

3. Update the cluster centers

def computerCentroids(X, idx, k):
    n = X.shape[1]              # dimensionality of each sample
    centroids = np.zeros((k, n))  # each center has the same dimensionality as the samples
    for i in range(k):
        # idx == i selects the samples of cluster i; axis=0 averages each column (dimension)
        centroids[i, :] = np.mean(X[np.ravel(idx == i), :], axis=0).reshape(1, -1)
    return centroids
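The update rule above is simply a per-cluster mean over the assigned samples. A small self-contained check of that rule on assumed toy data (not from the original post):

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 0.0],   # two samples in cluster 0
              [10.0, 10.0]])            # one sample in cluster 1
idx = np.array([0, 0, 1])

k = 2
centroids = np.zeros((k, X.shape[1]))
for i in range(k):
    # Mean of the samples currently assigned to cluster i
    centroids[i, :] = np.mean(X[idx == i, :], axis=0)

print(centroids)   # [[ 1.  0.] [10. 10.]]
```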

4. K-Means algorithm

def runKMeans(X, initial_centroids, max_iters, plot_process):
    m, n = X.shape                    # number of samples and dimensionality
    k = initial_centroids.shape[0]    # number of clusters
    centroids = initial_centroids     # current cluster centers
    previous_centroids = centroids    # previous cluster centers
    idx = np.zeros((m, 1))            # cluster index of each sample

    for i in range(max_iters):
        print("Iteration: %d" % (i + 1))
        idx = findClosestCentroids(X, centroids)
        if plot_process:              # whether to plot the process
            plt = plotProcessKMeans(X, centroids, previous_centroids, idx)  # draw the movement of the cluster centers
            previous_centroids = centroids  # update the previous centers
            plt.show()
        centroids = computerCentroids(X, idx, k)  # recompute the cluster centers
    return centroids, idx  # return the cluster centers and the cluster index of each sample
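runKMeans always runs exactly max_iters iterations. A common variant stops early once no center moves more than a small tolerance between iterations; a hedged sketch of that check (the function name and tolerance are assumptions, not part of the original code):

```python
import numpy as np

def has_converged(centroids, previous_centroids, tol=1e-6):
    # Converged when no center moved more than tol (Euclidean distance per center)
    moves = np.linalg.norm(centroids - previous_centroids, axis=1)
    return bool(np.all(moves < tol))

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[0.0, 0.0], [1.0, 1.0 + 1e-9]])
print(has_converged(a, b))   # True
```

Inside the loop, `if has_converged(centroids, previous_centroids): break` would end the run early.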

5. Plot the movement of the cluster centers

def plotProcessKMeans(X, centroids, previous_centroids, idx):
    for i in range(len(idx)):
        if idx[i] == 0:
            plt.scatter(X[i, 0], X[i, 1], c="r")  # scatter plot of the 2-D data
        elif idx[i] == 1:
            plt.scatter(X[i, 0], X[i, 1], c="b")
        else:
            plt.scatter(X[i, 0], X[i, 1], c="g")
    plt.plot(previous_centroids[:, 0], previous_centroids[:, 1], 'rx', markersize=10, linewidth=5.0)  # previous cluster centers
    plt.plot(centroids[:, 0], centroids[:, 1], 'rx', markersize=10, linewidth=5.0)  # current cluster centers
    for j in range(centroids.shape[0]):  # for each cluster, draw a line showing the center's movement
        p1 = centroids[j, :]
        p2 = previous_centroids[j, :]
        plt.plot([p1[0], p2[0]], [p1[1], p2[1]], "->", linewidth=2.0)
    return plt

6. Main program

if __name__ == "__main__":
    print("Showing the clustering process....\n")
    data = spio.loadmat("./data/data.mat")
    X = data['X']
    K = 3 
    initial_centroids = kMeansInitCentroids(X,K)
    max_iters = 10 
    runKMeans(X,initial_centroids,max_iters,True) 
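The script above depends on ./data/data.mat, which is not included with this post. To exercise the same pipeline without that file, the loop can be run on synthetic data; the sketch below inlines simplified versions of the assignment and update steps (the blob positions and seed are assumptions for illustration):

```python
import numpy as np

np.random.seed(0)
# Three synthetic 2-D blobs standing in for data.mat
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])

K, max_iters = 3, 10
centroids = X[np.random.choice(X.shape[0], K, replace=False)]  # random init from the samples

for _ in range(max_iters):
    # Assignment step: squared distance from every sample to every center
    dis = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    idx = np.argmin(dis, axis=1)
    # Update step: each center becomes the mean of its assigned samples
    centroids = np.array([X[idx == i].mean(axis=0) for i in range(K)])

print(centroids.shape)   # (3, 2)
```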

7. Results

Showing the clustering process....

Iteration: 1

Iteration: 2

Iteration: 3

Iteration: 4

Iteration: 5

Iteration: 6

Iteration: 7

Iteration: 8

Iteration: 9

Iteration: 10

Origin: www.cnblogs.com/carlber/p/11781503.html