sklearn clustering evaluation metric: the silhouette coefficient

'''
    Silhouette coefficient: an evaluation metric for clustering.
            A good clustering is dense inside and sparse outside: samples in the same cluster
            should be close together, while samples from different clusters should be far apart.

            Calculation rule: for a given sample in the sample space, compute the average
            distance a from that sample to the other samples in its own cluster, and the
            average distance b from that sample to all samples in the nearest other cluster.
            The silhouette coefficient of the sample is (b - a) / max(a, b). The silhouette
            coefficient of the whole sample space is the arithmetic mean over all samples,
            which gives a single performance score s for the clustering.

            The silhouette coefficient lies in [-1, 1]: -1 means a bad clustering, 1 means a
            good clustering, and 0 means the clusters overlap and no good separation was found.

            Related API:
                import sklearn.metrics as sm
                # v: mean silhouette coefficient
                # metric: distance algorithm, e.g. Euclidean distance ('euclidean')
                v = sm.silhouette_score(input set, output set, sample_size=number of samples, metric=distance algorithm)

    Case: print the silhouette coefficient of a KMeans clustering.
'''
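To make the rule concrete, here is a minimal sketch (the four 1-D points are made up for illustration) that computes (b - a) / max(a, b) by hand for one sample and checks it against sklearn's silhouette_samples:

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Two obvious 1-D clusters (values are illustrative).
x = np.array([[1.0], [1.5], [8.0], [8.5]])
labels = np.array([0, 0, 1, 1])

# For the first sample (value 1.0):
a = abs(1.0 - 1.5)                          # mean distance to the other samples of its own cluster
b = (abs(1.0 - 8.0) + abs(1.0 - 8.5)) / 2   # mean distance to the nearest other cluster
s0 = (b - a) / max(a, b)

print(s0)                                   # manual silhouette of sample 0
print(silhouette_samples(x, labels)[0])     # sklearn's per-sample value, should match
print(silhouette_score(x, labels))          # arithmetic mean over all samples
```

silhouette_score is just the mean of the per-sample values that silhouette_samples returns.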

import numpy as np
import matplotlib.pyplot as mp
import sklearn.cluster as sc
import sklearn.metrics as sm

# read the data
x = np.loadtxt('./ml_data/multiple3.txt', unpack=False, dtype='f8', delimiter=',')
print(x.shape)

# cluster with KMeans
model = sc.KMeans(n_clusters=4)
model.fit(x)                # fit the clustering
pred_y = model.predict(x)   # predict which cluster each point falls in
print(pred_y)               # cluster label of each sample
# print the silhouette coefficient
print(sm.silhouette_score(x, pred_y, sample_size=len(x), metric='euclidean'))
# get the cluster centers
centers = model.cluster_centers_
print(centers)

# compute the classification boundaries
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
n = 500
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n), np.linspace(b, t, n))
bg_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
bg_y = model.predict(bg_x)
grid_z = bg_y.reshape(grid_x.shape)

# draw the samples
mp.figure('Kmeans', facecolor='lightgray')
mp.title('Kmeans', fontsize=16)
mp.xlabel('X', fontsize=14)
mp.ylabel('Y', fontsize=14)
mp.tick_params(labelsize=10)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(x[:, 0], x[:, 1], s=80, c=pred_y, cmap='brg', label='Samples')
mp.scatter(centers[:, 0], centers[:, 1], s=300, color='red', marker='+', label='Cluster Center')
mp.legend()
mp.show()
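Beyond scoring one fixed clustering, the silhouette coefficient is commonly used to choose n_clusters: run KMeans for several k and keep the k with the highest score. A minimal sketch, using make_blobs as a stand-in for multiple3.txt (all data parameters below are illustrative):

```python
import sklearn.cluster as sc
import sklearn.metrics as sm
from sklearn.datasets import make_blobs

# Synthetic stand-in data for multiple3.txt (centers/seed are arbitrary).
x, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.8, random_state=7)

best_k, best_score = 2, -1.0
for k in range(2, 8):
    pred_y = sc.KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(x)
    score = sm.silhouette_score(x, pred_y, metric='euclidean')
    print(k, round(score, 4))
    if score > best_score:
        best_k, best_score = k, score

print('best n_clusters:', best_k)
```

On well-separated blobs the score usually peaks at the true number of centers, but on real data nearby k values can score almost equally well, so the curve is worth inspecting rather than trusting the argmax blindly.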


Output:
(200, 2)
[1 1 0 2 1 3 0 2 1 3 0 2 ... 3 1 0 2]   (cluster label of each of the 200 samples; output truncated)
0.5773232071896659
[[5.91196078 2.04980392]
 [1.831      1.9998    ]
 [7.07326531 5.61061224]
 [3.1428     5.2616    ]]
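A mean score such as the 0.577 above can hide variation between clusters; silhouette_samples returns the per-sample values, so you can see which cluster separates worst. A sketch on synthetic stand-in data (multiple3.txt itself is not reproduced here; the make_blobs parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Stand-in data with four clusters.
x, _ = make_blobs(n_samples=200, centers=4, cluster_std=1.0, random_state=3)
pred_y = KMeans(n_clusters=4, n_init=10, random_state=3).fit_predict(x)

s = silhouette_samples(x, pred_y)   # one silhouette value per sample
for label in np.unique(pred_y):
    # mean silhouette of each cluster: low values flag a poorly separated cluster
    print(label, round(s[pred_y == label].mean(), 4))
```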

  


Origin: www.cnblogs.com/yuxiangyang/p/11220206.html