sklearn: agglomerative hierarchical clustering algorithm

'''
    Agglomerative hierarchical algorithm: first assume that each sample is a separate cluster. While the current number of clusters is larger than the desired number, each cluster looks for the nearest other cluster
                and merges with it to form a larger cluster, so the total number of clusters decreases. Repeat this process until the number of clusters reaches the desired value.

            Characteristics of the agglomerative hierarchical algorithm:
                1. The number of clusters k must be known in advance. An evaluation index can be used to pick the best number of clusters (a silhouette-based sketch follows the output below).
                2. There is no concept of a cluster centre, so the algorithm can only partition the samples in the training set; it cannot decide which cluster an unknown sample outside the training set belongs to, i.e. it cannot predict.
                3. When deciding which samples to agglomerate, connectivity (geometric continuity) can be used as a criterion in addition to distance (see the connectivity sketch after the output below).

            Agglomerative hierarchical algorithm related API:
                # agglomerative hierarchical clustering model
                model = sc.AgglomerativeClustering(n_clusters=4)
                pred_y = model.fit_predict(x)  # returns the cluster label of each sample

    Case: load multiple3.txt again and partition it into clusters using the agglomerative hierarchical algorithm.

'''

import numpy as np
import matplotlib.pyplot as mp
import sklearn.cluster as sc

# read the data and plot it
x = np.loadtxt('./ml_data/multiple3.txt', unpack=False, dtype='f8', delimiter=',')
print(x.shape)

# complete the clustering with AgglomerativeClustering
model = sc.AgglomerativeClustering(n_clusters=4)
pred_y = model.fit_predict(x)
print(pred_y)

# draw the sample data
mp.figure('AgglomerativeClustering', facecolor='lightgray')
mp.title('AgglomerativeClustering', fontsize=16)
mp.xlabel('X', fontsize=14)
mp.ylabel('Y', fontsize=14)
mp.tick_params(labelsize=10)
mp.scatter(x[:, 0], x[:, 1], s=80, c=pred_y, cmap='brg', label='Samples')
mp.legend()
mp.show()


Output:

(200, 2)
[1 1 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1
 3 0 2 1 3 0 2 1 3 0 2 1 3 0 0 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 1
 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 0 3 0 2 1 3 0 2 1 3 0 2 1 3 0
 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1 1 0 2
 1 1 0 2 1 3 0 2 1 3 0 3 1 3 0 2 1 3 0 2 1 1 0 2 1 3 0 2 1 3 0 2 1 3 0 2 1
 3 0 2 1 3 0 2 1 3 0 2 1 3 0 2]

  

Origin www.cnblogs.com/yuxiangyang/p/11220185.html