Clustering - mean shift

I. Introduction to the Algorithm

 

  1. Mean shift first picks a center point (chosen at random), then marks out a range around it with a given radius.
  2. Every point inside the range is recorded as belonging to the current cluster c, and its visit count for c is incremented by one.
  3. Within the range, compute the offset vector from the center to every other point, and take the average of these vectors as the offset, shift.
  4. Move the center by shift; the result is the new center.
  5. Repeat steps 2 to 4 until shift is smaller than a predetermined threshold, i.e. until convergence.
  6. If the distance between the center of the current cluster c and the center of an existing cluster c2 is below a certain threshold, merge the current cluster into c2; otherwise, increase the number of clusters by one.
  7. Repeat steps 1 to 6 until every point has been visited.
  8. If a point has been visited by both cluster c1 and cluster c2, assign it to the cluster that visited it more often.
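
The eight steps can be sketched in NumPy roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation used by scikit-learn: the function name mean_shift, the parameters shift_tol, merge_tol and max_iter, and the default merge threshold of half the radius are all choices made here for the example.

```python
import numpy as np


def mean_shift(points, radius, shift_tol=1e-3, merge_tol=None, max_iter=300):
    """Flat-kernel mean shift following the eight steps (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    if merge_tol is None:
        merge_tol = radius / 2  # assumption: merge clusters whose centers are this close
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    centers = []  # one converged center per cluster
    votes = []    # votes[c][i] = how often cluster c's range covered point i
    rng = np.random.default_rng(0)

    while not visited.all():
        # Step 1: start from a randomly chosen point that has not been visited yet.
        center = points[rng.choice(np.flatnonzero(~visited))].copy()
        count = np.zeros(n, dtype=int)
        for _ in range(max_iter):
            in_range = np.linalg.norm(points - center, axis=1) <= radius
            count[in_range] += 1   # Step 2: record the visit for this cluster
            visited[in_range] = True
            shift = (points[in_range] - center).mean(axis=0)  # Step 3: mean offset
            center = center + shift                           # Step 4: move the center
            if np.linalg.norm(shift) < shift_tol:             # Step 5: converged
                break
        # Step 6: merge into a nearby existing cluster, or open a new one.
        for c, other in enumerate(centers):
            if np.linalg.norm(center - other) < merge_tol:
                votes[c] += count
                break
        else:
            centers.append(center)
            votes.append(count)
        # Step 7 is the enclosing while loop.

    # Step 8: each point goes to the cluster that visited it most often.
    labels = np.argmax(np.vstack(votes), axis=0)
    return np.array(centers), labels
```

Calling mean_shift(X, radius=2.0) on an (n, 2) array returns the cluster centers and one label per point. The radius plays the role that bandwidth plays in scikit-learn's MeanShift.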

 

As the description above shows, mean shift is a density-based clustering method: each sample is assigned to the cluster of the densest region around it.

 

 

II. Some Calculations

 1. The basic shift

  • S_h is the set of points inside the sphere of radius h around the center
  • That is, the shift is obtained by subtracting the center from each point in the set and averaging the accumulated offsets
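
The formula itself did not survive in this copy of the post. The standard form that matches the two bullets is presumably the following, where x is the current center, h the radius, and k the number of points in S_h (the symbols M_h and k are assumptions introduced here):

```latex
M_h(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x),
\qquad
S_h(x) = \{\, y : \lVert y - x \rVert \le h \,\}
```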

 2. The Gaussian-weighted shift

  • In the basic shift, points in the set that lie farther from the center contribute larger offsets and therefore carry more weight, which is unreasonable
  • The closer a point is to the cluster center, the more likely it belongs to that cluster, so the closer points should carry the greater weight
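
A minimal sketch of this weighting, assuming a Gaussian kernel of the form exp(-d² / (2·bandwidth²)) on the distance d to the center. The function name gaussian_shift and the bandwidth parameter are hypothetical, and the caller is expected to pass only the points already inside the range:

```python
import numpy as np


def gaussian_shift(center, points, bandwidth):
    """Weighted mean of the offsets, with weights that decay with distance."""
    center = np.asarray(center, dtype=float)
    offsets = np.asarray(points, dtype=float) - center
    sq_dist = np.sum(offsets ** 2, axis=1)
    weights = np.exp(-sq_dist / (2 * bandwidth ** 2))  # assumed Gaussian kernel
    return (weights[:, None] * offsets).sum(axis=0) / weights.sum()


center = np.array([0.0, 0.0])
points = np.array([[0.1, 0.0], [0.2, 0.0], [3.0, 0.0]])

flat = (points - center).mean(axis=0)           # basic shift: the far point dominates
weighted = gaussian_shift(center, points, 1.0)  # Gaussian shift: the near points dominate
```

With these three points, the basic shift is pulled strongly toward the distant point at x = 3, while the Gaussian-weighted shift stays close to the two nearby points.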


// TODO origin of this formula

 

 3. Update the centroid
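
The update formula is also missing from this copy. Assuming M_h(x) denotes the shift computed in the previous two steps and t counts iterations, the update is presumably:

```latex
x^{(t+1)} = x^{(t)} + M_h\bigl(x^{(t)}\bigr)
```

For the basic (unweighted) shift this simplifies to the plain mean of the points in the range, since x + (1/k) Σ (x_i − x) = (1/k) Σ x_i.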

 

III. Code

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn.cluster import MeanShift, estimate_bandwidth

    # Test helper shipped with scikit-learn that generates clustered sample data
    from sklearn.cluster.tests.common import generate_clustered_data

    X = generate_clustered_data(seed=1, n_samples_per_cluster=1000)

    # quantile sets the bandwidth, which controls whether points end up in the same cluster
    bandwidth = estimate_bandwidth(X, quantile=0.3, n_samples=len(X))
    meanshift = MeanShift(bandwidth=bandwidth, bin_seeding=True)  # build the estimator
    meanshift.fit(X)
    labels = meanshift.labels_

    print(np.unique(labels))

    fig, ax = plt.subplots()
    cluster_num = len(np.unique(labels))  # number of labels, i.e. number of clusters found automatically
    for i in range(0, cluster_num):
        # Collect the coordinates of the points assigned to cluster i
        x = []
        y = []
        for ind, label in enumerate(labels):
            if label == i:
                x.append(X[ind][0])
                y.append(X[ind][1])
        ax.scatter(x, y, s=1)

    plt.show()

 

Result

 

 

 

 


Origin www.cnblogs.com/ylxn/p/11846184.html