I. Introduction to the Algorithm
- The mean-shift algorithm first selects a center point, center, at random and considers all points within a given radius of it.
- Every point within that range is recorded as visited by the current cluster c: its visit count for c is increased by 1.
- Within this range, compute the vector from the center to each of the other points, and take the average of these vectors as the shift vector.
- Move the center by the shift vector; the result is the new center.
- Repeat the previous three steps until the length of the shift is below a predetermined threshold, i.e. until the center converges.
- If the distance between the converged center of the current cluster c and the center of an existing cluster c2 is below a certain threshold, merge c into c2; otherwise add c as a new cluster, increasing the number of clusters by 1.
- Repeat the first six steps until every point has been visited.
- If a point has been visited by more than one cluster, e.g. by both c1 and c2, assign it to the cluster that visited it most often.
As described above, mean-shift is a density-based clustering method: the center keeps moving toward the region where points are densest, so each sample ends up in the cluster of the densest region it belongs to.
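The steps listed above can be sketched from scratch with NumPy. This is a minimal illustration under our own assumptions, not the author's implementation: it uses a flat kernel (plain average of the offsets), picks each new starting center among the points not yet visited, and the names `mean_shift`, `radius`, `shift_tol`, `merge_tol` and `max_iter` are hypothetical. The merge threshold defaulting to half the radius is an arbitrary choice.

```python
import numpy as np


def mean_shift(X, radius, shift_tol=1e-3, merge_tol=None, max_iter=300, seed=0):
    """Flat-kernel mean-shift clustering following the steps above (sketch)."""
    X = np.asarray(X, dtype=float)
    if merge_tol is None:
        merge_tol = radius / 2  # assumption: merge centers closer than half the radius
    rng = np.random.default_rng(seed)
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    centers = []  # converged center of each cluster
    votes = []    # votes[c][i]: how many times cluster c visited point i

    while not visited.all():
        # Pick a random unvisited point as the starting center.
        center = X[rng.choice(np.flatnonzero(~visited))].copy()
        cluster_votes = np.zeros(n, dtype=int)
        for _ in range(max_iter):
            in_range = np.linalg.norm(X - center, axis=1) <= radius
            # Every point in range gets one more visit from this cluster.
            cluster_votes[in_range] += 1
            visited[in_range] = True
            # Shift = average of the vectors from the center to those points.
            shift = (X[in_range] - center).mean(axis=0)
            center = center + shift  # move the center
            if np.linalg.norm(shift) < shift_tol:  # converged
                break
        # Merge into an existing cluster whose center is close enough.
        for c, other in enumerate(centers):
            if np.linalg.norm(center - other) < merge_tol:
                votes[c] += cluster_votes
                break
        else:
            centers.append(center)
            votes.append(cluster_votes)

    # Each point goes to the cluster that visited it most often.
    labels = np.argmax(np.vstack(votes), axis=0)
    return np.array(centers), labels
```

On two well-separated groups of points, e.g. `mean_shift(X, radius=2.0)`, this should return one center per group, each near the mean of its group.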
II. Some Calculations
1. The basic shift vector
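- $S_h(x) = \{y : \lVert y - x \rVert \le h\}$ is the set of points inside the sphere of radius $h$ centered at the current center $x$.
- That is, subtract the center from each point in $S_h$ and average the offsets: $M_h(x) = \frac{1}{k}\sum_{x_i \in S_h}(x_i - x)$, where $k$ is the number of points in $S_h$.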
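The basic shift vector can be computed directly from its definition: select the points within radius $h$ of $x$, subtract $x$, and average. This is a minimal sketch; the function name `basic_shift` is our own, and it assumes $S_h$ is non-empty (true whenever $x$ is itself one of the data points).

```python
import numpy as np


def basic_shift(x, points, h):
    """Basic shift vector: the average of (x_i - x) over the points in S_h."""
    x = np.asarray(x, dtype=float)
    points = np.asarray(points, dtype=float)
    # S_h: the points inside the sphere of radius h centered at x.
    s_h = points[np.linalg.norm(points - x, axis=1) <= h]
    # Subtract the center from every point in S_h and average the offsets.
    return (s_h - x).mean(axis=0)


# (1, 0) and (0, 1) lie within radius 2 of the origin; (5, 5) does not.
print(basic_shift([0, 0], [[1, 0], [0, 1], [5, 5]], h=2))  # [0.5 0.5]
```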
2. The Gaussian-weighted shift
- The basic shift vector gives every point within the range the same weight, no matter how far it is from the center; this is unreasonable.
- Points closer to the center are more likely to belong to the same cluster as the center, so they should carry larger weights.
// TODO origin of this formula
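The weighting idea can be sketched as a weighted average of the offsets. Since the original formula is not reproduced here, this assumes a Gaussian weight $w_i = \exp(-\lVert x_i - x \rVert^2 / 2\sigma^2)$ applied to the points in $S_h$; the function name `gaussian_shift` and the parameter `sigma` (defaulting to the radius `h`) are hypothetical choices, not necessarily what the author used.

```python
import numpy as np


def gaussian_shift(x, points, h, sigma=None):
    """Gaussian-weighted shift over the points within radius h of x (sketch)."""
    x = np.asarray(x, dtype=float)
    points = np.asarray(points, dtype=float)
    if sigma is None:
        sigma = h  # assumption: the Gaussian width equals the radius h
    s_h = points[np.linalg.norm(points - x, axis=1) <= h]
    offsets = s_h - x
    # Closer points get larger weights: w_i = exp(-||x_i - x||^2 / (2 sigma^2)).
    w = np.exp(-np.sum(offsets ** 2, axis=1) / (2 * sigma ** 2))
    # Weighted average of the offsets instead of the plain average.
    return (w[:, None] * offsets).sum(axis=0) / w.sum()
```

For points (0, 0) and (1, 0) seen from the origin with `h=2, sigma=1`, the plain average gives a shift of 0.5 along x, while the weighted version gives about 0.38, because the farther point only gets weight $e^{-0.5} \approx 0.61$.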
3. Update the new center: the new center is the current center plus the shift vector.
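The update can be sketched as a loop. With a Gaussian-weighted shift $m(x)$, the new center $x + m(x)$ equals the weighted mean of the points in $S_h$, so the loop computes that mean directly and stops once the center moves less than a tolerance. The Gaussian width equal to `h` and the names `find_mode`, `tol` and `max_iter` are our own assumptions.

```python
import numpy as np


def find_mode(x0, points, h, tol=1e-5, max_iter=500):
    """Repeat x <- x + m(x) with Gaussian weights inside S_h until convergence."""
    x = np.asarray(x0, dtype=float)
    points = np.asarray(points, dtype=float)
    for _ in range(max_iter):
        s_h = points[np.linalg.norm(points - x, axis=1) <= h]
        # Gaussian weights, assuming a kernel width equal to the radius h.
        w = np.exp(-np.sum((s_h - x) ** 2, axis=1) / (2 * h ** 2))
        # New center = weighted mean of S_h, which equals x + m(x).
        new_x = (w[:, None] * s_h).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:  # shift below the threshold
            return new_x
        x = new_x
    return x
```

Starting from (2, 3) on the five points (2, 3), (4, 3), (3, 2), (3, 4) and (3, 3) with `h=10`, the center moves to the symmetric middle point (3, 3) within a few iterations.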
III. Code
from scipy.spatial import distance
from sklearn.neighbors import NearestNeighbors
# public import path; the private sklearn.cluster.dbscan_ module is gone in newer scikit-learn
from sklearn.cluster import DBSCAN, dbscan
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import MeanShift, estimate_bandwidth

from sklearn.cluster.tests.common import generate_clustered_data

# DBSCAN parameters; not used by the mean-shift code below
min_samples = 10
eps = 0.0309

X = generate_clustered_data(seed=1, n_samples_per_cluster=1000)

# quantile controls whether points count as the same cluster:
# larger quantile -> larger bandwidth -> fewer, larger clusters
bandwidth = estimate_bandwidth(X, quantile=0.3, n_samples=len(X))
meanshift = MeanShift(bandwidth=bandwidth, bin_seeding=True)  # build the estimator
meanshift.fit(X)
labels = meanshift.labels_

print(np.unique(labels))

fig, ax = plt.subplots()
cluster_num = len(np.unique(labels))  # number of labels, i.e. the number of clusters found automatically
for i in range(cluster_num):
    # scatter the points of cluster i in their own color
    ax.scatter(X[labels == i, 0], X[labels == i, 1], s=1)

plt.show()
Result: [figure: scatter plot of the points, one color per cluster]