1. Algorithm principle

DBSCAN groups points into clusters based on distance and density:
- Step 1: Select a starting point. If every point has already been used as a starting point or assigned to a cluster, stop.
- Step 2: Collect all points whose distance to the selected point is below a threshold (eps) into a set.
- Step 3: If the set from step 2 contains at least a minimum number of points (min_samples), merge them into the current cluster, select a new center point within the cluster, and return to step 2; otherwise return to step 1.
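The steps above can be sketched in a few lines of plain NumPy. This is a minimal illustration of the described procedure, not scikit-learn's actual implementation; the function name and parameters are made up for this sketch:

```python
import numpy as np

def simple_dbscan(X, eps, min_samples):
    """Toy sketch of the three steps above; label -1 means 'in no cluster'."""
    n = len(X)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0
    for start in range(n):
        # Step 1: pick a starting point; skip points already used or clustered.
        if visited[start] or labels[start] != -1:
            continue
        visited[start] = True
        seeds = [start]
        while seeds:
            center = seeds.pop()
            # Step 2: gather all points within eps of the current center.
            dists = np.linalg.norm(X - X[center], axis=1)
            neighbors = np.where(dists < eps)[0]
            # Step 3: if dense enough, absorb the neighbors into the cluster
            # and keep expanding from each new member; otherwise give up here.
            if len(neighbors) >= min_samples:
                for p in neighbors:
                    if labels[p] == -1:
                        labels[p] = cluster_id
                        if not visited[p]:
                            visited[p] = True
                            seeds.append(p)
        if (labels == cluster_id).any():
            cluster_id += 1
    return labels
```

On two well-separated blobs this yields two clusters, with isolated points left as noise (-1).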
2. Code
```python
from scipy.spatial import distance
import numpy as np
from sklearn.cluster import dbscan
# Note: generate_clustered_data is a private sklearn test helper and may
# move or disappear between versions.
from sklearn.cluster.tests.common import generate_clustered_data

min_samples = 10
eps = 0.0309

X = generate_clustered_data(seed=1, n_samples_per_cluster=1000)

# Precompute the pairwise distance matrix and scale it into [0, 1],
# so eps is expressed as a fraction of the largest pairwise distance.
D = distance.squareform(distance.pdist(X))
D = D / np.max(D)
core_samples, labels = dbscan(D, metric="precomputed", eps=eps,
                              min_samples=min_samples)
```
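The same clustering can also be run through the public `DBSCAN` estimator class on raw coordinates instead of a precomputed distance matrix. The sketch below uses `make_blobs` as a stand-in dataset, since `generate_clustered_data` is a private test helper; the blob centers and `eps` value here are chosen for illustration only:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in data: three well-separated Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [10, 0]],
                  cluster_std=0.4, random_state=1)

# eps is in raw Euclidean units here, not the normalized units above.
db = DBSCAN(eps=0.5, min_samples=10).fit(X)
labels = db.labels_  # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

The estimator form is usually preferable for large datasets, since it avoids building the full n-by-n distance matrix in memory.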
3. Results
As the figures show, DBSCAN does not actually cluster this kind of data very well; k-means handles it better. DBSCAN is better suited to ring-shaped clusters, irregularly shaped clusters, or clusters defined by high density.
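The ring-shaped case can be demonstrated with scikit-learn's `make_circles`; the `eps` and noise values below are chosen for illustration. DBSCAN separates the two concentric rings cleanly, while k-means, which assumes roughly convex blob-like clusters, mixes them:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles

# Two concentric rings: density-connected, but not blob-shaped.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.03, random_state=0)

# DBSCAN follows the density of each ring and finds exactly two clusters.
db_labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)
n_db_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)

# k-means instead splits the plane in half, cutting through both rings.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```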