Machine learning - clustering - density clustering algorithm notes

Density clustering: 1. DBSCAN 2. Maximum density clustering algorithm

The guiding idea of density clustering is that, as long as the density of sample points in a region exceeds some threshold, those samples are added to the nearest cluster.

Such algorithms overcome the drawback of distance-based clustering algorithms, which can only find "round" (convex) clusters: they can find clusters of any shape and are not sensitive to noisy data. However, computing densities is computationally expensive, so a spatial index is needed to reduce the amount of calculation.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as the maximal set of density-connected points. It can divide regions of sufficiently high density into clusters, and it can find clusters of arbitrary shape in data that contains noise.

  ε-neighborhood of an object: the region within a given radius ε of the object.

  Core object: for a given number m, if the ε-neighborhood of an object contains at least m objects, the object is called a core object.

  Directly density-reachable: given a set of objects D, if p is in the ε-neighborhood of q, and q is a core object, we say that object p is directly density-reachable from object q.

  Figure: with ε = 1 cm and m = 5, q is a core object, and object p is directly density-reachable from object q.

  

  Density-reachable: if there is a chain of objects p_1, p_2, ..., p_n with p_1 = q and p_n = p, where p_i ∈ D (1 ≤ i ≤ n) and each p_(i+1) is directly density-reachable from p_i with respect to ε and m, then object p is density-reachable from object q with respect to ε and m.

  Density-connected: if there is an object o in the set D such that both p and q are density-reachable from o with respect to ε and m, then p and q are density-connected with respect to ε and m.

  Cluster: a density-based cluster is a maximal set of density-connected objects.

  Noise: objects not included in any cluster are called noise.
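
The definitions above can be turned into a few small predicates. The following is only a minimal sketch under stated assumptions: points are tuples, distance is Euclidean, a point counts as a member of its own ε-neighborhood, and all function names are made up for illustration. It uses a brute-force scan rather than a spatial index.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, m):
    """Core object: the eps-neighborhood holds at least m objects."""
    return len(eps_neighborhood(points, i, eps)) >= m

def directly_density_reachable(points, p, q, eps, m):
    """p is directly density-reachable from q: q is core and p is in q's neighborhood."""
    return is_core(points, q, eps, m) and p in eps_neighborhood(points, q, eps)

def density_reachable(points, p, q, eps, m):
    """p is density-reachable from q: a chain of directly reachable steps leads from q to p."""
    seen, frontier = {q}, [q]
    while frontier:
        cur = frontier.pop()
        if not is_core(points, cur, eps, m):
            continue                      # only core objects can extend the chain
        for nxt in eps_neighborhood(points, cur, eps):
            if nxt == p:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Toy data: four points in a tight square, one point on its edge, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.2, 0.5), (5.0, 5.0)]

print(directly_density_reachable(points, 4, 3, eps=1.0, m=4))  # True
print(directly_density_reachable(points, 3, 4, eps=1.0, m=4))  # False: point 4 is not core
print(density_reachable(points, 4, 0, eps=1.0, m=4))           # True, via the chain 0 -> 3 -> 4
```

Note that direct density-reachability is not symmetric: point 4 can be reached from the core point 3, but not the other way round, because point 4 is not itself a core object.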

  

  DBSCAN algorithm process:

  If the ε-neighborhood of a point p contains at least m objects, create a new cluster with p as a core object;

  find the objects that are directly density-reachable from the core objects and merge them into the cluster;

  the algorithm ends when no new point can be added to any cluster.
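
The three steps above can be sketched as follows. This is an illustrative pure-Python version, not a reference implementation: it assumes Euclidean distance, counts a point as part of its own ε-neighborhood, and finds neighbors by brute force in O(n²) time, where a real implementation would use a spatial index. The names `dbscan` and `NOISE` are chosen here for the example.

```python
import math

NOISE = -1

def dbscan(points, eps, m):
    """Return one label per point: 0, 1, ... for clusters, NOISE (-1) for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n                  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < m:        # not a core object
            labels[i] = NOISE            # may become a border point later
            continue
        cluster += 1                     # i is a core object: start a new cluster
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:       # border point reached from a core object
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= m:   # j is also core: keep expanding through it
                frontier.extend(neighbors[j])
    return labels

# Two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, m=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point (5, 5) has no neighbors other than itself, so it is never absorbed into a cluster and stays labelled as noise.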

  From the above algorithm:

  each cluster contains at least one core object;

  non-core objects may be part of a cluster, where they form its edge;

  a cluster containing too few objects is considered to be noise.

Maximum density clustering

The maximum density clustering algorithm is a simple, elegant clustering method. It can identify clusters of various shapes, and its parameters are easy to determine.

Definition: local density ρ_i, using a cutoff:

    ρ_i = Σ_j χ(d_ij - d_c),  where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise

d_c is the cutoff distance, so ρ_i is the number of objects whose distance to object i is less than d_c. Since the algorithm is sensitive only to the relative values of ρ_i, the choice of d_c is robust. A recommended practice is to choose d_c such that the average number of neighbors of each point is 1%-2% of all points.
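
As a rough illustration of the cutoff density and of the 1%-2% rule, here is a small sketch. The helper names are hypothetical, and `choose_dc` is only one possible heuristic (an assumption, not something the notes prescribe): it sorts all pairwise distances and takes the one at the requested fraction, which makes the average neighbor count roughly that fraction of the number of points.

```python
import math

def pairwise_distances(points):
    """Full symmetric matrix of Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def local_density(d, dc):
    """rho_i = number of other points whose distance to point i is less than dc."""
    n = len(d)
    return [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]

def choose_dc(d, fraction=0.02):
    """Heuristic: the pairwise distance sitting at position fraction * (number of pairs)."""
    n = len(d)
    pair_d = sorted(d[i][j] for i in range(n) for j in range(i + 1, n))
    k = min(len(pair_d) - 1, max(0, round(fraction * len(pair_d))))
    return pair_d[k]

# Four points on a line: three close together and one far away.
points = [(0,), (1,), (2,), (10,)]
d = pairwise_distances(points)
rho = local_density(d, dc=1.5)
print(rho)  # [1, 2, 1, 0]
```

With only four points the 1%-2% rule is not meaningful, so the example fixes dc by hand; on a real data set one would call something like `choose_dc(d, 0.02)` first.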

Alternative definitions of the local density use a Gaussian kernel similarity or the mean over the K nearest neighbors.

Definition: distance to the nearest point of higher local density, δ_i

    δ_i = min over all j with ρ_j > ρ_i of d_ij

That is, among all objects whose density is higher than that of object i, δ_i is the distance from i to the closest one.
1. For the object of maximum density, set δ_i = max_j(d_ij) (otherwise no point of higher density exists and the distance would be infinite).
2. Only points that are local or global density maxima have a δ_i much larger than normal.
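
A minimal sketch of this definition, assuming a precomputed distance matrix `d` and a density list `rho` (both names are chosen here for illustration). Points tied for the maximum density have no higher-density neighbor, so in this sketch each of them falls back to its largest distance.

```python
def delta_distances(d, rho):
    """delta_i = distance from i to the nearest point of strictly higher density.

    For a point with no higher-density point (the density maximum), delta_i is
    set to the largest distance from i to any other point.
    """
    n = len(d)
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return delta

# Four points on a line at positions 0, 1, 2 and 10.
xs = [0, 1, 2, 10]
d = [[abs(a - b) for b in xs] for a in xs]
rho = [1, 2, 1, 0]          # cutoff densities for these points with d_c = 1.5
delta = delta_distances(d, rho)
print(delta)  # [1, 9, 1, 8]
```

Point 1 has both the highest density and a large δ, which is the signature of a cluster center. Point 3 has a large δ but zero density, which marks it as an outlier.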

Identifying cluster centers

1) Points with both a large local density ρi and a large distance δi are considered cluster centers;
2) Points with a large distance δi but a small local density ρi are outliers.
After the cluster centers are determined, each remaining point is assigned to the cluster of the nearest known center.
Note: points can also be assigned according to density-reachability.
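
The selection and assignment rules can be sketched like this. The thresholds `rho_min` and `delta_min` are hypothetical parameters introduced for the example; in practice they would be read off the decision graph by eye. The `rho` and `delta` values are written out by hand for the toy data (cutoff densities with d_c = 1.5).

```python
import math

def pick_centers(rho, delta, rho_min, delta_min):
    """Indices of points whose density and delta both exceed the chosen thresholds."""
    return [i for i in range(len(rho)) if rho[i] > rho_min and delta[i] > delta_min]

def assign_to_nearest_center(points, centers):
    """Label every point with the index of its nearest cluster center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]

# Two groups of three points on a line.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
rho = [1, 2, 1, 1, 2, 1]
delta = [1, 11, 1, 1, 11, 1]

centers = pick_centers(rho, delta, rho_min=1.5, delta_min=5)
labels = assign_to_nearest_center(points, centers)
print(centers)  # [1, 4]
print(labels)   # [1, 1, 1, 4, 4, 4]
```

Each label is the index of the center point, so the two clusters here are named 1 and 4.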

Density Peak: the decision graph (Decision Graph)

The left figure shows the distribution of all points in two-dimensional space. The right figure is the decision graph, plotted with ρ as the abscissa and δ as the ordinate. It can be seen that points 1 and 10 both have large ρi and δi, so they are taken as cluster centers. The three points 26, 27 and 28 have a relatively large δi but a small ρi, so they are outliers.

Revisiting borders and noise

1) In cluster analysis, it is often necessary to determine how reliably each point has been assigned to its cluster.

2) In this algorithm, a border region is first defined for each cluster, i.e. the set of points that are assigned to the cluster but lie within distance d_c of points belonging to other clusters. Then, for each cluster, the point of maximum local density within its border region is found, and its local density is denoted ρ_h.

3) All points of the cluster whose local density is greater than ρ_h are considered part of the cluster core (i.e. their assignment to the cluster is highly reliable). The remaining points are considered the cluster halo, which can be regarded as noise.
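
The core and halo split described in steps 2) and 3) might be sketched as below. This is an illustration under assumptions: the function name is invented, the cluster labels and densities are supplied by hand, and a cluster with no border region is treated as all core.

```python
import math

def split_core_and_halo(points, labels, rho, dc):
    """Return one boolean per point: True if it belongs to its cluster's halo."""
    n = len(points)
    # rho_h per cluster: the highest density found in that cluster's border region.
    # Starting from -inf means a cluster without a border region has no halo.
    rho_h = {c: float("-inf") for c in set(labels)}
    for i in range(n):
        for j in range(n):
            # i is a border point if some point of another cluster is closer than dc.
            if labels[i] != labels[j] and math.dist(points[i], points[j]) < dc:
                rho_h[labels[i]] = max(rho_h[labels[i]], rho[i])
    # Core points have density strictly greater than rho_h; the rest form the halo.
    return [rho[i] <= rho_h[labels[i]] for i in range(n)]

# Two neighboring groups on a line; densities are cutoff counts for dc = 1.1.
points = [(0.0,), (0.5,), (1.0,), (1.5,), (2.5,),
          (3.5,), (4.5,), (5.0,), (5.5,), (6.0,)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
rho = [2, 3, 3, 3, 2, 2, 3, 3, 3, 2]

halo = split_core_and_halo(points, labels, rho, dc=1.1)
print(halo)
```

In this toy data the points at 2.5 and 3.5 face each other across the cluster boundary, so each cluster gets ρ_h = 2. Every point with density 2 or less, including the two outer end points, ends up in the halo.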

Note: the question of assignment reliability will come up again with the EM algorithm.

Affinity Propagation: for the AP clustering algorithm, you can take a look at this article.

Origin www.cnblogs.com/yang901112/p/11615631.html