Categories of hierarchical clustering
- Agglomerative hierarchical clustering: bottom-up; start with each object as its own cluster, then repeatedly merge the most similar clusters until everything is combined into a single cluster
- Divisive hierarchical clustering: top-down; start from one cluster containing all points, then split one cluster at each step until every cluster contains only a single point
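The bottom-up procedure can be sketched in a few lines. This is only an illustrative sketch, not a reference implementation: the names `agglomerate` and `cluster_distance` are hypothetical, the points are assumed to be one-dimensional numbers, and the distance between two clusters is assumed to be the smallest pairwise distance (one possible choice among several). The parameter `k` stops merging early so the intermediate clusters can be inspected.

```python
from itertools import combinations

def cluster_distance(a, b):
    # Assumption: clusters hold 1-D numbers and their distance is the
    # smallest distance between a point of `a` and a point of `b`.
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, k=1):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > k:
        # Pick the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

print(agglomerate([1.0, 1.5, 5.0, 5.2, 9.0], k=2))
```

With `k=1` the loop runs until a single cluster is left, which matches the description above.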
Proximity between clusters
Single link:
Definition: the proximity of two clusters is the shortest distance between any two points, one from each cluster
formula: dist({m1, m2}, {m3, m4}) = min(dist(m1, m3), dist(m1, m4), dist(m2, m3), dist(m2, m4))
features: single link is good at handling non-elliptical clusters, but it is very sensitive to noise and outliers.
Complete link:
Definition: the proximity of two clusters is the longest distance between any two points, one from each cluster
formula: dist({m1, m2}, {m3, m4}) = max(dist(m1, m3), dist(m1, m4), dist(m2, m3), dist(m2, m4))
features: complete link is good at handling roughly circular (globular) clusters, and it is less sensitive to noise and outliers than single link.
Group average:
Definition: the proximity of two clusters is the average distance over all pairs of points, one from each cluster
formula: dist({m1, m2}, {m3, m4}) = (dist(m1, m3) + dist(m1, m4) + dist(m2, m3) + dist(m2, m4)) / 4
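The three proximity measures differ only in how they aggregate the pairwise distances. A minimal sketch, assuming points are coordinate tuples and `dist` is the Euclidean distance (both are assumptions for illustration; the function names are hypothetical):

```python
from itertools import product

def dist(p, q):
    # Assumption: Euclidean distance between two coordinate tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_link(c1, c2):
    # Shortest distance over all cross-cluster pairs.
    return min(dist(p, q) for p, q in product(c1, c2))

def complete_link(c1, c2):
    # Longest distance over all cross-cluster pairs.
    return max(dist(p, q) for p, q in product(c1, c2))

def group_average(c1, c2):
    # Mean distance over all len(c1) * len(c2) cross-cluster pairs.
    return sum(dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))

a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]
print(single_link(a, b), complete_link(a, b), group_average(a, b))
```

For two clusters of two points each, `group_average` divides by 4, as in the formula above.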
Algorithm idea (divisive clustering):
Input: n objects; termination condition: the number of clusters k
Output: k clusters, reached when the number of clusters meets the predetermined termination condition
- Treat all objects as one initial cluster
- for (i = 1; i ≠ k; i++) do begin
  - Select the cluster C with the largest diameter among all clusters
  - Find the point P in C with the largest average dissimilarity to the other points of C; put P into the splinter group and leave the remaining points in the old party
  - repeat
    - In the old party, find a point whose distance to the nearest point of the splinter group is not greater than its distance to the nearest point of the old party, and add that point to the splinter group
  - until no point of the old party is newly assigned to the splinter group
  - The splinter group and the old party are the two clusters into which the selected cluster is split; together with the other clusters they form the new cluster set
- end
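The steps above can be sketched as follows. This is a rough sketch under stated assumptions rather than a definitive implementation: the function names `diameter`, `split`, and `divisive` are hypothetical, the distance function is supplied by the caller, and the diameter of a cluster is assumed to be the largest pairwise distance inside it. The reassignment rule follows the nearest-point comparison described in the steps.

```python
def diameter(cluster, dist):
    # Assumption: diameter = largest pairwise distance inside the cluster.
    return max((dist(p, q) for p in cluster for q in cluster), default=0.0)

def split(cluster, dist):
    """Split a cluster of at least two points into (splinter, old_party)."""
    def avg_dissimilarity(i):
        others = [q for j, q in enumerate(cluster) if j != i]
        return sum(dist(cluster[i], q) for q in others) / len(others)

    # Seed the splinter group with the point least similar to the rest.
    seed = max(range(len(cluster)), key=avg_dissimilarity)
    splinter = [cluster[seed]]
    old = [p for j, p in enumerate(cluster) if j != seed]

    moved = True
    while moved and len(old) > 1:
        moved = False
        for i, p in enumerate(old):
            to_splinter = min(dist(p, s) for s in splinter)
            to_old = min(dist(p, q) for j, q in enumerate(old) if j != i)
            if to_splinter <= to_old:
                splinter.append(old.pop(i))  # p joins the splinter group
                moved = True
                break  # restart the scan, since both groups changed
    return splinter, old

def divisive(points, k, dist):
    clusters = [list(points)]  # all objects form one initial cluster
    while len(clusters) < k:
        target = max(clusters, key=lambda c: diameter(c, dist))
        if len(target) < 2:
            break  # nothing left to split
        clusters.remove(target)
        clusters.extend(split(target, dist))
    return clusters

clusters = divisive([1.0, 1.5, 5.0, 5.2, 9.0], 3, lambda a, b: abs(a - b))
print(clusters)
```

Each pass of the outer loop replaces one cluster with two, so starting from a single cluster it takes k - 1 splits to reach k clusters, which corresponds to the loop bound in the steps.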
Divisive hierarchical clustering algorithms are generally used less often than agglomerative ones.