Clustering Algorithm
Learning objectives
- Master the implementation process of clustering algorithms
- Know the theory behind the K-means algorithm
- Know how to evaluate a clustering model
- Know the advantages and disadvantages of K-means
- Understand how clustering algorithms can be optimized
- Apply K-means to carry out a clustering task
6.3 Clustering algorithm implementation process
The name k-means contains two elements:
K: the number of initial center points (the planned number of clusters)
means: each center point is computed as the mean of the data points in its cluster
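To make the "means" part concrete, here is a minimal sketch of how a center point can be computed as the coordinate-wise average of a cluster. The function name `centroid` and the sample coordinates are illustrative assumptions, not taken from the original material.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


# Hypothetical cluster of three 2-D points.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
print(centroid(cluster))  # (3.0, 2.0)
```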
1 K-means clustering steps
- 1. Randomly choose K points in the feature space as the initial cluster centers.
- 2. For every other point, compute the distance to each of the K centers and assign the point to the cluster of the nearest center.
- 3. Once all points are assigned, recompute the center of each cluster as the mean of its points.
- 4. If the new centers are the same as the previous ones (the centroids no longer move), stop; otherwise return to step 2.
The process is illustrated by the following figure:
[Figure: animation of k-means clustering]
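The four steps can be sketched in plain Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice to draw the initial centers from the samples themselves, and the handling of empty clusters are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centroids, labels). `max_iter` and `seed` are illustrative
    parameters, not part of the step list above.
    """
    rng = random.Random(seed)
    # Step 1: pick K distinct samples as the initial cluster centers
    # (assumption: centers are drawn from the data points themselves).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Step 3: recompute each center as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Assumption: an empty cluster keeps its previous center.
                new_centroids.append(centroids[j])
                continue
            dims = len(members[0])
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
            )

        # Step 4: stop when no center has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

Which group receives label 0 and which receives label 1 depends on the random initialization, so only the grouping itself is meaningful.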
2 Worked example
- Case:
- 1. Randomly choose K points in the feature space as the initial cluster centers (in this case, p1 and p2).
- 2. For every other point, compute the distance to each of the K centers and assign the point to the cluster of the nearest center.
- 3. Once all points are assigned, recompute the center of each cluster as the mean of its points.
- 4. If the new centers are the same as the previous ones (the centroids no longer move), stop; otherwise return to step 2. [Here the check shows the centers have moved, so the steps above are repeated in a new round of iteration.]
- 5. When an iteration produces the same result as the previous one, the algorithm has converged and clustering is complete. K-means always terminates; it cannot get stuck selecting centroids indefinitely.
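The rounds of this case can be traced with a short script. The coordinates below are hypothetical, since the data from the original figure is not reproduced here; only the choice of p1 and p2 as initial centers follows the case description. The helper names `assign` and `update` are likewise assumptions for this sketch.

```python
import math

# Hypothetical sample coordinates (not the values from the original figure).
points = {
    "p1": (1.0, 1.0),
    "p2": (5.0, 5.0),
    "p3": (1.0, 2.0),
    "p4": (2.0, 1.0),
    "p5": (5.0, 6.0),
    "p6": (6.0, 5.0),
}

# Step 1: use p1 and p2 as the two initial cluster centers.
centers = [points["p1"], points["p2"]]


def assign(points, centers):
    """Step 2: map each point name to the index of its nearest center."""
    return {
        name: min(range(len(centers)), key=lambda j: math.dist(xy, centers[j]))
        for name, xy in points.items()
    }


def update(points, labels, k):
    """Step 3: recompute each center as the mean of its assigned points.

    Assumes every cluster has at least one member, which holds for this data.
    """
    new_centers = []
    for j in range(k):
        members = [points[n] for n, lab in labels.items() if lab == j]
        new_centers.append(
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members))
        )
    return new_centers


round_no = 0
while True:
    round_no += 1
    labels = assign(points, centers)
    new_centers = update(points, labels, len(centers))
    print(round_no, labels, new_centers)
    # Step 4: stop once the centers no longer move.
    if new_centers == centers:
        break
    centers = new_centers
```

With these coordinates the centers move in the first round and stay put in the second, so the loop stops after two rounds with p1, p3, p4 in one cluster and p2, p5, p6 in the other.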
3 Summary
Process:
- Fix the constant K in advance; K is the final number of clusters.
- Randomly select K points as the initial centroids, compute the similarity between each sample and each centroid (here, the Euclidean distance), and assign each sample to the most similar class.
- Recompute the centroid (i.e. the cluster center) of each class, and repeat this process until the centroids no longer change.
- The result is the class each sample belongs to and the centroid of each class.
Note:
- Because every iteration computes the similarity between each sample and every centroid, K-means converges slowly on large data sets.
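A rough way to see this cost: each iteration of the basic algorithm evaluates one distance per sample per centroid, so the work grows with the product of the two. The helper below is a back-of-the-envelope sketch; the function name and the sample sizes are illustrative assumptions, and the count ignores the cost of each individual distance computation.

```python
def distance_evaluations(n_samples, k, n_iterations):
    """Number of sample-to-centroid distances computed by basic K-means."""
    return n_samples * k * n_iterations


print(distance_evaluations(1_000, 3, 10))      # 30000
print(distance_evaluations(1_000_000, 3, 10))  # 30000000
```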