A quick and direct guide to understanding and implementing machine learning clustering algorithms (III): the clustering implementation process, k-means clustering steps, and a case exercise

Clustering Algorithm

Learning objectives

  • Master the implementation process of clustering algorithms
  • Know the principles of the K-means algorithm
  • Know how clustering models are evaluated
  • Explain the advantages and disadvantages of K-means
  • Understand ways to optimize clustering algorithms
  • Apply KMeans to carry out a clustering task

6.3 Clustering algorithm implementation process

The name k-means contains two elements:

K: the number of initial center points (the planned number of clusters)

means: each center point is computed as the average (mean) of the data points in its cluster
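
To make the "means" part concrete, here is a minimal sketch of how a center point can be computed as the coordinate-wise average of the points in one cluster. The function name `centroid` and the sample coordinates are assumptions chosen for illustration; they do not come from the original article.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    # zip(*points) groups all x values together, all y values together, etc.
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical 2-D points assumed to belong to the same cluster.
cluster_points = [(1, 2), (3, 4), (5, 0)]
center = centroid(cluster_points)
print(center)  # (3.0, 2.0)
```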

1 K-means clustering steps

  • 1. Randomly choose K points in the feature space as the initial cluster centers
  • 2. For every other point, compute the distance to each of the K centers and label the point with the cluster of its nearest center
  • 3. After all points are labeled, recompute the center of each cluster as the average of its points
  • 4. If the new centers are the same as the previous ones (the centroids no longer move), stop; otherwise repeat from step 2
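
The four steps above can be sketched with NumPy as follows. This is an illustrative sketch rather than a reference implementation. The function name `kmeans`, the `max_iter` and `seed` parameters, and the sample data are assumptions. Two further simplifications are made: the initial centers are drawn from the samples themselves instead of from arbitrary positions in the feature space, and an empty cluster simply keeps its previous center.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial cluster centers.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every center,
        # then label each point with the index of its nearest center.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: recompute each center as the mean of its assigned points.
        # Assumption of this sketch: an empty cluster keeps its old center.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop when the centers no longer move, otherwise repeat.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


# Hypothetical data: two visibly separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful, not the label values.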

The following figure illustrates the process:

[Figure: K-means process analysis]

[Figure: animation of k-means clustering]

2 Case exercise

  • Case:

[Figure: kmeans_demo1.png]

  • 1. Randomly choose K points in the feature space as the initial cluster centers (in this case, p1 and p2)

[Figure: kmeans_demo2.png]

  • 2. For every other point, compute the distance to each of the K centers and label the point with the cluster of its nearest center

[Figure: kmeans_demo3.png]

[Figure: kmeans_demo4.png]

  • 3. After all points are labeled, recompute the center of each cluster as the average of its points

[Figure: kmeans_demo5.png]

  • 4. If the new centers are the same as the previous ones (the centroids no longer move), stop; otherwise repeat from step 2. [In this case the check shows the centers have moved, so the steps above are repeated in a new round of iteration.]

[Figure: kmeans_demo6.png]

[Figure: kmeans_demo7.png]

  • 5. When an iteration produces the same result as the previous one, the algorithm is considered to have converged and clustering is complete. K-means always stops; it cannot get stuck re-selecting centroids forever.

[Figure: kmeans_demo8.png]
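
Because the figures for this case are unavailable, the trace below reproduces the same five steps on made-up data. The coordinates of p1 to p6 are hypothetical and are not the values from the original figures. Only the choice of p1 and p2 as initial centers follows the case. The helper name `nearest` is likewise an assumption.

```python
import math

# Hypothetical coordinates: the original figures are unavailable,
# so these points only illustrate the procedure.
points = {"p1": (7, 7), "p2": (2, 3), "p3": (6, 8),
          "p4": (1, 4), "p5": (1, 2), "p6": (3, 1)}

# Step 1: use p1 and p2 as the initial cluster centers.
centers = [points["p1"], points["p2"]]


def nearest(pt, centers):
    """Index of the center closest to pt (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(pt, centers[j]))


rounds = 0
while True:
    rounds += 1
    # Step 2: label every point with the index of its nearest center.
    labels = {name: nearest(pt, centers) for name, pt in points.items()}
    # Step 3: each new center is the mean of the points assigned to it.
    new_centers = []
    for j in range(len(centers)):
        members = [points[n] for n, lab in labels.items() if lab == j]
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    print(f"round {rounds}: labels={labels} centers={new_centers}")
    # Steps 4 and 5: stop once the centers are unchanged (convergence).
    if new_centers == centers:
        break
    centers = new_centers
```

With these particular points the centers move in round 1 and stay put in round 2, so the loop stops after two rounds with centers (6.5, 7.5) and (1.75, 2.5).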

3 Summary

Process:

  • Decide the constant K in advance; K is the final number of cluster categories.
  • First, randomly select K initial points as centroids, compute the similarity between each sample and each centroid (here, the Euclidean distance), and assign each sample to the most similar class.
  • Next, recompute the centroid of each class (i.e. the cluster center), and repeat this process until the centroids no longer change.
  • Finally, this determines the class each sample belongs to and the centroid of each class.
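
In practice the same process is usually run through a library. The sketch below uses scikit-learn's `KMeans` estimator. The data and the parameter values (`n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions chosen for illustration, not settings taken from the original article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two visibly separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

# n_clusters is the constant K decided in advance.
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict runs the iterations and returns the class of each sample.
labels = model.fit_predict(X)

print(labels)                  # class of each sample
print(model.cluster_centers_)  # centroid of each class
print(model.n_iter_)           # iterations used by the best run
```

The attribute `cluster_centers_` holds the final centroids, which matches the last bullet above: the class of every sample and the centroid of every class.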

Note:

  • Because every iteration computes the similarity between each sample and each centroid, K-means can be slow to converge on large data sets.
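
As a rough estimate of why this is costly, the formula below counts the work done by the standard algorithm. The symbols are introduced here for illustration and are not in the original text: n is the number of samples, K the number of centroids, d the number of features, and I the number of iterations. Each iteration evaluates n · K distances, and each distance costs on the order of d operations, so the total work T grows roughly as:

```latex
T \approx I \cdot n \cdot K \cdot d
```

Under this estimate, doubling the number of samples roughly doubles the cost of every iteration.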


Origin blog.csdn.net/qq_35456045/article/details/104644999