K-means is one of the simpler clustering algorithms, and it is also widely used. Here is a brief explanation, written mainly as a record for my own later review.
The main idea of K-means clustering is to make the points in each cluster as close as possible to that cluster's center.
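One common way to write this idea down, assuming "close" means squared Euclidean distance (the symbols C_i for the i-th cluster and \mu_i for its center are notation introduced here, not taken from the text above), is as a quantity to be made as small as possible:

$$
E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
$$

A smaller E means the samples sit more tightly around their own cluster centers.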
The algorithm for K-means clustering can be described as:
Input: dataset D containing m samples
Number of clusters: k
Algorithm:
- Randomly select k samples from dataset D as initial cluster centers
- repeat
  - Let the sample set of every cluster be empty
  - for each sample j = 1,2,…,m (m is the number of samples) do
    - Calculate the distance between sample j and each of the k cluster centers, and add sample j to the set belonging to the nearest cluster center
  - end for
  - Recalculate the center position of every cluster, that is, the mean vector of the samples assigned to it
- until the cluster centers no longer change (convergence can sometimes take a long time, so a maximum number of iterations or a threshold on the change of the cluster centers may be set)
Output: the k clusters
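The steps above can be sketched in plain Python roughly as follows. This is only a minimal illustration, not a tuned implementation: the function name `kmeans`, the helper `sq_dist`, the parameters `max_iter`, `tol` and `seed`, the use of squared Euclidean distance, and the rule that an empty cluster keeps its old center are all assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed distance measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(data, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster `data` (a list of equal-length numeric tuples) into k clusters."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial cluster centers.
    centers = [list(p) for p in rng.sample(data, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Empty the cluster sets, then put each sample in the set
        # of its nearest cluster center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: sq_dist(x, centers[i]))
            clusters[nearest].append(x)
        # Recalculate each center as the mean vector of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        # Stop when no center moved by more than the threshold.
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, clusters


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    final_centers, final_clusters = kmeans(points, k=2)
    print(final_centers)
    print(final_clusters)
```

Both stopping rules from the description appear here: `max_iter` caps the number of repetitions, and `tol` is the threshold on how far the cluster centers moved in the last round.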