Machine Learning: the K-means clustering algorithm (in R)

K-means clustering

K-means assigns n observations to k clusters according to a similarity criterion (how close the data points are to each other). Typical uses include segmenting users and grouping products into categories.

Key concept: the centroid, i.e. the mean of all points assigned to a cluster.

K-means requires numeric variables, so that distances between observations can be calculated.

 

Algorithm
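
As a minimal sketch, the standard K-means iteration (often called Lloyd's algorithm) can be written in a few lines of R: pick k starting centroids, assign every point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the assignments stop changing. The function name simple_kmeans and the toy data are assumptions made for illustration; R's built-in kmeans() uses a different default method (Hartigan-Wong).

```r
# Minimal K-means sketch (Lloyd's algorithm).
# simple_kmeans is a hypothetical helper, not part of any package.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Use k randomly chosen observations as the initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every centroid (n x k)
    d <- sapply(seq_len(k), function(j) {
      rowSums((x - matrix(centroids[j, ], nrow(x), ncol(x), byrow = TRUE))^2)
    })
    d <- matrix(d, nrow = nrow(x))
    # Assignment step: index of the nearest centroid for each point
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: each centroid becomes the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centroids)
}

# Toy data: two groups of 20 points each (made-up values)
set.seed(42)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(pts, k = 2)
table(fit$cluster)
```

The result depends on the random starting centroids, which is why practical implementations usually try several random starts.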

 

R language

The k-means algorithm converts the data into distance values and clusters according to that distance measure. Without normalization, variables measured on large scales produce very large distances and dominate the result.

Supplementary: why scaling (normalization) matters

When two variables differ greatly in magnitude, for example age and income, the variable with the larger values dominates the distance calculation.
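
A small sketch of the effect, using made-up age and income values (the customers data frame is an assumption for illustration). Base R's scale() centers each column to mean 0 and rescales it to standard deviation 1, so both variables contribute comparably to the distance.

```r
# Hypothetical customer data: age in years, income in currency units
customers <- data.frame(
  age    = c(23, 35, 47, 52, 61),
  income = c(28000, 52000, 61000, 90000, 45000)
)

# Raw Euclidean distances are driven almost entirely by income
round(dist(customers))

# Standardize each column: mean 0, standard deviation 1
customers_scaled <- scale(customers)

# After scaling, age and income are on a comparable footing
round(dist(customers_scaled), 2)
```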

Steps

Step 1: determine the number of clusters, i.e. the value of k

Method: the elbow rule combined with actual business needs
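
One common way to apply the elbow rule is to run kmeans() for a range of k values and plot the total within-cluster sum of squares; the "elbow" is where adding more clusters stops reducing it much. The sketch below assumes the built-in iris measurements as stand-in data and a range of 1 to 10, neither of which comes from the original post.

```r
set.seed(123)

# Stand-in data: the four numeric iris columns, standardized
x <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1..10
wss <- sapply(1:10, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})

# Look for the bend ("elbow") in this curve
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The plot only suggests candidates; the final k should still make sense for the business question.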

Step 2: run the K-means model
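
A minimal sketch of this step with the kmeans() function from base R's stats package. The data (standardized iris measurements), k = 3 and nstart = 25 are assumptions chosen for illustration.

```r
set.seed(123)

# Stand-in data: the four numeric iris columns, standardized
x <- scale(iris[, 1:4])

# centers = number of clusters k; nstart = number of random starts,
# of which the best solution is kept
fit <- kmeans(x, centers = 3, nstart = 25)

fit$size           # number of observations in each cluster
fit$centers        # centroid of each cluster (in scaled units)
head(fit$cluster)  # cluster label of the first few observations
```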

Step 3: summarize the results of the cluster model
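
One possible way to summarize the model is to count the members of each cluster and compute the mean of every original (unscaled) variable per cluster, which gives a readable profile of each group. The iris data and the variable name profile are again assumptions for illustration.

```r
set.seed(123)

# Fit on standardized data, as in the previous step
fit <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)

# Cluster sizes
table(fit$cluster)

# Per-cluster means of the original variables, for interpretation
profile <- aggregate(iris[, 1:4],
                     by = list(cluster = fit$cluster),
                     FUN = mean)
profile

# Share of total variance explained by the clustering
fit$betweenss / fit$totss
```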

 


Source: www.cnblogs.com/Grayling/p/10991252.html