K-means clustering
K-means partitions n observations into k clusters according to a similarity criterion (the distance between data points). Typical uses include segmenting users or grouping products into categories.
Key concept: centroid (the mean of the points assigned to a cluster)
K-means requires numeric variables, so that distances between observations are easy to calculate.
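To make the centroid and distance ideas concrete, here is a minimal sketch in base R. The four points and the choice to treat the first two as one cluster are made up for illustration.

```r
# Four observations described by two numeric variables (made-up values)
pts <- matrix(c(1, 2,
                2, 1,
                8, 9,
                9, 8),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("x", "y")))

# Centroid of the first two points: the column-wise mean
centroid <- colMeans(pts[1:2, ])

# Euclidean distance from every point to that centroid
d <- sqrt(rowSums(sweep(pts, 2, centroid)^2))

centroid
d
```

The first two points lie close to the centroid and the last two lie far from it, which is the information k-means uses when it assigns points to clusters.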
Algorithm
R language
K-means converts the data into distance values and forms clusters from those distance measurements. Without normalization, variables measured on large scales produce very large distances and dominate the result.
Supplementary note: why scale normalization matters
When two variables differ greatly in magnitude, such as age and income, the raw differences in income are far larger than the differences in age, so income alone would drive the distance.
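A small sketch of this effect, using base R's `scale()` and `dist()`. The `customers` data frame and its values are invented for illustration only.

```r
# Made-up data: age in years, income in currency units
customers <- data.frame(
  age    = c(25, 60, 26, 59),
  income = c(50000, 50500, 80000, 80500)
)

# Raw distances: income differences swamp age differences
raw_dist <- as.matrix(dist(customers))

# scale() centers each column to mean 0 and rescales it to standard deviation 1
scaled      <- scale(customers)
scaled_dist <- as.matrix(dist(scaled))

raw_dist[1, 2]     # customers 1 and 2 differ by 35 years, yet...
raw_dist[1, 3]     # ...look far closer than 1 and 3, who are almost the same age
scaled_dist[1, 2]  # after scaling, both variables contribute comparably
scaled_dist[1, 3]
```

Standardizing to mean 0 and standard deviation 1 is only one possible normalization; min-max scaling is another common choice.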
Steps
Step 1: determine the number of clusters, i.e. the value of k
Method: the elbow rule combined with actual business needs
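One common way to apply the elbow rule is to plot the total within-cluster sum of squares against k. The sketch below assumes simulated data with three groups and a candidate range of k = 1 to 8; both are illustrative choices, not part of the original notes.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Simulated data with three groups (purely illustrative)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 8, sd = 0.5), ncol = 2)
)
x <- scale(x)

# Total within-cluster sum of squares for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the "elbow": the k after which the curve flattens
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow only suggests a range. The final k should still make sense for the business question, for example how many customer segments can realistically be acted on.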
Step 2: run the K-means model
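A minimal sketch of this step with base R's `kmeans()`. The built-in `iris` data set and k = 3 are assumptions made for illustration.

```r
set.seed(123)  # make the random starting centers reproducible

# iris ships with R; keep only the four numeric columns and standardize them
x <- scale(iris[, 1:4])

# k = 3 is assumed here for illustration; nstart = 25 tries 25 random starts
# and keeps the solution with the lowest total within-cluster sum of squares
fit <- kmeans(x, centers = 3, nstart = 25)

head(fit$cluster)  # cluster label assigned to each observation
```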
Step 3: summarize the results of the cluster model
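One possible way to summarize a fitted model is to look at cluster sizes, centroids, and per-cluster means on the original scale. The sketch again assumes the built-in `iris` data and k = 3; the name `profile` is hypothetical.

```r
set.seed(123)
x   <- scale(iris[, 1:4])
fit <- kmeans(x, centers = 3, nstart = 25)

fit$size     # number of observations in each cluster
fit$centers  # centroids, in standardized units

# Cluster profiles on the original scale: mean of each variable per cluster
profile <- aggregate(iris[, 1:4], by = list(cluster = fit$cluster), FUN = mean)
profile

# Share of the total sum of squares explained by the clustering
fit$betweenss / fit$totss
```

Profiles on the original scale are usually easier to interpret than standardized centroids when describing each cluster to a non-technical audience.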