13.1 Unsupervised Learning: An Introduction
An algorithm that divides unlabeled samples into distinct groups (clusters) is called a clustering algorithm. Common applications include market segmentation, social network analysis, computer cluster management, and understanding galaxies.
13.2 K-means algorithm
(1) K-means is the most popular clustering algorithm. It is iterative: to cluster the data into K groups, it first randomly selects K points, called the cluster centers.
Each sample is assigned to its nearest cluster center, then the mean of the samples in each cluster is computed and becomes that cluster's new center. These two steps are repeated until the cluster centers no longer change.
Pseudo-code as follows:
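As a minimal runnable sketch of these steps in Python, assuming NumPy is available (the function name `kmeans`, its parameters, and the toy data are illustrative and not part of the original notes):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Cluster the rows of X into K groups; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with K distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assignment step: index of the nearest center for every sample.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its samples
        # (a center with no samples is left where it is).
        new_centers = centers.copy()
        for k in range(K):
            if np.any(labels == k):
                new_centers[k] = X[labels == k].mean(axis=0)
        # Stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(X, K=2)
print(labels)
```

Leaving an empty cluster's center in place is one possible design choice; dropping or re-initializing that center would be equally reasonable.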
13.3 Optimization objective
K-means minimizes the sum of the distances (conventionally squared Euclidean distances) between every sample and the cluster center it is assigned to.
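Written out under the usual convention, which is assumed here: the distance is the squared Euclidean distance averaged over the m samples, c^(i) is the index of the cluster that sample x^(i) is assigned to, and mu_k is the center of cluster k. The symbol names are illustrative choices.

```latex
J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\bigr)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl\lVert x^{(i)} - \mu_{c^{(i)}}\bigr\rVert^{2}
```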
Unlike with some other algorithms, every K-means iteration lowers the cost function (or leaves it unchanged once the algorithm has converged); the cost never increases.
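This monotone behaviour can be checked numerically. The sketch below, assuming NumPy and hypothetical toy data, records the cost (mean squared distance to the assigned center, an assumed form of the cost function) after each iteration:

```python
import numpy as np

def distortion(X, centers, labels):
    """Mean squared distance from each sample to its assigned center."""
    return float(np.mean(np.sum((X - centers[labels]) ** 2, axis=1)))

rng = np.random.default_rng(0)
# Hypothetical toy data: two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
K = 3
centers = X[rng.choice(len(X), size=K, replace=False)].copy()

costs = []
for _ in range(10):
    # Assignment step: nearest center for every sample.
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq.argmin(axis=1)
    # Update step (an empty cluster keeps its previous center).
    for k in range(K):
        if np.any(labels == k):
            centers[k] = X[labels == k].mean(axis=0)
    costs.append(distortion(X, centers, labels))

print(costs)  # each value should be no larger than the one before it
```

Both steps can only lower or preserve the cost: reassigning a sample to a closer center reduces its distance, and the mean is the point that minimizes the squared distance to the samples of a cluster.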
13.4 Random initialization
(1) K should be less than the number of samples m;
(2) Randomly select K samples from the training set as the initial cluster centers.
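The two rules above can be sketched as a small helper, assuming NumPy (the name `init_centers` and the toy data are hypothetical):

```python
import numpy as np

def init_centers(X, K, rng):
    """Pick K distinct rows of X at random as the initial cluster centers."""
    m = len(X)
    if not 0 < K < m:
        raise ValueError("K should be positive and smaller than the number of samples m")
    idx = rng.choice(m, size=K, replace=False)  # K distinct sample indices
    return X[idx].copy()

X = np.arange(20, dtype=float).reshape(10, 2)  # 10 toy samples
centers = init_centers(X, K=3, rng=np.random.default_rng(0))
print(centers)
```

Sampling without replacement guarantees that no two centers start on the same training sample.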
Depending on the initialization, K-means may converge to a local minimum of the cost function.
Solution: run the algorithm several times with different random initializations, compare the resulting cost function values, and keep the K-means result with the lowest cost. This method works well when K is small (2-10); when K is large, it brings no apparent improvement.
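A minimal sketch of this restart strategy, assuming NumPy; the helper names `run_kmeans` and `best_of`, the default of 20 runs, and the toy data are all illustrative assumptions:

```python
import numpy as np

def run_kmeans(X, K, rng, iters=50):
    """One K-means run from a random initialization; returns (cost, centers, labels)."""
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    # Final assignment and cost: mean squared distance to the nearest center.
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq.argmin(axis=1)
    cost = float(sq[np.arange(len(X)), labels].mean())
    return cost, centers, labels

def best_of(X, K, n_runs=20, seed=0):
    """Run K-means n_runs times and keep the run with the lowest cost."""
    rng = np.random.default_rng(seed)
    runs = [run_kmeans(X, K, rng) for _ in range(n_runs)]
    return min(runs, key=lambda r: r[0]), [r[0] for r in runs]

# Hypothetical toy data: three Gaussian blobs along the diagonal.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(c, 0.5, (30, 2)) for c in (0.0, 4.0, 8.0)])
(best_cost, best_centers, best_labels), costs = best_of(X, K=3)
print(best_cost, sorted(set(round(c, 3) for c in costs)))
```

If the printed set contains more than one cost value, some runs ended in a worse local minimum than the one that was kept.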
13.5 Choosing the number of clusters
Plot the cost function value against the number of clusters and choose the number at which the slope of the curve suddenly flattens (the "elbow rule").
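The values behind such a curve could be computed as in the sketch below, assuming NumPy. The helper `kmeans_cost`, the range K = 1..6, the 10 restarts per K, and the toy data are hypothetical choices:

```python
import numpy as np

def kmeans_cost(X, K, rng, iters=50):
    """Run K-means once and return the mean squared distance to the nearest center."""
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(sq.min(axis=1).mean())

rng = np.random.default_rng(0)
# Hypothetical toy data generated from three groups.
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0.0, 4.0, 8.0)])

ks = list(range(1, 7))
# For each K, keep the lowest cost over several random restarts.
costs = [min(kmeans_cost(X, K, rng) for _ in range(10)) for K in ks]
for K, c in zip(ks, costs):
    print(K, round(c, 3))
```

Plotting `ks` against `costs` gives the elbow curve. For data like this, generated from three groups, one would expect the bend near K = 3, although real data often shows no clear elbow.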