Andrew Ng "Machine Learning" course summary (12): unsupervised learning

13.1 Unsupervised Learning: An Introduction

Grouping unlabeled samples into different sets (clusters) is called clustering. Common application areas include market segmentation, social network analysis, computer cluster management, and the study of galaxies.

13.2 K-means algorithm

K-means is the most popular clustering algorithm. It is iterative: assuming the data should be grouped into K clusters, it first randomly picks K points, called the cluster centers.

Each sample is assigned to its nearest cluster center; each center is then recomputed as the mean of the samples assigned to it. These two steps repeat until the cluster centers no longer change.

Pseudo-code as follows:
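As a runnable stand-in, here is a minimal Python sketch of the two alternating steps (assign each sample, then move each center). It assumes Euclidean distance and initial centers drawn from the samples; the names `kmeans`, `squared_distance`, `max_iters` and `seed` are illustrative choices, not taken from the course.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) after running K-means on `points`."""
    rng = random.Random(seed)
    # Initialise with k training examples drawn at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing was reassigned, so the centroids would not move
        assignments = new_assignments
        # Move-centroid step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignments = kmeans(points, k=2)
```

Stopping when no assignment changes is equivalent to stopping when the centers stop moving, because the centers are a function of the assignments.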

 

13.3 Optimization objective

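K-means minimizes the total distance between each sample point and the cluster center it is assigned to. In the course's notation, the cost (distortion) function is the average squared distance:

$$J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)=\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^2$$

where $x^{(i)}$ is the i-th sample, $c^{(i)}$ is the index of the cluster it is assigned to, and $\mu_k$ is the center of cluster k.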

Unlike some other algorithms, K-means should lower the cost function (or at worst leave it unchanged) on every iteration; the cost should never increase.
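The sketch below records the cost after every assignment step and every move step, to illustrate that it never goes up. It assumes the cost is the average squared distance from each sample to its assigned center; the helper names `distortion`, `assign` and `move` and the four sample points are made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(points, centroids, assignments):
    """Average squared distance from each point to its assigned centroid."""
    total = sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignments)
    )
    return total / len(points)


def assign(points, centroids):
    """Cluster assignment step: nearest centroid index for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def move(points, centroids, assignments):
    """Move-centroid step: mean of each cluster (empty clusters stay put)."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == j]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))
    return new_centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [[0.0, 0.0], [0.0, 2.0]]  # deliberately poor starting centroids
costs = []
for _ in range(5):
    assignments = assign(points, centroids)
    costs.append(distortion(points, centroids, assignments))
    centroids = move(points, centroids, assignments)
    costs.append(distortion(points, centroids, assignments))

# Each recorded cost is no larger than the one before it.
assert all(later <= earlier + 1e-12 for earlier, later in zip(costs, costs[1:]))
```

If a recorded cost ever rose between steps, that would point to a bug in the implementation rather than to normal K-means behaviour.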

13.4 Random initialization

(1) K should be less than the number of samples m;

(2) Randomly select K of the samples as the initial cluster centers.

K-means can get stuck in a local minimum, depending on where the initial cluster centers happen to fall.

Solution: run the algorithm several times with different random initializations, compare the final cost function values, and keep the run with the lowest cost. This works well when K is small (2-10); when K is large it has no obvious effect.
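A minimal sketch of this restart strategy follows, assuming the cost is the average squared distance to the assigned center. The names `kmeans_once` and `best_of_restarts`, the default of 50 restarts and the sample points are illustrative assumptions. With these four points, a run that starts from two points on the same side settles at a poor local minimum, while the other starts find the better split.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iters=100):
    """One K-means run from a random initialisation; returns (cost, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignments)
    ) / len(points)
    return cost, centroids


def best_of_restarts(points, k, n_restarts=50, seed=0):
    """Run K-means n_restarts times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0])


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
best_cost, best_centroids = best_of_restarts(points, k=2)
print(best_cost, best_centroids)
```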

 

13.5 Choosing the number of clusters

Plot the cost function value against the number of clusters and choose the point where the slope of the curve suddenly flattens (the "elbow rule").
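To produce the numbers for such a plot, one could compute the best cost found for each candidate K, as in the sketch below. It assumes the average-squared-distance cost and several random restarts per K. The names `kmeans_cost` and `best_cost`, the restart count and the synthetic data (three tight groups) are illustrative. Reading off the elbow is left to the eye, since the rule is a heuristic.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, k, rng, max_iters=100):
    """Run K-means once from a random start and return the final cost."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignments)
    ) / len(points)


def best_cost(points, k, n_restarts=100, seed=0):
    """Lowest cost over several random restarts, to reduce local-minimum noise."""
    rng = random.Random(seed)
    return min(kmeans_cost(points, k, rng) for _ in range(n_restarts))


# Synthetic data: three tight, well-separated groups of four points each,
# so the curve is expected to bend near K = 3.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
offsets = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(ox + x, oy + y) for ox, oy in offsets for x, y in corners]

costs = {k: best_cost(points, k) for k in range(1, 7)}
for k in sorted(costs):
    print(f"K={k}: cost={costs[k]:.3f}")
```

On real data the curve often bends gradually rather than sharply, in which case the elbow gives no clear answer.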

 

 


Origin www.cnblogs.com/henuliulei/p/11284611.html