[Machine Learning] 8 Clustering

1 Unsupervised Learning

  • Unlabeled data
  • Clustering: a cluster is simply a small "tribe" of points that are grouped closely together

2 K-Means Algorithm

  • An iterative algorithm that alternates between two steps:
  1. Cluster assignment
  2. Move centroid
  • Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \cdots, \mu_K \in \mathbb{R}^n$
    Repeat {
    (1) Cluster assignment step
        for $i = 1$ to $m$
           $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$, i.e.
           $c^{(i)} = \mathop{\arg\min}\limits_{k} \|x^{(i)} - \mu_k\|^2$
    (2) Move centroid step
        for $k = 1$ to $K$
           $\mu_k$ := average (mean) of the points assigned to cluster $k$
    }
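The two-step loop above can be sketched in NumPy. This is a minimal illustration, not code from the original notes; the function name `kmeans`, the fixed iteration count, and the seed parameter are all assumptions made here:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: alternate cluster assignment and centroid moves."""
    rng = np.random.default_rng(seed)
    # Randomly pick K distinct training examples as the initial centroids
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # (1) Cluster assignment: c[i] = index of the centroid closest to x[i]
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # (2) Move centroid: mu[k] = mean of the points assigned to cluster k
        for k in range(K):
            pts = X[c == k]
            if len(pts):
                mu[k] = pts.mean(axis=0)
    return c, mu
```

On two well-separated blobs this converges in a few iterations; a production version would also stop early once the assignments stop changing.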

3 Optimization Objective

$c^{(i)}$ = index of the cluster ($1, 2, \cdots, K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned

  • Distortion function:
    $$\min_{\substack{c^{(1)},\cdots,c^{(m)}\\ \mu_1,\cdots,\mu_K}} J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)=\frac{1}{m}\sum_{i=1}^m \big\|x^{(i)}-\mu_{c^{(i)}}\big\|^2$$
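Given assignments and centroids, the distortion $J$ is a one-liner in NumPy. A sketch only; the helper name `distortion` is made up here:

```python
import numpy as np

def distortion(X, c, mu):
    """J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2."""
    # mu[c] selects, for each example, the centroid of its assigned cluster
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```

For example, two points at distance 1 on either side of their single centroid give $J = 1$.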

4 Random Initialization

  1. Should have $K < m$: the number of cluster centroids should be smaller than the number of training examples
  2. Randomly pick $K$ training examples
  3. Set $\mu_1, \cdots, \mu_K$ equal to these $K$ examples
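Steps 1–3 amount to sampling $K$ distinct training examples as the initial centroids. An illustrative sketch, with `random_init` a hypothetical name:

```python
import numpy as np

def random_init(X, K, rng):
    """Set mu_1, ..., mu_K equal to K distinct randomly chosen examples."""
    assert K < len(X)  # should have K < m
    idx = rng.choice(len(X), size=K, replace=False)  # K distinct indices
    return X[idx].copy()
```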

To avoid bad local minima, especially when $K$ is small, run K-means many times, re-doing the random initialization each time, then compare the runs and keep the result with the lowest cost.

  • for i = 1 to 100 {
        Randomly initialize K-means.
        Run K-means. Get $c^{(1)}, \cdots, c^{(m)}, \mu_1, \cdots, \mu_K$
        Compute cost function (distortion) $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$
    }
    Pick the clustering that gave the lowest cost $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$
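The restart loop might look like the self-contained sketch below. The helper names `kmeans_once` and `kmeans_restarts`, the iteration counts, and the seeding scheme are all assumptions made for this example:

```python
import numpy as np

def kmeans_once(X, K, rng, n_iters=50):
    # One K-means run from a fresh random initialization
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    # Distortion of this run
    J = np.mean(np.sum((X - mu[c]) ** 2, axis=1))
    return c, mu, J

def kmeans_restarts(X, K, n_restarts=100, seed=0):
    # Run K-means many times; keep the clustering with the lowest distortion J
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, K, rng) for _ in range(n_restarts)),
               key=lambda run: run[2])
```

With 100 restarts the chance that every run lands in the same bad local minimum becomes small, which is the whole point of the loop above.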

5 Choosing the Number of Clusters

  • Elbow method: plot the distortion $J$ against the number of clusters $K$; the point where the curve bends like an elbow, after which $J$ decreases only slowly, suggests a reasonable choice of $K$.
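One way to apply the elbow method is to compute $J$ for each candidate $K$ and inspect the resulting curve. A sketch under the same assumptions as above (the single random initialization per $K$ and the 50-iteration inner loop are simplifications; in practice you would use restarts for each $K$ too):

```python
import numpy as np

def distortions_by_k(X, ks, seed=0):
    """Distortion J for each candidate K; plot J vs K and look for the elbow."""
    rng = np.random.default_rng(seed)
    out = []
    for K in ks:
        mu = X[rng.choice(len(X), size=K, replace=False)].copy()
        for _ in range(50):
            c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
            for k in range(K):
                if np.any(c == k):
                    mu[k] = X[c == k].mean(axis=0)
        out.append(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))
    return out
```

$J$ shrinks as $K$ grows; the elbow is where the marginal drop becomes small.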

6 Reference

Andrew Ng, Machine Learning (Coursera)
Huang Haiguang (黄海广), Machine Learning notes

Reprinted from blog.csdn.net/qq_44714521/article/details/108527650