Unsupervised-learning-notes

Category: Other, posted 2020-02-15 13:01:28

Unsupervised Learning Notes

Unsupervised learning mainly covers clustering.

K-means
The data has no labels; samples are grouped according to how similar they are to one another.

Principle and steps

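First, K objects are selected at random as the initial cluster centers.
The distance between each object and every cluster center is computed, and each object is assigned to its nearest center.
A cluster center together with the objects assigned to it represents one cluster.
After objects are assigned, each cluster's center is recomputed from the objects currently in that cluster (in the incremental variant this happens after every single assignment). This process repeats until a termination condition is met.
Possible termination conditions: no (or fewer than a minimum number of) objects are reassigned to a different cluster, no (or fewer than a minimum number of) cluster centers change, or the sum of squared errors reaches a local minimum.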
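The procedure above, in which a cluster's center is recomputed as soon as an object joins or leaves it, can be sketched in Python as follows. This is a minimal illustration, not the original author's code: the function name `online_kmeans` and the `max_passes` and `seed` parameters are assumptions, points are assumed to be tuples of numbers, and the loop stops when a full pass reassigns no object (the first termination condition listed above).

```python
import math
import random


def online_kmeans(points, k, max_passes=100, seed=0):
    """Incremental k-means sketch (hypothetical helper, not a library API):
    a cluster's center is recomputed every time an object joins or leaves it."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Randomly pick k objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    sums = [[0.0] * dim for _ in range(k)]  # running coordinate sums per cluster
    counts = [0] * k                         # number of objects per cluster
    labels = [None] * len(points)            # current cluster of each object

    for _ in range(max_passes):
        moved = 0
        for i, p in enumerate(points):
            # Find the nearest current center (Euclidean distance).
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            if labels[i] == j:
                continue
            old = labels[i]
            if old is not None:
                # Remove the object from its old cluster and update that center;
                # a cluster that becomes empty keeps its last center.
                counts[old] -= 1
                sums[old] = [s - v for s, v in zip(sums[old], p)]
                if counts[old] > 0:
                    centers[old] = [s / counts[old] for s in sums[old]]
            # Add the object to its new cluster and update that center immediately.
            labels[i] = j
            counts[j] += 1
            sums[j] = [s + v for s, v in zip(sums[j], p)]
            centers[j] = [s / counts[j] for s in sums[j]]
            moved += 1
        # Terminate when a full pass reassigns no object.
        if moved == 0:
            break
    return centers, labels
```

Because each update uses the centers as they are at that moment, the result can depend on the order in which objects are visited; the batch variant instead recomputes centers only after all objects have been assigned.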

Mathematical derivation

For an unlabeled dataset \(X\) with \(m\) samples,
\(X=\left[\begin{array}{c}{x^{(1)}} \\ {x^{(2)}} \\ {\vdots} \\ {x^{(m)}}\end{array}\right]\)
partition it into \(k\) clusters \(C=\left\{C_{1}, C_{2}, \dots, C_{k}\right\}\).
Using the Euclidean distance as the measure, the loss function to minimize is
\(E=\sum_{i=1}^{k} \sum_{x \in C_{i}}\left\|x-\mu_{i}\right\|^{2}\)
where \(\mu_{i}\) is the center of cluster \(C_{i}\):
\(\mu_{i}=\frac{1}{\left|C_{i}\right|} \sum_{x \in C_{i}} x\)
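As a small check of the two formulas, the sketch below computes \(\mu_{i}\) and \(E\) for a given partition. The helper names `centroid` and `sse` are hypothetical, and each cluster is assumed to be a non-empty list of equal-length tuples.

```python
def centroid(cluster):
    """mu_i = (1 / |C_i|) * sum of the vectors x in C_i, computed per coordinate."""
    n = len(cluster)
    return [sum(coord) / n for coord in zip(*cluster)]


def sse(clusters):
    """E = sum over clusters C_i of ||x - mu_i||^2 for every x in C_i."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        for x in cluster:
            # Squared Euclidean distance between x and its cluster center.
            total += sum((xd - md) ** 2 for xd, md in zip(x, mu))
    return total


print(sse([[(0, 0), (2, 0)], [(5, 5)]]))  # 2.0
print(sse([[(0, 0)], [(2, 0), (5, 5)]]))  # 17.0
```

For the three points (0, 0), (2, 0), (5, 5), grouping the two nearby points together gives \(E = 2\), while the other split gives \(E = 17\), so minimizing \(E\) favors the first partition.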
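Finding the optimal clustering exactly would require checking every possible partition of the samples, which is computationally infeasible for all but very small datasets. K-means therefore uses a greedy, iterative procedure that converges to a local minimum of \(E\) rather than the global one:
- 1. Randomly select \(k\) samples as the initial cluster centers \(\left\{\mu_{1}, \mu_{2}, \dots, \mu_{k}\right\}\).
- 2. Compute the distance \(\operatorname{dist}\left(x^{(i)}, \mu_{j}\right)\) between every sample and each cluster center, then assign each sample to the cluster with the nearest center: \(x^{(i)} \in C_{j^{*}}\), where \(j^{*}=\arg\min_{j} \operatorname{dist}\left(x^{(i)}, \mu_{j}\right)\).
- 3. Recompute each cluster center from the samples now in that cluster:
  \(\mu_{i}:=\frac{1}{\left|C_{i}\right|} \sum_{x \in C_{i}} x\)
- 4. Repeat steps 2 and 3 until the assignments no longer change.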
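The numbered steps above map onto a batch implementation (commonly called Lloyd's algorithm), in which all samples are assigned before any center is updated. This is a standard-library-only sketch under stated assumptions: the name `kmeans` and the `max_iter` and `seed` parameters are illustrative, samples are tuples of numbers, `math.dist` (Python 3.8+) supplies the Euclidean distance, and an empty cluster simply keeps its previous center.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Batch k-means (Lloyd's algorithm) sketch; name and parameters are illustrative."""
    rng = random.Random(seed)
    # Step 1: pick k samples at random as the initial centers mu_1..mu_k.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(x, centers[j]))
                      for x in points]
        # Repeat steps 2 and 3 until an assignment pass changes nothing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of the samples in its cluster.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels
```

Because the procedure is greedy, the local minimum it reaches depends on the initial centers; running it with several seeds and keeping the result with the lowest \(E\) is a common remedy.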

Reposted from www.cnblogs.com/gaowenxingxing/p/12311210.html