Machine Learning Study Notes - 15: Unsupervised Learning: Neighbor Embedding

Neighbor Embedding: non-linear dimensionality reduction that preserves the point-to-point relationships of the original data space; also called Manifold Learning.

Manifold Learning

Manifold: a low-dimensional space embedded inside a high-dimensional space.
Within Euclidean space, distances are reliable only at small scales; once the distance grows they no longer apply, as shown below: for a nearby point (blue) Euclidean distance gives the right answer, but for the more distant points (red and yellow), Euclidean distance says the red point is closer, whereas along the manifold the yellow point is actually closer.
[Figure: Euclidean distance vs. distance along the manifold]
So Manifold Learning first does dimensionality reduction to unfold the manifold; after that, distance computations give good results.
[Figure: the manifold unfolded after dimensionality reduction]
Methods

1. Locally Linear Embedding (LLE)

x^i is a point in the original space, the x^j are its neighboring points, and w_ij describes the relationship between them: x^i is represented as a weighted combination of its neighbors x^j with weights w_ij.

[Figure: representing x^i by its neighbors x^j with weights w_ij]

We obtain w_ij by minimizing the following expression:

[Figure: objective minimized to find w_ij]

Then, keeping w_ij unchanged, we look for low-dimensional points z^i and z^j that satisfy the same relationships, which achieves the dimensionality reduction.

[Figure: finding z^i with w_ij kept fixed]
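Writing the two steps in the figures above out explicitly (this is the standard LLE formulation):

```latex
% Step 1: find the weights w_{ij} that best reconstruct each x^i from its neighbors
\min_{w} \; \sum_i \Big\| x^i - \sum_j w_{ij}\, x^j \Big\|_2

% Step 2: keep w_{ij} fixed and find low-dimensional points z^i with the same local structure
\min_{z} \; \sum_i \Big\| z^i - \sum_j w_{ij}\, z^j \Big\|_2
```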
The figure below shows that the number of neighbors k has to be chosen well: not all points should be treated as related, and k should be neither too large nor too small; in the figure, values of k roughly between 8 and 12 give good results.

[Figure: LLE results for different numbers of neighbors k]
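As a concrete illustration, here is a minimal sketch of running LLE with scikit-learn's LocallyLinearEmbedding; the swiss-roll data and the parameter values are only illustrative, not from the lecture:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Illustrative data: a 3-D "swiss roll", the classic manifold example
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# n_neighbors plays the role of k: how many neighbors x^j are used to reconstruct each x^i
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
Z = lle.fit_transform(X)  # rows of Z are the low-dimensional points z^i
print(Z.shape)            # (1000, 2)
```

Choosing n_neighbors too small or too large degrades the embedding, which is exactly the point made about k above.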

2. Laplacian Eigenmaps

The distance between two points in space cannot be judged by (Euclidean) distance alone; we also have to check whether they are connected through a high-density region. So in the left part of the figure below, the distance between the two points should follow the blue path rather than the red dashed line.
[Figure: distance measured along the high-density path (blue) rather than the straight line (red dashed)]
Treat the data as a graph, and then do the dimensionality reduction based on that graph; the first step is therefore to build the graph. In the semi-supervised learning lecture we already saw that a term S (measuring how close i and j are) is added after the loss L, acting like a regularization term. In the unsupervised setting we already know w_ij, so we minimize S.
[Figure: building the graph and the smoothness term S]
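A standard way to write this term (the same S as in the semi-supervised lecture, with the labels replaced by the embeddings z; the squared distance is the usual Laplacian Eigenmaps convention) is:

```latex
S = \frac{1}{2} \sum_{i,j} w_{ij} \, \big\| z^i - z^j \big\|_2^2
```

Minimizing S pulls points that are strongly connected in the graph (large w_{ij}) close together in the low-dimensional space.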
If all points are simply mapped onto a single point, S is trivially 0, so some constraints are placed on z: for example, if the target dimension is M, the embedded points should fill the whole M-dimensional space rather than collapse into a lower-dimensional subspace.
[Figure: constraint on the embedding z]
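Laplacian Eigenmaps is available in scikit-learn as SpectralEmbedding; here is a minimal sketch, again with data and parameter values that are only illustrative:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# Illustrative data: the same kind of 3-D swiss roll as above
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Build a nearest-neighbor graph, then embed using eigenvectors of the graph Laplacian
emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
Z = emb.fit_transform(X)  # low-dimensional coordinates z^i
print(Z.shape)            # (1000, 2)
```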
The two methods above only gather similar points together; they do nothing to keep dissimilar points apart.

3. t-SNE

t-SNE first defines a probability: the numerator is the similarity between i and j, and the denominator is the sum of the similarities between i and all other points. The scales before and after dimensionality reduction may differ, but once everything is normalized into probabilities the scale is unified. We then look for z such that the two distributions (the distribution of x^i over the other points and the distribution of z^i over the other points) are as close as possible.
[Figure: normalized similarity distributions before and after dimensionality reduction]
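Written out (the usual formulation; S and S′ denote the similarity measures in the original and the reduced space):

```latex
P(x^j \mid x^i) = \frac{S(x^i, x^j)}{\sum_{k \neq i} S(x^i, x^k)}
\qquad
Q(z^j \mid z^i) = \frac{S'(z^i, z^j)}{\sum_{k \neq i} S'(z^i, z^k)}

% find z that makes the two distributions as close as possible (KL divergence)
L = \sum_i \mathrm{KL}\big( P(\cdot \mid x^i) \,\big\|\, Q(\cdot \mid z^i) \big)
  = \sum_i \sum_j P(x^j \mid x^i) \log \frac{P(x^j \mid x^i)}{Q(z^j \mid z^i)}
```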

t-SNE – Similarity Measure

If there are many data points, t-SNE has to compute the similarity between every point and every other point, which is computationally heavy; therefore it is common to first reduce the dimensionality (for example with PCA) before running t-SNE.
Also, if new data points arrive, t-SNE cannot embed them directly; all of the data has to be run again. For this reason, t-SNE is usually used for visualization, showing the relationships of high-dimensional data in a two-dimensional space.
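A minimal sketch of this workflow with scikit-learn (the digits dataset and the parameter values are just illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative data: 64-dimensional images of handwritten digits
X, y = load_digits(return_X_y=True)

# Step 1: reduce the dimensionality first (here with PCA) to cut the cost of the
# pairwise similarity computation that t-SNE performs
X_reduced = PCA(n_components=30).fit_transform(X)

# Step 2: t-SNE down to 2-D for visualization; note there is no transform() for new
# points -- adding data means re-running the whole embedding
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_reduced)
print(Z.shape)  # (1797, 2)
```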

Before dimensionality reduction, the similarity is computed with the following formula: [Figure: exponential similarity measure]
We explained earlier why exp is used: the exponential decays quickly, so points that are far apart have only a weak relationship.
t-SNE instead uses, in the low-dimensional space:
[Figure: t-SNE similarity measure]
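Concretely, the two similarity measures are roughly as follows (the exact convention, e.g. norm versus squared norm and the bandwidth, differs between presentations):

```latex
% similarity before dimensionality reduction (and in plain SNE): exponential decay
S(x^i, x^j) = \exp\big( -\| x^i - x^j \|_2 \big)

% similarity t-SNE uses in the low-dimensional space: heavy-tailed
S'(z^i, z^j) = \frac{1}{1 + \| z^i - z^j \|_2}
```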
After t-SNE, points that are similar stay relatively close, while points that are different end up much further apart (the gaps between them are stretched).
[Figure: how the two similarity measures stretch the gaps between dissimilar points]
Let's look directly at an example:
[Figure: t-SNE visualization example]

Origin blog.csdn.net/qq_44157281/article/details/98970351