"Machine learning" Chapter X measure learning and dimensionality reduction

10.1 k-nearest neighbor learning

Content: the label of a point is predicted by majority vote over the labels of the points around it (its nearest neighbors).

Advantages: the method is simple, and its generalization error rate is at most twice the Bayes optimal error rate.
Disadvantages: (1) it is sensitive to outliers / extreme points; (2) the results are difficult to interpret.

Takeaways:

· "For any x and any small integer δ, in the near distance x δ can always find a training sample" means "always find a solution extremely similar to x." To obtain a solution that satisfies certain conditions limit the use of thought, indicating the presence.
· Targeted along the derivation error Bayes classification, using the squared difference equation conversion, and (1 + p) <= 2, etc. The method of approximation conversion.
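As a concrete illustration of the majority-vote rule above, here is a minimal sketch in plain numpy (the function name knn_predict, the choice of Euclidean distance, and the toy data are illustrative, not from the book):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

# usage on toy data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
print(knn_predict(X_train, y_train, np.array([0.5, -0.2]), k=5))  # most likely 1 (neighbors have x0 > 0)
```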

The differences between dimensionality reduction methods reflect the different properties each one needs to preserve.

10.2 Low-dimensional embedding

Content: how to extract low-dimensional features while retaining the important information.

MDS (Multiple Dimensional Scaling):

Core idea: keep the Euclidean distances of the high-dimensional space unchanged, and find a low-dimensional representation that reproduces those same Euclidean distances.

The derivation is as follows:
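A hedged sketch of the key steps (this is the standard classical-MDS derivation; the entry notation b_ij for B is conventional and may differ slightly from the book's). Assuming the low-dimensional points z_i are centered (they sum to zero) and B = ZᵀZ:

$$
\mathrm{dist}_{ij}^{2} = \|z_i\|^{2} + \|z_j\|^{2} - 2\,z_i^{\mathsf T} z_j = b_{ii} + b_{jj} - 2\,b_{ij},
\qquad
b_{ij} = -\tfrac{1}{2}\Big(\mathrm{dist}_{ij}^{2} - \mathrm{dist}_{i\cdot}^{2} - \mathrm{dist}_{\cdot j}^{2} + \mathrm{dist}_{\cdot\cdot}^{2}\Big),
$$

where dist²_{i·}, dist²_{·j}, and dist²_{··} are the row, column, and overall means of the squared distances. So B can be computed from the high-dimensional distances alone, and its eigendecomposition then yields Z.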
Practical application: use the high-dimensional distance information to compute the entries of the matrix B, then perform an eigenvalue decomposition of B, B = VUVᵀ, obtaining the diagonal matrix U of eigenvalues and the eigenvector matrix V, and hence Z = U^{1/2} Vᵀ. (In practice the Euclidean distances do not have to be preserved exactly, only approximately, and sometimes the reduced dimension can be made much smaller than the original dimension.)

An important point: the number of dimensions the data can actually be reduced to depends on the number of nonzero eigenvalues in the eigenvalue decomposition of B. (In some cases no dimensionality reduction is achieved at all; in practice this has to be analyzed for the specific data at hand.)

Summary of low-dimensional embedding: the essence of low-dimensional embedding is to keep the Euclidean distances of the high-dimensional setting unchanged, and to use those unchanged Euclidean distances to obtain the result in the low-dimensional setting.
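A minimal numerical sketch of the procedure described above (classical MDS from a Euclidean distance matrix; the function name cmds and the toy data are illustrative):

```python
import numpy as np

def cmds(D, d_prime=2):
    """Classical MDS: recover low-dimensional coordinates from a Euclidean distance matrix D."""
    m = D.shape[0]
    J = np.eye(m) - np.full((m, m), 1.0 / m)     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # inner-product matrix B
    eigvals, eigvecs = np.linalg.eigh(B)         # B = V U Vᵀ
    order = np.argsort(eigvals)[::-1][:d_prime]  # keep the d' largest (nonzero) eigenvalues
    # rows are the embedded samples (the text's Z = U^{1/2} Vᵀ, transposed)
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# sanity check: distances of the embedding match the original distances
X = np.random.default_rng(0).normal(size=(30, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Z = cmds(D, d_prime=2)
D_hat = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True, because the data are genuinely 2-dimensional
```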

10.3 Linear dimensionality reduction

Core idea: Z = WᵀX. A transformation matrix W applies a linear transformation to the original high-dimensional space; each new attribute is a linear combination of the attributes of the original space.

Principal Component Analysis (PCA)

Preliminaries, change of coordinates: to represent a vector R with orthonormal basis vectors A and B, write R = xA + yB; then x = AᵀR and y = BᵀR.
After obtaining x and y, to reconstruct R from them you simply sum the two components, R = xA + yB. (In the coordinate system spanned by A and B, the components xA and yB each have only one nonzero coordinate, whereas R is in general nonzero in every coordinate.)
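A tiny numerical check of this change of coordinates, assuming A and B are orthonormal (the specific vectors below are made up for illustration):

```python
import numpy as np

A = np.array([1.0, 1.0]) / np.sqrt(2)   # orthonormal basis vector A
B = np.array([1.0, -1.0]) / np.sqrt(2)  # orthonormal basis vector B
R = np.array([3.0, 1.0])

x, y = A @ R, B @ R                      # coordinates in the (A, B) system: x = AᵀR, y = BᵀR
R_reconstructed = x * A + y * B          # reconstruction: sum the two components
print(np.allclose(R, R_reconstructed))   # True
```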

Core ideas: 1. nearest reconstruction; 2. maximum separability.

Nearest reconstruction: the objective function minimizes the distance between the reconstructed x and the original x.

The derivation is as follows: [figure: derivation of the nearest-reconstruction objective]
This result corresponds to the sum of the squares of all elements of Z, i.e. the squared Frobenius norm of Z; since the squared Frobenius norm equals the trace of ZᵀZ (the sum of its diagonal elements), equation (10.15) follows.
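As a hedged reconstruction of that step (this is the standard PCA nearest-reconstruction objective; whether it is exactly the book's equation (10.15) depends on the edition's numbering). With columns of X as the centered samples, x̂_i = Σ_j z_{ij} w_j the reconstruction of x_i, and z_{ij} = w_jᵀ x_i:

$$
\sum_{i=1}^{m}\bigl\|\hat{x}_i - x_i\bigr\|_2^{2}
= -\operatorname{tr}\bigl(W^{\mathsf T} X X^{\mathsf T} W\bigr) + \sum_{i=1}^{m}\|x_i\|_2^{2}
\;\;\Longrightarrow\;\;
\min_{W}\; -\operatorname{tr}\bigl(W^{\mathsf T} X X^{\mathsf T} W\bigr)
\quad \text{s.t. } W^{\mathsf T} W = I .
$$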

P.S. Checking and computing with the dimensions of the matrices is a quick way to keep the whole process straight and to ensure correctness.

Maximum separability: the objective maximizes the spread (the distances) among the projected sample points. [figure: derivation of the maximum-separability objective]

Covariance matrix:

[figure: definition of the covariance matrix]
The covariance matrix describes the relationship between two sets of data:
Intuitively: if x and x' are both always above (or both always below) their means, the resulting covariance is greater than 0; if one is always above its mean while the other is always below, the covariance is less than 0; if their relationship to the means keeps changing, the contributions to the covariance keep switching between positive and negative.
At a higher level: if the two vary in the same direction, i.e. they are above (or below) their means at the same time, the covariance is greater than 0; if they vary in opposite directions, i.e. when one is above its mean the other is below, the covariance is less than 0; if the two are independent, the covariance equals 0, but the converse does not hold.

P.S. The data here have already been centered; centering is done by subtracting the mean from every element, so the covariance is obtained simply by multiplying the corresponding (centered) elements.

That is, in the end the corresponding covariance matrix is obtained simply as (1/m) ZᵀZ, from which it can be shown that max tr(WᵀXXᵀW) is essentially equivalent to nearest reconstruction.
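A quick numerical check of this claim, assuming the samples are the rows of X (np.cov with bias=True divides by m, matching the 1/m factor above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # m = 100 samples (rows), 3 features

Z = X - X.mean(axis=0)               # centering: subtract each feature's mean
cov_manual = (Z.T @ Z) / Z.shape[0]  # (1/m) ZᵀZ with samples as rows

# np.cov expects variables as columns when rowvar=False;
# bias=True divides by m instead of m - 1, matching the 1/m above
cov_numpy = np.cov(X, rowvar=False, bias=True)

print(np.allclose(cov_manual, cov_numpy))  # True
```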

Using the method of Lagrange multipliers, perform an eigenvalue decomposition of the covariance matrix XXᵀ, sort the eigenvalues, and take the eigenvectors corresponding to the d' largest eigenvalues as the PCA solution; this completes the dimensionality reduction.


对于d’值的选择有以下两种方法:
1.通过k近邻分类器的分类准确率
2.通过最小重构的除法,将重构结果除以原结果大于某个阈值时的最小d’

Summary of PCA: the essence of principal component analysis is to obtain the required matrix either by minimizing the gap between the reduced result and the original data, or by maximizing the separability of the reduced result, and then to select the eigenvectors corresponding to the largest eigenvalues as the "principal components" that carry the most information.
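A minimal PCA sketch following the steps above, assuming samples are rows of X and using the eigenvalue-ratio rule to pick d' (the names pca and threshold are illustrative):

```python
import numpy as np

def pca(X, threshold=0.95):
    """Minimal PCA: keep enough components to retain `threshold` of the eigenvalue mass."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / Xc.shape[0]              # (1/m) XᵀX, the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # smallest d' whose eigenvalue ratio reaches the threshold
    ratio = np.cumsum(eigvals) / eigvals.sum()
    d_prime = int(np.searchsorted(ratio, threshold) + 1)

    W = eigvecs[:, :d_prime]                   # projection matrix (principal components)
    Z = Xc @ W                                 # low-dimensional representation (rows are samples)
    return W, Z

# usage
X = np.random.default_rng(0).normal(size=(200, 10))
W, Z = pca(X)
print(W.shape, Z.shape)
```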

10.4 Kernelized linear dimensionality reduction (illustrated with Kernel PCA, KPCA)

First, an explanation of Figure 10.6: (a) is obtained from (b) by bending it into an S shape; but if PCA is applied to (a), the result is (c) rather than (b). The reason is that going from (a) to (c) implicitly assumes the mapping from high to low dimensions is linear.

Core idea: the high-dimensional basis vectors can be expressed jointly in terms of all of the high-dimensional sample points.

If the mapping from low to high dimensions were known (i.e. the mapping from (b) to (a) were known), we would only need to apply PCA to the high-dimensional sample points (a) to find the projection basis vectors W. Usually, however, the mapping is unknown, so we use the following property: the high-dimensional basis vectors can be expressed in terms of all the sample points (10.22). Substituting this into PCA yields (10.24), where K is the kernel matrix in the high-dimensional space; an eigenvalue decomposition of K gives the corresponding coefficients α (the low-dimensional basis vectors). [In fact, in the end only the eigenvectors corresponding to the larger eigenvalues of K are needed.] Finally, for a new sample, the transformation from high to low dimensions is completed via (10.25).
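A minimal kernel PCA sketch of the procedure just described, using an RBF kernel as a stand-in for the unknown mapping (the function name kpca and the parameter gamma are illustrative; only training-sample projections are computed, not the out-of-sample formula (10.25)):

```python
import numpy as np

def kpca(X, d_prime=2, gamma=1.0):
    """Minimal KPCA: build the kernel matrix K, center it, eigendecompose, project."""
    m = X.shape[0]
    # RBF kernel matrix K
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

    # center the kernel matrix (the implicit feature map must be zero-mean)
    one_m = np.full((m, m), 1.0 / m)
    Kc = K - one_m @ K - K @ one_m + one_m @ K @ one_m

    # eigendecompose K; keep the eigenvectors of the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:d_prime]
    alphas = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))  # normalized coefficients

    return Kc @ alphas   # projections of the training samples onto the d' components

Z = kpca(np.random.default_rng(0).normal(size=(50, 3)), d_prime=2)
print(Z.shape)  # (50, 2)
```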

10.5 Manifold learning

Core idea: the local information of a complex high-dimensional shape is kept unchanged in the low-dimensional space.

10.5.1 Isometric Mapping (Isomap)

Core idea: accumulate nearest-neighbor distances (the invariant quantity) step by step to measure distance, then combine this with MDS.

Geodesic distance: the distance measured along the shape itself. Straight-line distance: the distance of the line segment connecting two points.
Although in the low-dimensional space the straight-line distance between two points equals the geodesic distance, in the high-dimensional space the geodesic distance may be the true distance, so measuring with the straight-line distance is unreasonable. Using nearest-neighbor distances instead of the straight-line distance between two points therefore gives a better distance measure: find the k nearest neighbors of every point and compute those distances, set the distances to all other points to infinity, run a shortest-path algorithm (Dijkstra or Floyd), and then apply MDS to obtain the low-dimensional result. For new samples, a regression model can be built with the high-dimensional points as input and the low-dimensional coordinates as output.
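A minimal Isomap sketch of this pipeline (kNN graph, shortest paths, then classical MDS), assuming the neighborhood graph is connected; the function name isomap and the parameters are illustrative:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, d_prime=2):
    """Minimal Isomap: kNN graph -> geodesic (shortest-path) distances -> classical MDS."""
    m = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances

    # keep only each point's k nearest neighbors; all other edges are "infinite"
    graph = np.full((m, m), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(m), n_neighbors)
    graph[rows, idx.ravel()] = D[rows, idx.ravel()]
    graph = np.minimum(graph, graph.T)        # symmetrize the neighborhood graph

    G = shortest_path(graph, method="D", directed=False)   # geodesic distances via Dijkstra

    # classical MDS on the geodesic distance matrix
    J = np.eye(m) - np.full((m, m), 1.0 / m)
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d_prime]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

Z = isomap(np.random.default_rng(0).normal(size=(60, 3)))
print(Z.shape)  # (60, 2)
```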

10.5.2 Locally Linear Embedding (LLE)

Core idea: keep the linear relationship among neighboring samples unchanged between the high- and low-dimensional spaces.

First determine the number of neighbors of each point, then compute the coefficients with which the point is linearly approximated by its neighbors (with the goal of minimizing the gap between the reconstruction and the original point). Then compute the low-dimensional coordinates of the samples (again with the goal of minimizing the reconstruction gap in the low-dimensional space), which gives M = (I - W)ᵀ(I - W). Eigendecompose M and select the eigenvectors corresponding to the d' smallest eigenvalues of M.
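A minimal LLE sketch of these two steps (reconstruction weights, then the bottom eigenvectors of M); in practice the very smallest eigenvector of M, which is constant, is usually discarded. The names lle and reg are illustrative:

```python
import numpy as np

def lle(X, n_neighbors=10, d_prime=2, reg=1e-3):
    """Minimal LLE: reconstruct each point from its neighbors, then embed via M = (I-W)ᵀ(I-W)."""
    m = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]

    # step 1: reconstruction weights (each row of W sums to 1)
    W = np.zeros((m, m))
    for i in range(m):
        Ni = neighbors[i]
        G = (X[Ni] - X[i]) @ (X[Ni] - X[i]).T          # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)   # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, Ni] = w / w.sum()                         # normalize the weights

    # step 2: low-dimensional coordinates from the smallest eigenvectors of M
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    # drop the first eigenvector (constant, eigenvalue ~ 0) and keep the next d'
    return eigvecs[:, 1:d_prime + 1]

Z = lle(np.random.default_rng(0).normal(size=(60, 3)))
print(Z.shape)  # (60, 2)
```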

10.6 Metric learning

Core idea: dimensionality-reduction methods are essentially looking for a suitable distance measure, so learn that distance measure directly.

Find a general parametric form for the distance, learn its parameter (the metric matrix M) directly, and incorporate M into the evaluation objective to complete the learning.
[figure: general form of the distance parameterized by the metric matrix M]
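A minimal sketch of such a parameterized distance, assuming the usual squared Mahalanobis form dist²_M(x_i, x_j) = (x_i - x_j)ᵀ M (x_i - x_j) with M positive semidefinite (parameterized here as M = PᵀP so the constraint holds by construction; all names are illustrative):

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (xi - xj)ᵀ M (xi - xj) for a PSD metric matrix M."""
    d = xi - xj
    return float(d @ M @ d)

# parameterize M = PᵀP so that M is positive semidefinite by construction;
# in metric learning, P (equivalently M) is what gets optimized
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 3))
M = P.T @ P

xi, xj = rng.normal(size=3), rng.normal(size=3)
print(mahalanobis_sq(xi, xj, M))
print(mahalanobis_sq(xi, xj, np.eye(3)))  # M = I recovers the squared Euclidean distance
```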

Neighbourhood Component Analysis (NCA)

Similar to k-nearest neighbors, but with the voting rule modified: neighbors are assigned different probabilities according to their distances, and these probabilities determine the final decision; the matrix M to be learned appears in the objective function.
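A minimal sketch of this NCA-style soft voting: each sample j gets a neighbor probability for sample i proportional to exp of minus the squared distance under the metric. Here M is held fixed purely for illustration; in NCA proper, M (or a factor of it) is what gets optimized:

```python
import numpy as np

def nca_neighbor_probs(X, M, i):
    """p_ij: probability that sample i picks sample j as its neighbor,
    proportional to exp(-dist_M(x_i, x_j)^2); p_ii is defined as 0."""
    diffs = X - X[i]
    sq_dists = np.einsum("nd,de,ne->n", diffs, M, diffs)  # (x_i - x_j)ᵀ M (x_i - x_j)
    w = np.exp(-sq_dists)
    w[i] = 0.0                       # a point is never its own neighbor
    return w / w.sum()

# the NCA objective would then maximize, over M, the probability that each
# sample's neighbors share its label: sum over i of sum over {j: y_j == y_i} of p_ij
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
M = np.eye(3)                        # Euclidean metric as the starting point
p = nca_neighbor_probs(X, M, i=0)
print(p.shape, p.sum())              # (20,) ~1.0
```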
