A pure L1-norm principal component analysis


Although I do not fully understand the mathematical meaning behind it, I find it interesting, so I am recording it here.

Problem

As is well known, ordinary PCA (denoted \(L_2\)-PCA in the paper) builds its loss function with the 2-norm, which makes it very sensitive to outliers. Many variants of PCA therefore turn to the \(\ell_1\) norm, but this paper is a little different from the ones I know.

For example, consider the SPCA of Zou et al. (2006), which I have read before:

[Figure: the SPCA formulation of Zou et al. (2006)]

Notice that the \(\ell_1\) penalty acts on \(\beta\) as a way to obtain sparsity.

This paper seems somewhat different. From the regression viewpoint, the ordinary regression problem minimizes the following loss function:
\[ \sum_{i=1}^{n} \big(y_i - (\beta_0 + \mathbf{\beta}^T x_i)\big)^2. \]
In order to reduce the influence of outliers, one uses instead:
\[ \sum_{i=1}^{n} \big|y_i - (\beta_0 + \mathbf{\beta}^T x_i)\big|. \]
The authors note that the above problem can be solved using linear programming:

[Figure: the linear-programming formulation of the \(\ell_1\) regression problem]
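To make this concrete, here is a minimal sketch (mine, not the paper's) of how the \(\ell_1\) regression reduces to a linear program, using the standard auxiliary-variable trick \(t_i \ge |r_i|\) and `scipy.optimize.linprog`; the function name `l1_regression` is my own.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Minimize sum_i |y_i - (b0 + beta^T x_i)| via the standard LP
    reformulation: introduce t_i >= |residual_i| and minimize sum(t)."""
    n, m = X.shape
    # Variables: [b0, beta_1..beta_m, t_1..t_n].
    c = np.concatenate([np.zeros(1 + m), np.ones(n)])
    A = np.hstack([np.ones((n, 1)), X])          # rows give b0 + x_i^T beta
    A_ub = np.vstack([
        np.hstack([-A, -np.eye(n)]),             # y_i - (b0 + x_i^T beta) <= t_i
        np.hstack([ A, -np.eye(n)]),             # (b0 + x_i^T beta) - y_i <= t_i
    ])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * (1 + m) + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1:1 + m]              # b0, beta
```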
Back to PCA: we hope to find a direction such that the sum of the \(\ell_1\) distances from the sample points to it is minimal (I may have misunderstood this).

Details

The \(L_1\)-PCA loss function

First, assume the input data are \(x_i \in \mathbb{R}^m\), stacked into a data matrix \(X \in \mathbb{R}^{n \times m}\). The author first wants to find an \((m-1)\)-dimensional subspace such that the sum of the \(\ell_1\) distances from the sample points to it is minimal. Before that, we need to discuss how this distance is computed.

[Figure: the \(\ell_1\) distance from a point to a hyperplane]
As the figure shows, the \(\ell_1\) distance from a point to a hyperplane \(S\) is not the familiar Euclidean distance. The distance from a point to a subspace is defined as:
\[ d(x, S) = \inf \{\, \|x - z\|_1 : z \in S \,\}. \]
Suppose the hyperplane \(S\) is characterized by \(\beta^T x = 0\) (assume it passes through the origin). Then:
First, for a sample point \(x\), pick a coordinate \(j\), set \(y_i = x_i\) for \(i \ne j\), and define \(y_j\) as (assuming \(\beta_j \ne 0\)):
\[ y_j = -\frac{\sum_{i \ne j} \beta_i x_i}{\beta_j}, \]
so it is easy to verify that \(\beta^T y = 0\), i.e., \(y \in S\).
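As a sanity check (my own sketch, not code from the paper), the following verifies that the constructed \(y\) lies on \(S\), and that when \(j\) maximizes \(|\beta_j|\), \(\|x-y\|_1\) matches the closed form \(|\beta^T x| / |\beta_j|\) derived next:

```python
import numpy as np

def l1_point_to_hyperplane(x, beta):
    """L1 distance from x to {z : beta^T z = 0}: change only the
    coordinate j with the largest |beta_j|, leave the rest alone."""
    j = np.argmax(np.abs(beta))
    y = x.astype(float)
    y[j] = -(beta @ x - beta[j] * x[j]) / beta[j]   # y_j from the formula above
    return np.abs(beta @ x) / np.abs(beta[j]), y

beta = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 2.0])
dist, y = l1_point_to_hyperplane(x, beta)
assert np.isclose(beta @ y, 0.0)               # y lies on the hyperplane
assert np.isclose(np.abs(x - y).sum(), dist)   # |x - y|_1 equals |beta^T x| / |beta_j|
```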

We now show that if this \(j\) satisfies \(|\beta_j| \ge |\beta_i|\) for all \(i \ne j\), then \(\|x-y\|_1\) is exactly the \(\ell_1\) distance from \(x\) to \(S\). First, we show it is minimal among all choices that change only one coordinate; in that case:
\[ \|x-y\|_1 = \Big|x_j + \frac{\sum_{i \ne j} \beta_i x_i}{\beta_j}\Big| = \Big|\frac{\sum_{i} \beta_i x_i}{\beta_j}\Big| = \frac{|\beta^T x|}{|\beta_j|}. \]
Since the numerator is fixed, the larger the denominator, the shorter the distance; this settles the case where only one coordinate is changed. We next use induction to show that if the distance is minimal, then at most one coordinate needs to be changed.
The case \(m=2\) is easy to verify. Suppose the claim holds for \(m=k-1\); we show it also holds for \(m=k\):
If \(x\) and \(y\) already agree in some coordinate, the induction hypothesis gives the result for \(m=k\); so suppose \(x\) and \(y\) differ in every coordinate. Without loss of generality, consider \(\beta_1, \beta_2\), both assumed nonzero, with \(|\beta_1| \le |\beta_2|\).
Let \(y'_1 = x_1\) and \(y'_2 = y_2 - \frac{\beta_1(x_1-y_1)}{\beta_2}\), keeping the remaining coordinates the same as \(y\) (one checks \(\beta^T y' = 0\)). The part of the distance that changes is:
\[ |x_1-y'_1| + |x_2-y'_2| = \Big|y_2 - x_2 - \frac{\beta_1(x_1-y_1)}{\beta_2}\Big| \le |y_2-x_2| + \frac{|\beta_1|}{|\beta_2|}|x_1-y_1| \le |y_2-x_2| + |x_1-y_1|. \]
Thus the new \(y'\) agrees with \(x\) in one coordinate and is no farther away, so the claim also holds for \(m=k\).

Therefore, all we need to do is find the \(j\) corresponding to the largest \(|\beta_j|\).

Our loss function is therefore:
\[ \sum_i \frac{|\beta^T x_i|}{|\beta_j|}. \]
Since only the ratio matters, we may set \(\beta_j = -1\) without changing the result:
\[ \sum_i \Big|x_{ij} - \sum_{k \ne j}\beta_k x_{ik}\Big|. \]
Treating \(x_{ij}\) as the response \(y\), this becomes an \(\ell_1\) regression problem. Of course we do not know \(j\) in advance, so we need \(m\) runs to find the \(j^*\) that minimizes the loss. In this way we obtain an \((m-1)\)-dimensional subspace.

The algorithm is as follows:

[Figure: the paper's algorithm for finding the best-fit \((m-1)\)-dimensional subspace]
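Here is my minimal reading of this step as code, reusing the LP-based \(\ell_1\) regression idea from the earlier sketch: try every coordinate \(j\) as the response and keep the best \(j^*\). The function names are mine, and this is not the paper's exact pseudocode.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit_origin(A, y):
    """Minimize sum_i |y_i - A_i^T w| (an L1 regression through the origin)."""
    n, p = A.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])
    A_ub = np.vstack([np.hstack([-A, -np.eye(n)]),
                      np.hstack([ A, -np.eye(n)])])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p], res.fun

def best_hyperplane(X):
    """Try each coordinate j as the response of an L1 regression on the
    others; keep the j* with the smallest loss.  Returns (loss, j*, beta)
    with beta[j*] = -1, so beta^T x = 0 characterizes the subspace."""
    n, m = X.shape
    best = None
    for j in range(m):
        cols = [k for k in range(m) if k != j]
        w, loss = l1_fit_origin(X[:, cols], X[:, j])
        if best is None or loss < best[0]:
            beta = np.zeros(m)
            beta[cols] = w
            beta[j] = -1.0
            best = (loss, j, beta)
    return best
```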

The \(L_1\)-PCA algorithm

[Figure: the full \(L_1\)-PCA algorithm from the paper]

Since the goal of PCA is to find a direction rather than a subspace, the subspace-finding operation has to be repeated. I did not fully understand this part, and I am not sure whether it works as follows (a sketch in code follows the list):

  1. Find a subspace;
  2. Project the data points onto that subspace;
  3. Choose a new coordinate system, so the data go from \(k\) dimensions to \(k-1\);
  4. Repeat the above operations on the new data until \(k=1\).
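Under that (possibly wrong) reading, the overall loop might look like the sketch below. The "new coordinate system" step is simplified to deleting coordinate \(j^*\) after projection, which is my guess rather than the paper's prescription.

```python
import numpy as np

def l1_pca_directions(X, find_hyperplane):
    """My guess at the loop in steps 1-4: fit a hyperplane, project onto
    it, drop one coordinate, repeat until one dimension remains.
    `find_hyperplane(X) -> (loss, j_star, beta)` is the subroutine
    sketched earlier (with beta[j_star] = -1)."""
    X = np.asarray(X, dtype=float).copy()
    history = []
    while X.shape[1] > 1:
        _, j, beta = find_hyperplane(X)
        # Project: replace coordinate j* by sum_{l != j*} beta_l x_l
        # (the corrected formula discussed below); since beta[j] = -1,
        # X @ beta + X[:, j] equals exactly that sum.
        X[:, j] = X @ beta + X[:, j]
        history.append((j, beta))
        X = np.delete(X, j, axis=1)   # "new coordinate system": k -> k-1
    return history
```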

A few questions remain:

Projection

This corresponds to step 4 of the algorithm, where

[Figure: the projection formula in step 4 of the algorithm]

It is worth pointing out that this appears to be a typo by the author; it should read:
\[ (I^{j^*}_{j^* \ell})^m = \beta_{\ell}^m, \quad \ell \ne j^*, \]

There are two reasons:

First, a projection must at least land in the subspace. Take a 3-dimensional sample as an example: \(x=(x_1, x_2, x_3)^T\), \(j^* = 2\).
According to the corrected formula:
\[ z = (x_1, \beta_1 x_1 + \beta_3 x_3, x_3)^T, \]
so \(\beta^T z = 0\) (recall that \(\beta_{j^*} = -1\)), whereas with the original formula this does not hold.
Second, the examples the author gives later show that the computations actually follow the corrected formula.
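A quick numeric check of the corrected formula, with made-up values for \(\beta\) (only \(\beta_{j^*} = -1\) is essential):

```python
import numpy as np

beta = np.array([0.7, -1.0, 0.3])   # made-up values; beta_{j*} = -1 with j* = 2
x = np.array([1.0, 2.0, 3.0])

z = x.copy()
z[1] = beta[0] * x[0] + beta[2] * x[2]   # corrected projection of coordinate j*

print(beta @ z)   # 0.0: z lies on the hyperplane beta^T z = 0
```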

In addition, let me raise a doubt about this projection. Since I cannot find its theoretical justification, my guess is that the author intends to project in the \(\ell_1\) sense; but as discussed earlier, the shortest-\(\ell_1\)-distance projection requires choosing the \(j\) with the largest \(|\beta_j|\), and the previously chosen \(j^*\) does not guarantee this.

Coordinate system

The paper also contains the following passage.

[Figure: the paper's remark on the choice of coordinate system]

Since the \(\ell_1\) norm is not rotation invariant, how can we guarantee that this choice of coordinate system is appropriate? Moreover, this also seems to imply that the direction we finally select is not globally optimal.

Loading vectors

\(\alpha^k\) is the \(k\)-th loading vector; hence, a major difference from SPCA is that it is not sparse.
Moreover, it has the property of being orthogonal to the subspace spanned by \(V^k\); this is easy to verify, because \(Z^k \beta = 0\).

Overall, I find this idea very interesting, but it always feels a little short of a convincing explanation; some things seem to be taken for granted...
