Whitening (preprocessing step)


Introduction

We have already seen how to use PCA to reduce the dimensionality of data. Some algorithms also need a related preprocessing step, called whitening. For example, if the training data consists of images, adjacent pixels in an image are highly correlated, so the raw input used for training is redundant. The goal of whitening is to reduce this redundancy in the input; more formally, we want the whitening process to give the learning algorithm an input with the following properties: (i) the features are less correlated with each other; (ii) all features have the same variance.

2D example

We first use the 2D example from the previous section to describe the main idea behind whitening, and then show how whitening can be combined with smoothing and with PCA.

How can we eliminate the correlation between features? The computation \textstyle x_{\rm rot}^{(i)} = U^Tx^{(i)} from the previous section has in fact already removed the correlation between the input features \textstyle x^{(i)}. The distribution of the resulting new features \textstyle x_{\rm rot} is shown below:

PCA-rotated.png

The covariance matrix of this data is:

\begin{align}
\begin{bmatrix}
7.29 & 0  \\
0 & 0.69
\end{bmatrix}.
\end{align}

(Note: strictly speaking, many of the statements in this section about "covariance" hold only when the data has zero mean. The discussion below implicitly assumes this condition. Even if the data mean is not zero, however, the statements below remain valid, so you need not worry about this.)

It is no accident that the diagonal entries of the covariance matrix of \textstyle x_{\rm rot} are \textstyle \lambda_1 and \textstyle \lambda_2. Further, the off-diagonal entries are zero; thus \textstyle x_{{\rm rot},1} and \textstyle x_{{\rm rot},2} are uncorrelated, satisfying our first requirement for whitened data (that the features be less correlated).
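To make the rotation step concrete, here is a minimal NumPy sketch; the synthetic 2D data matrix X (one example per column) and all variable names in it are illustrative assumptions, not values from the text:

import numpy as np

# Synthetic zero-mean 2D data; the columns of X are the examples x^(i).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[6.0, 2.0], [2.0, 1.0]], size=1000).T
X -= X.mean(axis=1, keepdims=True)

Sigma = X @ X.T / X.shape[1]      # empirical covariance matrix
lam, U = np.linalg.eigh(Sigma)    # eigenvalues lam_i, eigenvectors as columns of U
lam, U = lam[::-1], U[:, ::-1]    # reorder so that lam_1 >= lam_2

X_rot = U.T @ X                   # x_rot^(i) = U^T x^(i) for every example
print(np.round(X_rot @ X_rot.T / X.shape[1], 2))  # near-diagonal: off-diagonals ~ 0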

To give each input feature unit variance, we can simply rescale each feature \textstyle x_{{\rm rot},i} by \textstyle 1/\sqrt{\lambda_i}. Concretely, we define the whitened data \textstyle x_{{\rm PCAwhite}} \in \Re^n as follows:
\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i}}.   
\end{align}

Plotting \textstyle x_{{\rm PCAwhite}}, we get:

 PCA-whitened.png

The covariance matrix of this data is now the identity matrix I. We say that \textstyle x_{{\rm PCAwhite}} is the PCA-whitened version of the data: the different features of \textstyle x_{{\rm PCAwhite}} are uncorrelated and have unit variance.
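Continuing the sketch above, PCA whitening adds one line: dividing each row of X_rot by the square root of its eigenvalue, which should yield a covariance close to the identity:

X_pcawhite = X_rot / np.sqrt(lam)[:, None]  # x_PCAwhite,i = x_rot,i / sqrt(lam_i)
print(np.round(X_pcawhite @ X_pcawhite.T / X.shape[1], 2))  # ~ identity matrix I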

Whitening combined with dimensionality reduction: if you want whitened data whose dimension is lower than that of the original input, you can keep just the first \textstyle k components of \textstyle x_{{\rm PCAwhite}}. When we combine PCA whitening with regularization (discussed later), the last few components of \textstyle x_{{\rm PCAwhite}} will be close to zero anyway, so discarding them causes little harm; in the sketch below this is a simple slice.
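In the running sketch, keeping the top k whitened components is just a row slice (k = 1 is an arbitrary choice for this 2D example):

k = 1
X_pcawhite_k = X_pcawhite[:k, :]  # keep only the top-k whitened components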

ZCA whitening

Finally, it should be noted that the transformation that turns the covariance matrix of the data into the identity matrix I is not unique. Concretely, if R is any orthogonal matrix, i.e., one satisfying \textstyle RR^T = R^TR = I (less formally, R can be a rotation or reflection matrix), then \textstyle R \,x_{\rm PCAwhite} also has identity covariance. In ZCA whitening, we choose \textstyle R = U. We define the result of ZCA whitening as:

\begin{align}
x_{\rm ZCAwhite} = U x_{\rm PCAwhite}
\end{align}

Plotting \textstyle x_{\rm ZCAwhite}, we get:

 ZCA-whitened.png

It can be shown that, among all possible choices of R, this choice of rotation makes \textstyle x_{\rm ZCAwhite} as close as possible to the original input data x. When using ZCA whitening (unlike PCA whitening), we usually keep all n dimensions of the data and do not try to reduce its dimensionality.
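In the running sketch, ZCA whitening simply rotates the PCA-whitened data back with U; its covariance should still be the identity:

X_zcawhite = U @ X_pcawhite  # x_ZCAwhite = U x_PCAwhite
print(np.round(X_zcawhite @ X_zcawhite.T / X.shape[1], 2))  # still ~ identity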

Regularization

When implementing PCA or ZCA whitening in practice, some of the eigenvalues \textstyle \lambda_i may be numerically close to 0, so the scaling step, in which we divide by \sqrt{\lambda_i}, would mean dividing by a value close to zero; this can cause the data to blow up (take on very large values) or otherwise be numerically unstable. In practice, we therefore carry out this scaling with a small amount of regularization: we add a small constant \textstyle \epsilon to the eigenvalues before taking their square root and inverse:

\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i + \epsilon}}.
\end{align}

When \textstyle x takes values in the interval \textstyle [-1,1], a typical value is \textstyle \epsilon \approx 10^{-5}.

For images, adding \textstyle \epsilon here also slightly smooths (or low-pass filters) the input image. This can remove noise introduced while the pixel values were acquired, and can improve the learned features.
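In the running sketch, the regularized scaling replaces the earlier division; eps = 1e-5 follows the rule of thumb above for data in [-1,1]:

eps = 1e-5
X_pcawhite = X_rot / np.sqrt(lam + eps)[:, None]  # divide by sqrt(lam_i + eps) instead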

ZCA whitening is a form of data preprocessing that maps the data from \textstyle x to \textstyle x_{\rm ZCAwhite}.

It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives an image, adjacent parts of the image are highly correlated in brightness, so most neighboring "pixels" are perceived by the eye as similar values. It would therefore be very wasteful for the eye to transmit every pixel value separately (via the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to that of ZCA (this is accomplished by the ON-type and OFF-type photoreceptor cells in the retina, which convert light signals into neural signals). The result is a less redundant representation of the input image, which is then transmitted to the brain.


