Whitening (preprocessing step) [repost]
Introduction
We have seen how to use PCA to reduce the dimension of the data. Some algorithms also need a closely related preprocessing step called whitening. For example, if the training data are images, the raw input is redundant because adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, we want the whitening process to give the learning algorithm an input with the following properties: (i) the features are less correlated with each other; (ii) the features all have the same variance.
2D example
We first describe the main idea of whitening using the earlier 2D example, and then show how whitening can be combined with smoothing and with PCA.
How can we remove the correlation between the input features? We already did this when computing $x_{\rm rot}^{(i)} = U^T x^{(i)}$, where the columns of $U$ are the eigenvectors of the data covariance matrix from the PCA discussion. The resulting new features $x_{\rm rot}$ are distributed as shown below:
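The covariance matrix of this data is:
$$\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$
where $\lambda_1$ and $\lambda_2$ are the eigenvalues of the covariance matrix of the original data.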
(Note: strictly speaking, many of the statements about "covariance" in this section hold only when the data has zero mean. The rest of this section implicitly assumes that this condition is met. Even if the data mean is not exactly 0, the intuitions presented here still hold, so you do not need to worry about this.)
It is no accident that the diagonal entries of this covariance matrix are $\lambda_1$ and $\lambda_2$. In addition, the off-diagonal entries are 0; therefore $x_{{\rm rot},1}$ and $x_{{\rm rot},2}$ are uncorrelated, which satisfies our first requirement for whitened data (reduced correlation between features).
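The reason the rotated data has a diagonal covariance can be written out in one line. The sketch below assumes zero-mean data, $m$ training examples, and the notation $\Sigma = U \Lambda U^T$ for the eigendecomposition of the data covariance matrix with $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2)$; these symbols follow the usual PCA setup and are assumptions as far as this section is concerned.

```latex
\begin{aligned}
\frac{1}{m}\sum_{i=1}^{m} x_{\rm rot}^{(i)} \left(x_{\rm rot}^{(i)}\right)^{T}
  &= U^{T}\left(\frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}\right) U \\
  &= U^{T}\,\Sigma\,U
   = U^{T} U \Lambda U^{T} U
   = \Lambda
   = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\end{aligned}
```

The last step uses $U^T U = I$, which holds because the eigenvectors of a symmetric matrix can be chosen orthonormal.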
To make each input feature have unit variance, we can simply rescale each feature $x_{{\rm rot},i}$ by $1/\sqrt{\lambda_i}$. Concretely, we define the whitened data $x_{\rm PCAwhite}$ as follows:
$$x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}$$
Plotting $x_{\rm PCAwhite}$, we get:
The covariance matrix of this data is now the identity matrix $I$. We say that $x_{\rm PCAwhite}$ is the PCA-whitened version of the data: the different components of $x_{\rm PCAwhite}$ are uncorrelated and have unit variance.
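The steps so far (rotate with $U^T$, then rescale by $1/\sqrt{\lambda_i}$) can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca_whiten`, the one-example-per-column layout, and the synthetic 2D data are all assumptions made for the example.

```python
import numpy as np

def pca_whiten(x):
    """PCA-whiten data stored one example per column (shape n x m)."""
    x = x - x.mean(axis=1, keepdims=True)     # make every feature zero-mean
    sigma = x @ x.T / x.shape[1]              # covariance matrix, n x n
    lam, u = np.linalg.eigh(sigma)            # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]             # reorder to descending
    lam, u = lam[order], u[:, order]
    x_rot = u.T @ x                           # decorrelate: x_rot = U^T x
    x_pca_white = x_rot / np.sqrt(lam)[:, None]  # rescale to unit variance
    return x_pca_white, u, lam

# Hypothetical correlated 2D data, used only for illustration.
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0], [1.5, 0.5]])
x = mix @ rng.standard_normal((2, 1000))

x_pca_white, u, lam = pca_whiten(x)
cov_white = x_pca_white @ x_pca_white.T / x_pca_white.shape[1]
print(np.round(cov_white, 6))                 # numerically the identity
```

No regularization is applied here, so the sketch assumes every eigenvalue is comfortably above zero.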
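Combining whitening with dimensionality reduction: if you want data that is whitened and also lower-dimensional than the original input, you can keep only the first $k$ components of $x_{\rm PCAwhite}$. When we combine PCA whitening with regularization (discussed later), the last few components of $x_{\rm PCAwhite}$ will always be close to 0, so discarding them does little harm.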
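Keeping only the first $k$ whitened components amounts to projecting onto the $k$ eigenvectors with the largest eigenvalues before rescaling. The following is a small sketch under the same assumptions as before (zero-mean data, one example per column); `pca_whiten_reduce` and the random 5-dimensional data are hypothetical names and inputs chosen for the example.

```python
import numpy as np

def pca_whiten_reduce(x, k):
    """Return the first k PCA-whitened components of x (shape n x m)."""
    x = x - x.mean(axis=1, keepdims=True)
    sigma = x @ x.T / x.shape[1]
    lam, u = np.linalg.eigh(sigma)
    top = np.argsort(lam)[::-1][:k]           # indices of the k largest eigenvalues
    x_rot_k = u[:, top].T @ x                 # project onto the top-k eigenvectors
    return x_rot_k / np.sqrt(lam[top])[:, None]

# Hypothetical 5-dimensional correlated data.
rng = np.random.default_rng(1)
x = rng.standard_normal((5, 5)) @ rng.standard_normal((5, 500))

x_tilde = pca_whiten_reduce(x, k=2)
cov_tilde = x_tilde @ x_tilde.T / x_tilde.shape[1]
print(x_tilde.shape)
print(np.round(cov_tilde, 6))
```

The retained components are still uncorrelated with unit variance, so the reduced data has a $k \times k$ identity covariance.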
ZCA whitening
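Finally, the way of making the data covariance equal to the identity matrix $I$ is not unique. Concretely, if $R$ is any orthogonal matrix, so that it satisfies $RR^T = R^TR = I$ (less formally, $R$ can be a rotation or reflection matrix), then $R\,x_{\rm PCAwhite}$ also has identity covariance. In ZCA whitening, we choose $R = U$. We define the ZCA-whitened data as:
$$x_{\rm ZCAwhite} = U x_{\rm PCAwhite}$$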
Plotting $x_{\rm ZCAwhite}$, we get:
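It can be shown that, out of all possible choices of $R$, this rotation makes $x_{\rm ZCAwhite}$ as close as possible to the original input data $x$. When using ZCA whitening (unlike PCA whitening), we usually keep all $n$ dimensions of the data and do not try to reduce its dimension.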
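Both claims about ZCA whitening can be checked numerically: any orthogonal $R$ keeps the covariance at the identity, since $R\,I\,R^T = I$, and the choice $R = U$ stays closest to the input. The sketch below compares $R = U$ with one arbitrarily chosen alternative (a 30-degree rotation) on synthetic data. The data, the helper names, and the use of the Frobenius norm as the measure of closeness are assumptions for this illustration.

```python
import numpy as np

# Hypothetical zero-mean, correlated 2D data (one example per column).
rng = np.random.default_rng(2)
mix = np.array([[2.0, 0.0], [1.5, 0.5]])
x = mix @ rng.standard_normal((2, 2000))
x = x - x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
lam, u = np.linalg.eigh(sigma)
x_pca_white = (u.T @ x) / np.sqrt(lam)[:, None]

# ZCA whitening: rotate the PCA-whitened data back with R = U.
x_zca_white = u @ x_pca_white

# Another orthogonal R, here a 30-degree rotation, for comparison.
theta = np.pi / 6
r = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x_other = r @ x_pca_white

def cov(z):
    return z @ z.T / z.shape[1]

def distance_to_input(z):
    return np.linalg.norm(z - x)              # Frobenius norm of the difference

print(np.round(cov(x_zca_white), 6))          # identity covariance
print(np.round(cov(x_other), 6))              # also identity covariance
print(distance_to_input(x_zca_white), distance_to_input(x_other))
```

In this run the ZCA-whitened data should be no farther from `x` than either the rotated alternative or the plain PCA-whitened data (which corresponds to $R = I$).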
Regularization
When implementing PCA whitening or ZCA whitening in practice, some of the eigenvalues $\lambda_i$ are sometimes numerically close to 0, so the scaling step, where we divide by $\sqrt{\lambda_i}$, would divide by a value close to 0. This may cause the data to overflow (take on very large values) or make the computation numerically unstable. In practice, we therefore implement the scaling step with a small amount of regularization, adding a small constant $\epsilon$ to the eigenvalues before taking the square root and the reciprocal:
$$x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i + \epsilon}}$$
When $x$ takes values in the interval $[-1, 1]$, a typical value is $\epsilon \approx 10^{-5}$.
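The effect of $\epsilon$ is easiest to see on data with an eigenvalue that is essentially zero. In the sketch below the second feature is an exact multiple of the first, so one eigenvalue vanishes up to round-off. The function name `pca_whiten_regularized`, the clipping of tiny negative eigenvalues, and the synthetic data are assumptions made for the illustration.

```python
import numpy as np

def pca_whiten_regularized(x, epsilon=1e-5):
    """PCA whitening with epsilon added to the eigenvalues before scaling."""
    x = x - x.mean(axis=1, keepdims=True)
    sigma = x @ x.T / x.shape[1]
    lam, u = np.linalg.eigh(sigma)            # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)             # round-off can give tiny negatives
    x_white = (u.T @ x) / np.sqrt(lam + epsilon)[:, None]
    return x_white, lam

# Hypothetical degenerate data in [-1, 1]: feature 2 is exactly 0.5 * feature 1.
rng = np.random.default_rng(3)
t = rng.uniform(-1.0, 1.0, size=1000)
x = np.vstack([t, 0.5 * t])

x_white, lam = pca_whiten_regularized(x)
variances = x_white.var(axis=1)
print(lam)                                    # smallest eigenvalue is ~0
print(variances)                              # ~0 for that component, ~1 for the other
```

Without $\epsilon$, the component belonging to the vanishing eigenvalue would be divided by a number near zero. With it, that component stays finite and close to 0, while the well-conditioned component still has variance close to 1.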
For images, adding $\epsilon$ here also has the effect of slightly smoothing (low-pass filtering) the input image. This also helps remove noise introduced when the pixel values of the image were acquired, and improves the features that are learned.
ZCA whitening is a data preprocessing method that maps the data from $x$ to $x_{\rm ZCAwhite}$.
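Chaining the steps of this section gives the map from $x$ to $x_{\rm ZCAwhite}$ as a single matrix. The expression below is a sketch that assumes zero-mean data and the regularized scaling from above; the symbol $W_{\rm ZCA}$ is a name introduced here for illustration only.

```latex
x_{\rm ZCAwhite}
  = U \,\mathrm{diag}\!\left(\frac{1}{\sqrt{\lambda_1 + \epsilon}}, \dots, \frac{1}{\sqrt{\lambda_n + \epsilon}}\right) U^{T} x
  = W_{\rm ZCA}\, x
```

Because $W_{\rm ZCA}$ has the form $U D U^T$ with $D$ diagonal, it is a symmetric matrix.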
It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, when your eye perceives an image, adjacent parts of the image are highly correlated in brightness, so most neighboring "pixels" in the eye are perceived as having similar values. It would therefore be very wasteful for the eye to transmit every pixel value separately (via the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to the one in ZCA (this is carried out by the ON-type and OFF-type photoreceptor cells in the retina, which convert light signals into neural signals). The result is a less redundant representation of the input image, which is then transmitted to the brain.