Easy-to-understand machine learning: a derivation and explanation of the mathematics behind gradient ascent principal component analysis


Foreword

Gradient ascent principal component analysis (PCA) is a commonly used technique for dimensionality reduction and noise reduction. It transforms the coordinate axis (in practice, it maps the data onto a new direction) so as to maximize the variance of the projected data, which makes the data easier to separate.

Analysis of purpose and principle

The goal is to obtain the principal component vector. In gradient ascent PCA, the original data are mapped onto the axis with the greatest variance (some books and videos describe this as moving the coordinate axis), and the mapping is performed by taking the dot product with a unit vector pointing in the mapping direction.

Preprocessing

To simplify the later derivation, the data must first be preprocessed. The preprocessing here is mean-centering, i.e. subtracting the mean from every sample:

$$x_i = x_i - \bar{x}$$
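As a minimal sketch of this step (assuming NumPy and a data matrix `X` of shape `(n_samples, n_features)`; the toy data here are purely illustrative, not from the original article):

```python
import numpy as np

def demean(X):
    """Subtract the per-feature mean so every column has zero mean."""
    return X - np.mean(X, axis=0)

# Toy data: 100 samples with 2 correlated features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[:, 1] = 0.75 * X[:, 0] + 0.3 * rng.normal(size=100)
X_demean = demean(X)
```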

Formula derivation and explanation

Objective

What we want is the direction along which the mapped data have the largest variance, namely:

$$\max \frac{1}{n} \sum_{i=1}^{n}\left(x_p^{(i)} - \bar{x}\right)^2$$

Because we have already mean-centered the data, $\bar{x} = 0$, so our objective becomes:

$$\max \frac{1}{n} \sum_{i=1}^{n}\left(x_p^{(i)}\right)^2$$

Mapping relationship derivation

Suppose $x^{(i)}$ is to be mapped onto the new axis defined by $w$. By the definition of the dot product,

$$x^{(i)} \cdot w = \lVert x^{(i)} \rVert \cdot \lVert w \rVert \cos\theta$$

Here we assume that $w$ is a unit vector, so this simplifies to:

$$x^{(i)} \cdot w = \lVert x^{(i)} \rVert \cos\theta = x_p^{(i)}$$

where $x_p^{(i)}$ is the transformed (projected) data.
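A hedged sketch of this projection, continuing the snippet above (`direction` is a helper name introduced here for illustration): with `w` normalized to unit length, projecting every sample is a single dot product:

```python
def direction(w):
    """Rescale w to unit length so that x . w equals the projected length."""
    return w / np.linalg.norm(w)

# X_demean is the mean-centered matrix from the preprocessing sketch above
w = direction(np.array([1.0, 1.0]))   # an arbitrary starting direction
X_projected = X_demean.dot(w)         # x_p^(i) for every sample at once
```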

Substituting the mapping relationship

After substituting the mapping relationship, our objective becomes:

$$\max \frac{1}{n} \sum_{i=1}^{n}\left(x^{(i)} \cdot w\right)^2$$

Gradient ascent

Constructing the function

Now we construct the function to maximize:

$$f(w) = \frac{1}{n} \sum_{i=1}^{n}\left(x^{(i)} \cdot w\right)^2 = \frac{1}{n} \sum_{i=1}^{n}\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + x_3^{(i)} w_3 + \cdots + x_d^{(i)} w_d\right)^2$$

where $d$ is the number of features of each sample.
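This objective translates directly into code. A one-function sketch, vectorized as `X.dot(w)` rather than the explicit feature-by-feature sum:

```python
def f(w, X):
    """Objective: mean squared projection length, i.e. the variance
    of the mean-centered data projected onto the unit vector w."""
    return np.sum(X.dot(w) ** 2) / len(X)
```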

Taking the derivative

Now we take the gradient of $f(w)$ with respect to $w$:

$$\nabla f(w) = \frac{2}{n} X^T (X w)$$

where $X$ is the matrix whose rows are the samples $x^{(i)}$ (note the factor of 2 from differentiating the square). The problem thus becomes running gradient ascent with this expression: we use it to iteratively optimize $w$, and the optimized $w$ is the principal component, i.e. the direction onto which the data can be mapped with the largest variance.
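Putting it together, here is a sketch of the gradient ascent loop, reusing `direction` and `f` from the snippets above; `eta`, `n_iters`, and `epsilon` are illustrative hyperparameter names, not from the original article:

```python
def df(w, X):
    """Gradient of f(w): (2/n) * X^T (X w)."""
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def first_component(X, eta=0.01, n_iters=10_000, epsilon=1e-8):
    """Gradient ascent for the first principal component of mean-centered X."""
    rng = np.random.default_rng(1)
    w = direction(rng.normal(size=X.shape[1]))   # random non-zero start
    for _ in range(n_iters):
        w_new = direction(w + eta * df(w, X))    # step, then re-normalize
        if abs(f(w_new, X) - f(w, X)) < epsilon: # converged
            w = w_new
            break
        w = w_new
    return w

w_pc1 = first_component(X_demean)  # unit vector of maximal projected variance
```

Note that `w` is re-normalized after every step: the derivation assumed `w` is a unit vector, and a plain gradient step would otherwise let its length grow without bound.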


Source: juejin.im/post/7078083790697922567