PCA (Principal Component Analysis) Dimensionality Reduction: Process Derivation

The role of principal component analysis is dimensionality reduction. When the data has many dimensions, some dimensions carry most of the information while others contribute little. By finding the important directions, PCA can greatly reduce the amount of computation.

The central idea of PCA:

One center: reconstruction of the original feature space.

Two basic points: maximizing the projection variance and minimizing the reconstruction distance.

---------------------------------------------------------------------------------------------------------------------------------

The minimum-reconstruction-distance criterion is derived from the following formulas.

Before reconstruction (\mathbf{x}_{n} is each sample after centering):

\mathbf{x}_{n}=\sum_{d=1}^{D} \alpha_{nd} \mathbf{u}_{d}=\sum_{m=1}^{M} \alpha_{nm} \mathbf{u}_{m}+\sum_{i=M+1}^{D} \alpha_{ni} \mathbf{u}_{i}

\mathbf{x}_{n} represents the original point, which can be expressed as a sum of D basis vectors (one per dimension). The decomposition splits it into two groups of vectors: PCA keeps the \mathbf{u}_{m} part and discards the \mathbf{u}_{i} part. Each \alpha is the coefficient (length) along the corresponding basis vector \mathbf{u}; multiplying and summing reconstructs the original sample.

(Figure: a single point projected onto \mathbf{u}_{m}, and all points projected onto the purple line; two dimensions are reduced to one.)

After reconstruction:

\tilde{\mathbf{x}}_{n}=\sum_{m=1}^{M} \alpha_{nm}\mathbf{u}_{m}
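
As a quick numerical check of the two expansions above, here is a minimal NumPy sketch. The random data and the orthonormal basis (obtained from a QR decomposition) are hypothetical and arbitrary, not yet chosen by PCA: the point is only that expanding a centered sample in a full orthonormal basis recovers it exactly, while keeping M directions gives the reconstruction \tilde{\mathbf{x}}_{n}.

```python
import numpy as np

# Minimal sketch of the decomposition above (hypothetical data and basis).
rng = np.random.default_rng(0)
D, M = 5, 2

x_n = rng.normal(size=D)                      # a (centered) sample
U, _ = np.linalg.qr(rng.normal(size=(D, D)))  # an arbitrary orthonormal basis u_1..u_D

alpha = U.T @ x_n                             # coefficients alpha_{nd} = u_d^T x_n
x_full = U @ alpha                            # summing over all D directions recovers x_n
x_tilde = U[:, :M] @ alpha[:M]                # keep only M directions: the reconstruction

print(np.allclose(x_full, x_n))               # True: full expansion reconstructs x_n
print(np.linalg.norm(x_n - x_tilde) ** 2)     # squared reconstruction error
```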

The reconstruction cost is the distance between each sample before and after reconstruction, which we minimize (subtracting the two expressions leaves only the discarded part):

\begin{aligned}
J &=\frac{1}{N} \sum_{n=1}^{N}\left\|\mathbf{x}_{n}-\tilde{\mathbf{x}}_{n}\right\|^{2}
=\frac{1}{N} \sum_{n=1}^{N}\left\|\sum_{i=M+1}^{D} \alpha_{ni} \mathbf{u}_{i}\right\|^{2}
=\frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} \alpha_{ni}^{2}
=\sum_{i=M+1}^{D} \frac{1}{N} \sum_{n=1}^{N}\left(\mathbf{u}_{i}^{T} \mathbf{x}_{n}\right)\left(\mathbf{x}_{n}^{T} \mathbf{u}_{i}\right)
=\sum_{i=M+1}^{D} \mathbf{u}_{i}^{T} \mathbf{S} \mathbf{u}_{i} \\
&\text{where } \mathbf{S}=\frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_{n} \mathbf{x}_{n}^{T} \quad \text{(the covariance matrix)}
\end{aligned}

Here S is the covariance matrix.
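
A minimal NumPy sketch of this definition (the array X, the sizes N and D, and the random data are hypothetical): for centered data, S = (1/N) Σ x_n x_nᵀ, and for any unit vector u the quantity uᵀSu is the mean squared projection length onto u, which is exactly what the loss sums over the discarded directions.

```python
import numpy as np

# Sketch: covariance matrix S = (1/N) * sum_n x_n x_n^T for centered data.
rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)          # centering, so the formula above applies

S = (X.T @ X) / N               # same as (1/N) * sum_n x_n x_n^T

# For any unit vector u, u^T S u is the average squared projection length.
u = np.ones(D) / np.sqrt(D)
print(u @ S @ u, np.mean((X @ u) ** 2))  # the two numbers agree
```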

Then the loss function is:

J=\sum_{i=M+1}^{D} \mathbf{u}_{i}^{T} \mathbf{S u}_{i}

Adding the unit-norm constraint \mathbf{u}_{i}^{T}\mathbf{u}_{i}=1 with Lagrange multipliers, the constrained optimization becomes:

(1) The Lagrangian

J=\sum_{i=M+1}^{D}\left[\mathbf{u}_{i}^{T} \mathbf{S} \mathbf{u}_{i}+\lambda_{i}\left(1-\mathbf{u}_{i}^{T} \mathbf{u}_{i}\right)\right]

(2) Take the derivative with respect to \mathbf{u}_{i} and set it to zero

\frac{\partial}{\partial \mathbf{u}_{i}} J=2\left(\mathbf{S} \mathbf{u}_{i}-\lambda_{i} \mathbf{u}_{i}\right)=0

(3) Result

\mathbf{S} \mathbf{u}_{i}=\lambda_{i} \mathbf{u}_{i}

Here \mathbf{u}_{i} denotes an eigenvector of \mathbf{S}, and \lambda_{i} the corresponding eigenvalue.
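
This can be checked numerically. The sketch below (hypothetical random data) uses NumPy's eigendecomposition for symmetric matrices and verifies that each eigenvector/eigenvalue pair of the covariance matrix satisfies \mathbf{S}\mathbf{u}_{i}=\lambda_{i}\mathbf{u}_{i}.

```python
import numpy as np

# Sketch: the stationary points of J are eigenvectors of S.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)
S = (X.T @ X) / X.shape[0]

eigvals, U = np.linalg.eigh(S)            # eigendecomposition of a symmetric matrix
for lam, u in zip(eigvals, U.T):          # columns of U are the eigenvectors u_i
    assert np.allclose(S @ u, lam * u)    # S u_i = lambda_i u_i
print(eigvals)                            # eigenvalues (ascending order for eigh)
```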

---------------------------------------------------------------------------------------------------------------------------------

Then the steps of PCA are:

1. Compute the mean and center the data (subtract the mean)

\overline{\mathbf{x}}=\frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_{n}

\mathbf{x}_{n}^{\text{norm}}=\mathbf{x}_{n}-\overline{\mathbf{x}}, \quad \forall n\in\{1,\ldots,N\}

2. Calculate the covariance matrix

\mathbf{S}=\frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_{n}^{\text{norm}}\left[\mathbf{x}_{n}^{\text{norm}}\right]^{T}

3. Eigendecomposition

\mathbf{S}=\mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{-1}

Written in block form, the decomposition looks like this:

\mathbf{S}=\left[\begin{array}{ccc}\sigma_{11}^{2} & \cdots & \sigma_{1D}^{2} \\ \vdots & \ddots & \vdots \\ \sigma_{D1}^{2} & \cdots & \sigma_{DD}^{2}\end{array}\right]=\mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{T}=\left[\begin{array}{ccc}\mathbf{U}_{s} & \mid & \mathbf{U}_{n}\end{array}\right]\left[\begin{array}{c|c}\boldsymbol{\Lambda}_{s} & \mathbf{0} \\ \hline \mathbf{0} & \boldsymbol{\Lambda}_{n}\end{array}\right]\left[\begin{array}{c}\mathbf{U}_{s}^{T} \\ \hline \mathbf{U}_{n}^{T}\end{array}\right]

4. Sort the columns of \mathbf{U} by their eigenvalues \lambda_{d} in descending order

5. Select the first M eigenvectors (those with the largest eigenvalues) to form \tilde{\mathbf{U}}

6. Project and reconstruct

\tilde{\mathbf{x}}_{n}=\tilde{\mathbf{U}} \tilde{\mathbf{U}}^{T} \mathbf{x}_{n}^{\text{norm}}+\overline{\mathbf{x}}

Substituting \mathbf{S}\mathbf{u}_{i}=\lambda_{i}\mathbf{u}_{i} back into the loss shows that the reconstruction error is simply the sum of the discarded eigenvalues:

J=\sum_{i=M+1}^{D} \lambda_{i}
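
Putting the six steps together, here is a compact NumPy sketch (the function name pca_reconstruct and the random data are illustrative, not from the original post). It also checks numerically that the mean squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

def pca_reconstruct(X, M):
    """PCA following the six steps above (X has N samples as rows)."""
    # 1. mean and centering
    x_bar = X.mean(axis=0)
    X_norm = X - x_bar
    # 2. covariance matrix
    N = X.shape[0]
    S = (X_norm.T @ X_norm) / N
    # 3. eigendecomposition (eigh: symmetric matrix, eigenvalues ascending)
    eigvals, U = np.linalg.eigh(S)
    # 4. sort the columns of U by eigenvalue, descending
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]
    # 5. keep the first M eigenvectors
    U_tilde = U[:, :M]
    # 6. project and reconstruct
    X_tilde = X_norm @ U_tilde @ U_tilde.T + x_bar
    return X_tilde, eigvals

# Usage on hypothetical random data: the mean squared reconstruction error
# equals the sum of the discarded eigenvalues, J = sum_{i=M+1}^{D} lambda_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
X_tilde, eigvals = pca_reconstruct(X, M=2)
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
print(J, eigvals[2:].sum())   # the two values agree
```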


Original post: blog.csdn.net/qq_39696563/article/details/122076897