Dimensionality reduction, PCA, SVD

Dimensionality reduction

The computational cost of machine learning algorithms often grows with the input dimension \(d\) (sometimes exponentially); for example, the VC dimension of the linear perceptron is \(d+1\).

Removing unnecessary dimensions and keeping only useful features can improve accuracy while reducing the amount of computation.

\(\mathtt{z}=\Phi(\mathtt{x})\), where \(\mathtt{x}\) is the original input vector and \(\mathtt{z}\) is the transformed input vector.

If the dimension of \(\mathtt{z}\) is smaller than the dimension of \(\mathtt{x}\), then dimensionality reduction has been achieved.

Ideally, the dimension of the input vector should match the input dimension of the target function, which shows that feature selection, like identifying the target function itself, is generally difficult.

With nonlinear feature transforms we add dimensions in an effort to reduce the in-sample error, at the cost of weaker generalization; conversely, if we can reduce the dimension without hurting the in-sample error, we improve generalization.

PCA (Principal Component Analysis)

PCA summarizes the input data with a small number of linear features. The idea is to rotate the axes (a linear transformation that defines a new coordinate system); in the new coordinate system the important dimensions become apparent and can be kept, while the unimportant dimensions can be discarded.

After rotating the data, the \(z_1\) axis is the more important one, while the \(z_2\) axis becomes less important: it looks like a series of small fluctuations (noise), so we can ignore it (set it to zero), which achieves dimensionality reduction.

In two-dimensional data, we can use a vector \(\mathtt{v}\) to capture the direction of maximum variation.

The projection \(z_n\) of a data point \(\mathtt{x_n}\) onto this direction reflects how far that point lies along \(\mathtt{v}\), and the overall variation can be measured by the variance (assuming the data has been centered).

\(z_n=\mathtt{v^Tx_n}\)

Below, \(\mathtt{C}\) denotes the covariance matrix.

\(\begin{equation} \begin{split} var[z]=\frac{1}{N}\sum_{n=1}^Nz_n^2&=\frac{1}{N}\sum_{n=1}^N\mathtt{v^Tx_nx_n^Tv}\\ &=\mathtt{v^T}\left(\frac{1}{N}\sum_{n=1}^N\mathtt{x_nx_n^T}\right)\mathtt{v}\\ &=\mathtt{v^TCv} \end{split} \end{equation}\)

To make \(\mathtt{v}\) the direction of maximum variation of the data, we should maximize this variance; we simply choose \(\mathtt{v}\) to be the eigenvector of the covariance matrix with the largest eigenvalue.
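As a minimal numpy sketch of this step (the toy data and variable names are my own, not from the original post), we can center the data, form the covariance matrix, and take the eigenvector with the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # toy 2-D data
X = X - X.mean(axis=0)                    # center the data first

C = (X.T @ X) / len(X)                    # covariance matrix C = (1/N) sum x_n x_n^T
eigvals, eigvecs = np.linalg.eigh(C)      # eigen-decomposition of the symmetric matrix C
v = eigvecs[:, np.argmax(eigvals)]        # eigenvector with the largest eigenvalue

z = X @ v                                 # z_n = v^T x_n, the projections
print(z.var(), eigvals.max())             # the variance of z equals the largest eigenvalue
```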

The data after dimensionality reduction is

\(\mathtt{X'=Xv}\)

\(\mathtt{Ax}=\lambda\mathtt{x}\), where \(\mathtt{A}\) is a square matrix, \(\mathtt{x}\) is a column vector, and \(\lambda\) is a scalar. If this holds, \(\lambda\) is called an eigenvalue of the square matrix and \(\mathtt{x}\) the corresponding eigenvector; the eigenvalues can be found by solving \(|\lambda E-\mathtt{A}|=0\).
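A small numpy check of this definition (the matrix A below is just an illustrative example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # a square matrix
eigvals, eigvecs = np.linalg.eig(A)        # solve A x = lambda x numerically
for lam, x in zip(eigvals, eigvecs.T):     # each column of eigvecs is an eigenvector
    print(np.allclose(A @ x, lam * x))     # True: A x equals lambda x
```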

Geometrically, a matrix transforms one vector into another. In \(var[z]=\mathtt{v^TCv}\), the geometric meaning of \(\mathtt{Cv}\) is a transformation of the vector \(\mathtt{v}\), and the geometric meaning of \(\mathtt{v^TCv}\) is the projection of that transformed vector back onto the original vector.

To maximize \(var[z]\), the projection of the transformed vector onto the original vector must be as large as possible. Clearly the projection of two vectors is largest when they point in the same direction, so the best case is when the transformation keeps the direction of \(\mathtt{v}\) and only magnifies it by a large factor; this is exactly the idea of an eigenvector with a large eigenvalue.

An eigenvector of a square matrix is a vector that, under the transformation defined by the matrix, is not rotated but only scaled (i.e., keeps the same direction), while the eigenvalue describes the degree of scaling: an absolute value greater than 1 enlarges, an absolute value between 0 and 1 shrinks, a negative sign reverses the direction, and a positive sign preserves it.
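A tiny demonstration of this interpretation, using a diagonal matrix chosen purely for illustration:

```python
import numpy as np

A = np.diag([2.0, -0.5])        # eigenvalues 2 (enlarge) and -0.5 (shrink and flip)
e1 = np.array([1.0, 0.0])       # eigenvector for eigenvalue 2
e2 = np.array([0.0, 1.0])       # eigenvector for eigenvalue -0.5

print(A @ e1)                   # [2. 0.]   : same direction, stretched by 2
print(A @ e2)                   # [ 0. -0.5]: direction reversed, shrunk by 0.5
```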

We select the k eigenvectors with the largest eigenvalues to form \(\mathtt{V}\): \(\mathtt{V}=\begin{bmatrix}\mathtt{v_1}&\mathtt{v_2}&\cdots&\mathtt{v_k}\end{bmatrix},\mathtt{V}\in R^{d*k}\)

\(\mathtt{X}\in R^{N*d}\)

The final transformed input data is \(\mathtt{X'=XV}\), which reduces the dimension from d to k.
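Putting these pieces together, here is a minimal sketch of eigen-decomposition-based PCA (the function name pca_eig and the random data are placeholders of mine):

```python
import numpy as np

def pca_eig(X, k):
    """Project data X (N x d) onto the k principal directions."""
    X = X - X.mean(axis=0)                          # center the data
    C = (X.T @ X) / len(X)                          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k eigenvectors as columns, d x k
    return X @ V                                    # X' = X V, shape N x k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
print(pca_eig(X, 2).shape)                          # (200, 2): dimension dropped from 5 to 2
```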

If you are not familiar with the geometric meaning of linear algebra, see the Essence of Linear Algebra series.

Incidentally, whitening makes all directions have equal variance, so applying PCA after whitening is of no use.

Coordinate system

A coordinate system is defined by an orthonormal basis (a set of mutually orthogonal unit vectors \(\mathtt{v_1,...,v_d}\)), which determines the coordinates.

The natural (standard) basis \(\mathtt{u_1,...,u_d}\) consists of the unit vectors \(\mathtt{u_i}\) whose i-th element is 1 and whose other elements are 0.

A vector \(\mathtt{x}\) given in the coordinate system determined by the natural basis can be expressed in the coordinate system determined by the orthonormal basis \(\mathtt{v_1,...,v_d}\) as \(\mathtt{x'=\sum_{i=1}^dz_iv_i=\sum_{i=1}^d(x^Tv_i)v_i}\)

\(z_i\) describes the component (projection) of \(\mathtt{x}\) on the basis vector \(\mathtt{v_i}\).

\(\mathtt{z}=\begin{bmatrix}z_1\\\vdots\\z_k\end{bmatrix}=\begin{bmatrix}\mathtt{x^Tv_1}\\\vdots\\\mathtt{x^Tv_k}\end{bmatrix}=\Phi(x)\)
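A brief numpy illustration of this change of basis; the orthonormal basis below is generated at random via a QR decomposition, purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
V, _ = np.linalg.qr(rng.normal(size=(d, d)))   # columns v_1..v_d form an orthonormal basis

z = V.T @ x                                    # z_i = x^T v_i, coordinates in the new basis
x_rec = V @ z                                  # x = sum_i z_i v_i
print(np.allclose(x, x_rec))                   # True: with all d basis vectors, x is recovered
```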

If the number of orthonormal basis vectors \(\mathtt{v_1,...,v_d}\) equals the number of input features, we can reconstruct \(\mathtt{x}\) exactly by

\(\mathtt{x=\sum_{i=1}^dz_iv_i}\)

If we keep only the first k components,

\(\mathtt{\hat{x}=\sum_{i=1}^kz_iv_i}\)

we lose some of the components and thereby achieve dimensionality reduction.

The error of reconstructing a data point after dropping some components can be measured as

\(||\mathtt{x-\hat{x}}||^2=||\sum_{i=k+1}^dz_i\mathtt{v}_i||^2=\sum_{i=k+1}^dz_i^2\)

because each \(\mathtt{v_i}\) is a unit vector of length 1.

The total reconstruction error over all data points can be measured as

\(\sum_{n=1}^N||\mathtt{x_n-\hat{x}_n}||^2\)
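As a sketch of this reconstruction error (again using a random orthonormal basis, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 100, 5, 2
X = rng.normal(size=(N, d))
X = X - X.mean(axis=0)
V, _ = np.linalg.qr(rng.normal(size=(d, d)))   # some orthonormal basis v_1..v_d (columns)

Z = X @ V                                      # coordinates of every point in the new basis
X_hat = Z[:, :k] @ V[:, :k].T                  # keep only the first k components
err = np.sum((X - X_hat) ** 2)                 # sum_n ||x_n - x_hat_n||^2
print(err, np.sum(Z[:, k:] ** 2))              # equal: error = sum of the dropped z_i^2
```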

Principal component analysis (PCA) finds a coordinate system that minimizes the overall reconstruction error. The trailing dimensions contain as little information as possible, so even if we discard them we can still almost reconstruct the original data. PCA is optimal in the sense that no other linear method can produce coordinates with a smaller reconstruction error. The first k basis vectors of this optimal coordinate system are called the top-k principal directions.

SVD (Singular Value Decomposition)

The singular value decomposition gives us the right singular vectors to use as \(\mathtt{v_1,...,v_d}\) (the optimal basis); we then discard the trailing components, which contribute as little as possible to the reconstruction error.

Any matrix \(\mathtt{X}\) can be decomposed into the following form:

\(\mathtt{X=U\Gamma V^T}\)

\(\mathtt{U}\) has orthonormal columns (\(\mathtt{U^TU=E}\))

\(\mathtt{V}\) is an orthogonal matrix (\(\mathtt{V^TV=E}\))

\(\Gamma\) is a diagonal matrix whose entries \(\gamma_i=\Gamma_{ii}\) satisfy \(\gamma_1\ge\gamma_2\ge\cdots\ge\gamma_d\ge0\); the \(\gamma_i\) are called the singular values of \(\mathtt{X}\)

The columns of \(\mathtt{U}\) are called the left singular vectors of \(\mathtt{X}\), and the columns of \(\mathtt{V}\) are called the right singular vectors of \(\mathtt{X}\).

Just as a square matrix maps an eigenvector to a multiple of itself, a general (possibly non-square) matrix maps a right (left) singular vector to a multiple of the corresponding left (right) singular vector, as can be verified from the following identities:

\(\mathtt{U^TX=\Gamma V^T},\mathtt{XV=U\Gamma}\)
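These identities can be checked numerically; the sketch below uses numpy's SVD with full_matrices=False, so the shapes are \(\mathtt{U}\): N x d and \(\Gamma\): d x d:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

U, gamma, Vt = np.linalg.svd(X, full_matrices=False)   # X = U Gamma V^T
Gamma = np.diag(gamma)                                  # singular values, in descending order

print(np.allclose(X, U @ Gamma @ Vt))                   # X = U Gamma V^T
print(np.allclose(U.T @ X, Gamma @ Vt))                 # U^T X = Gamma V^T
print(np.allclose(X @ Vt.T, U @ Gamma))                 # X V = U Gamma
```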

Algorithm

Input: the centered data \(\mathtt{X}\) (a numpy sketch of the full procedure follows the steps below)

  1. Compute the singular value decomposition of \(\mathtt{X}\) to obtain the right singular matrix \(\mathtt{V}\)
  2. Take the first k columns of \(\mathtt{V}\) as \(\mathtt{V_k}\)
  3. \(\mathtt{Z=XV_k}\); the input data is reconstructed as \(\mathtt{\hat{X}=XV_kV_k^T}\)
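A minimal numpy sketch of this algorithm (the function and variable names are mine, not from the original post):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD: return the reduced data Z and the reconstruction X_hat."""
    X = X - X.mean(axis=0)                            # step 0: center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # step 1: right singular vectors
    V_k = Vt[:k].T                                    # step 2: first k columns of V, d x k
    Z = X @ V_k                                       # step 3: Z = X V_k, shape N x k
    X_hat = Z @ V_k.T                                 # reconstruction X_hat = X V_k V_k^T
    return Z, X_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z, X_hat = pca_svd(X, 3)
print(Z.shape, X_hat.shape)                           # (100, 3) (100, 10)
```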


Origin www.cnblogs.com/redo19990701/p/11415107.html