Connections and differences between PCA, AE, VAE, RPCA, and probabilistic PCA


AE (Autoencoder)

Code: see the Keras autoencoder tutorial.

An autoencoder is an unsupervised (self-supervised) neural network: in its simplest form it has a single hidden layer and the same number of input and output nodes.

The autoencoder's input is X and the network structure is X → H → X'. The goal is to make X' as close to X as possible (X' and X have the same dimensions), so that the trained H can be used to represent or reconstruct X.
It can be used to compress data, but it does not generalize, so it cannot serve as a generative model.

Comparison of autoencoders with PCA
  1) Like PCA, it is an unsupervised machine learning algorithm. Broadly speaking, an AutoEncoder can be seen as a nonlinear, strengthened version of PCA, whose results are built on linear dimensionality reduction.
  2) It minimizes the same objective function as PCA. The autoencoder's goal is to learn a function h(x) ≈ x; in other words, it learns an approximate identity function so that the output x̂ is approximately equal to the input x.
  3) It is a neural network whose target output is its own input. Autoencoders belong to the neural network family, but they are also closely related to PCA (Principal Component Analysis).
  In short, although autoencoders are very similar to PCA, they are much more flexible. During encoding, an autoencoder can represent both linear and nonlinear transformations, whereas PCA can only perform a linear transformation. PCA has a closed-form optimal solution, while autoencoders only reach a locally optimal numerical solution through backpropagation. Because an autoencoder is expressed as a network, its layers can be stacked to build a deep network. With suitable dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than those of PCA and similar techniques.
 An autoencoder consists of two parts:
  1) Encoder: this part compresses the input into a latent-space representation; it can be written as the encoding function h = f(x).
  2) Decoder: this part reconstructs the input from the latent-space representation; it can be written as the decoding function r = g(h). A minimal Keras sketch is given below.
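This sketch uses made-up layer sizes and toy data (not values from the linked tutorial), just to make the X → H → X' structure and the reuse of the code H concrete:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, hidden_dim = 784, 32                             # e.g. flattened 28x28 images -> 32-d code

x_in = keras.Input(shape=(input_dim,))
h = layers.Dense(hidden_dim, activation="relu")(x_in)       # encoder: h = f(x)
x_out = layers.Dense(input_dim, activation="sigmoid")(h)    # decoder: x' = g(h)

autoencoder = keras.Model(x_in, x_out)
autoencoder.compile(optimizer="adam", loss="mse")           # train so that x' ≈ x

# Toy data: the training target is the input itself (self-supervised reconstruction).
X = np.random.rand(256, input_dim).astype("float32")
autoencoder.fit(X, X, epochs=1, batch_size=32, verbose=0)

# The trained hidden layer H can now be reused as a compressed representation of X.
encoder = keras.Model(x_in, h)
codes = encoder.predict(X, verbose=0)
print(codes.shape)                                          # (256, 32)
```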

PCA

PCA (Principal Component Analysis) is the most widely used dimensionality-reduction method. Its main idea is to map n-dimensional features onto k dimensions; these k dimensions are a new set of orthogonal features, also called the principal components, re-constructed from the original n-dimensional features. The new axes are chosen so that most of the variance is contained in the first k axes, while the remaining axes contain almost no variance. We can therefore ignore the remaining axes and keep only the first k axes that carry most of the variance. In effect this keeps only the feature dimensions that contain most of the variance and drops the feature dimensions whose variance is nearly zero, which realizes the dimensionality reduction. Concretely, we compute the covariance matrix of the data matrix, obtain the eigenvalues and eigenvectors of that covariance matrix, and select the eigenvectors corresponding to the k largest eigenvalues (i.e., the largest variances). Transforming the data matrix into the new space spanned by these eigenvectors reduces the dimensionality of the data.

Two ways to implement PCA
PCA based on eigenvalue decomposition of the covariance matrix
Input: the data set, and the target dimension k.

  1. Center the data, i.e., subtract from each feature its own mean.
  2. Compute the covariance matrix. Note: whether you divide by the number of samples n or by n−1 has no effect on the eigenvectors obtained.
  3. Compute the eigenvalues and eigenvectors of the covariance matrix by eigenvalue decomposition.
  4. Sort the eigenvalues in descending order and select the k largest. Stack the corresponding k eigenvectors as row vectors to form the eigenvector matrix P.
  5. Transform the data into the new space spanned by the k eigenvectors, i.e., Y = PX (a NumPy sketch of these steps follows below).
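A small sketch of steps 1–5 in NumPy (the data layout, one sample per row, and the variable names are illustrative choices):

```python
import numpy as np

def pca_eig(X, k):
    Xc = X - X.mean(axis=0)                # 1. center: subtract each feature's mean
    C = np.cov(Xc, rowvar=False)           # 2. covariance matrix (n-1 divisor; the choice does not change the eigenvectors)
    eigvals, eigvecs = np.linalg.eigh(C)   # 3. eigen-decomposition (eigh, since C is symmetric)
    order = np.argsort(eigvals)[::-1][:k]  # 4. indices of the k largest eigenvalues
    P = eigvecs[:, order].T                #    top-k eigenvectors stacked as rows
    return (P @ Xc.T).T                    # 5. Y = P X, returned with samples as rows

X = np.random.randn(100, 5)
print(pca_eig(X, 2).shape)                 # (100, 2)
```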

PCA based on SVD of the covariance matrix
Input: the data set, and the target dimension k.

  1. Center the data, i.e., subtract from each feature its own mean.
  2. Compute the covariance matrix.
  3. Compute the eigenvalues and eigenvectors of the covariance matrix via SVD.
  4. Sort the eigenvalues in descending order and select the k largest. Stack the corresponding k eigenvectors as column vectors to form the eigenvector matrix.
  5. Transform the data into the new space spanned by the k eigenvectors.

Implemented via SVD, this method is very efficient when the number of samples is large. A NumPy sketch is given below.
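This sketch takes the eigenvectors from the SVD of the covariance matrix, as in the steps above (as an aside, libraries such as scikit-learn usually apply the SVD directly to the centered data matrix instead of forming the covariance matrix at all, which is part of why the SVD route scales well to large sample sizes):

```python
import numpy as np

def pca_svd(X, k):
    Xc = X - X.mean(axis=0)        # 1. center
    C = np.cov(Xc, rowvar=False)   # 2. covariance matrix
    U, S, Vt = np.linalg.svd(C)    # 3. for a covariance matrix, the singular values/vectors
                                   #    coincide with its eigenvalues/eigenvectors
    W = U[:, :k]                   # 4. top-k eigenvectors as columns
    return Xc @ W                  # 5. project into the new k-dimensional space

X = np.random.randn(100, 5)
print(pca_svd(X, 2).shape)         # (100, 2)
```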

Probabilistic PCA

Probabilistic PCA is not a variant of PCA; it is PCA itself. Probabilistic PCA is derived from PCA and offers another way of understanding it: it puts PCA into a probabilistic framework.
In probabilistic PCA, f(z; θ) is linear, so we obtain a linear Gaussian model. The outstanding property of a linear Gaussian model is that all four probability distributions involved are Gaussian, so the marginal distribution and the encoding (posterior) distribution can be given in closed form, and both maximum likelihood estimation and the EM algorithm can be used.
In both PCA and probabilistic PCA, x is a linear function of z; the only difference is that probabilistic PCA writes the Gaussian noise ε explicitly in the expression, whereas PCA does not write the noise out explicitly: the Gaussian noise is implicit in the two-norm reconstruction error.
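For concreteness, the standard probabilistic PCA model (Tipping & Bishop) that this description refers to can be written as:

```latex
\begin{aligned}
p(z) &= \mathcal{N}(z \mid 0, I) \\
x &= Wz + \mu + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I) \\
p(x \mid z) &= \mathcal{N}(x \mid Wz + \mu,\ \sigma^2 I) \\
p(x) &= \mathcal{N}(x \mid \mu,\ WW^\top + \sigma^2 I) \\
p(z \mid x) &= \mathcal{N}\big(z \mid M^{-1}W^\top (x - \mu),\ \sigma^2 M^{-1}\big), \qquad M = W^\top W + \sigma^2 I
\end{aligned}
```

As σ² → 0, the posterior mean M⁻¹Wᵀ(x − μ) becomes an orthogonal projection onto the principal subspace and standard PCA is recovered, which is exactly the sense in which the Gaussian noise is "implicit" in PCA's two-norm reconstruction error.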

VAE

The most important difference between AE and VAE is that VAE forces the latent variable z to follow a Gaussian distribution, p(z) = N(z | 0, I). When p(z) = N(z | 0, I) holds, the posterior p(z | x) is also normally distributed, whereas AE makes no assumption at all about the distribution of z. The consequence shows up when generating new samples: an AE would first have to fit p(z) in order to draw latent variables consistent with the latent distribution of the data set, while a VAE can simply sample z from N(0, I).
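A minimal NumPy sketch of the two ingredients this difference boils down to: reparameterized sampling of z, and the KL term that pushes the encoder's q(z|x) towards N(0, I) (the function names are illustrative, not taken from any particular library):

```python
import numpy as np

def sample_z(mu, log_var, rng=np.random.default_rng()):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# A batch of 4 samples with a 2-dimensional latent space.
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))                            # sigma = 1, so the KL to N(0, I) is 0
z = sample_z(mu, log_var)
print(z.shape, kl_to_standard_normal(mu, log_var))    # (4, 2) [0. 0. 0. 0.]
```

The VAE's training loss is the reconstruction error plus this KL term; a plain AE has only the reconstruction error, which is why nothing forces its latent codes into any particular distribution.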

RPCA

Robust PCA considers the following problem: a general data matrix D contains structural information but also noise. The matrix can then be decomposed as the sum of two matrices, D = A + E, where A is low rank (the internal structure makes its rows or columns linearly correlated) and E is sparse (it contains the noise, which is assumed to be sparse).

Robust PCA decomposes the matrix into a low-rank matrix L plus a matrix S that is as sparse as possible.
The PCP (Principal Component Pursuit) problem given below is obtained by convex relaxation of the RPCA problem above.

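It replaces the non-convex rank and ℓ0 sparsity terms with their convex surrogates, the nuclear norm and the ℓ1 norm (the λ shown is the usual default from the PCP literature, for D of size m × n):

```latex
\min_{A,\,E}\ \|A\|_* + \lambda \|E\|_1
\quad \text{s.t.} \quad D = A + E,
\qquad \lambda = \frac{1}{\sqrt{\max(m, n)}}
```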

First, the differences and connections between sparsity and low rank:

  • Sparsity and low rank are alike in that both indicate a lot of redundancy in the matrix. Specifically, sparse means the matrix contains many zeros, so it can be compressed; low rank means that many rows (or columns) of the matrix are linearly dependent.
  • Rank can be understood as the richness of the information contained in the matrix (e.g., an image): the lower the rank, the greater the redundancy, because a very small number of basis vectors can express all of the data. Conversely, the larger the rank, the smaller the redundancy (a tiny numerical example follows below).
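A tiny NumPy illustration of the two kinds of redundancy (the matrices are made up for the example):

```python
import numpy as np

# Low rank: every row is a multiple of the same vector, so the rank is 1
# even though the matrix contains no zeros at all.
low_rank = np.outer(np.arange(1, 5), np.array([1.0, 2.0, 3.0]))
print(np.linalg.matrix_rank(low_rank))                          # 1

# Sparse: full rank, but almost every entry is zero, so it compresses well.
sparse = np.zeros((4, 3))
sparse[0, 0], sparse[2, 1], sparse[3, 2] = 5.0, -1.0, 2.0
print(np.linalg.matrix_rank(sparse), np.count_nonzero(sparse))  # 3 3
```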


Like classic PCA, Robust PCA (Robust Principal Component Analysis) is in essence the problem of finding the best projection of the data onto a low-dimensional space. When the observations are contaminated by large noise, PCA cannot give the desired result, whereas Robust PCA can recover the intrinsically low-rank data from observations corrupted by large but sparse noise.

Differences between RPCA and PCA
Robust PCA is similar in nature to classic PCA: both look for the best projection of the data onto a low-dimensional space. For observed data D whose underlying matrix is low rank, random (sparse) noise destroys the low-rank structure and makes D full rank. D therefore needs to be decomposed into the sum of a low-rank matrix A, containing its true structure, and a sparse noise matrix E. Finding the low-rank matrix amounts to finding the intrinsic low-dimensional space of the data. So where does the "Robust" in Robust PCA come from? PCA assumes the noise in the data is Gaussian, so large noise or severe outliers affect it and keep it from working properly; Robust PCA makes no such assumption (it only assumes the noise is sparse, regardless of its strength).

Comparison of RPCA optimization algorithms

For the mathematical derivation of the optimization, see the summary post on the Robust PCA optimization derivation.

For RPCA code, see this blog post.

The principles of the various RPCA solution algorithms are detailed in the paper "The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices".

1. The general Augmented Lagrange Multiplier (ALM) method
2. Alternating Direction Method (ADM)
ADM improves on ALM and converges faster; it is also known as the inexact Lagrange multiplier method.

3. Exact ALM (EALM) algorithm

4. Iterative Thresholding (IT) method

5. Inexact ALM (IALM) algorithm
6. Accelerated Proximal Gradient (APG) method

PCP is a tractable convex optimization problem. It can be solved with ALM (the augmented Lagrange multiplier algorithm), which works better than existing robust PCA algorithms such as the accelerated proximal gradient (APG) algorithm.

The Iterative Thresholding (IT) algorithm is simple in form and convergent, but it converges slowly and a suitable step size is hard to choose. APG is similar to IT but greatly reduces the number of iterations. ALM is much faster than APG, reaches higher accuracy, and needs less storage. The Inexact ALM (IALM) improves on the Exact ALM (EALM): it does not require solving the subproblems exactly and is therefore faster.
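As a rough sketch of what the IALM iterations look like, the loop below alternates singular-value thresholding for the low-rank part with soft thresholding for the sparse part, then updates the Lagrange multiplier. The default parameters follow commonly cited choices from the ALM paper referenced above, but this is an illustrative simplification rather than the reference implementation:

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))                             # standard PCP weight
    norm_D = np.linalg.norm(D, "fro")
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)       # dual variable init
    A, E = np.zeros_like(D), np.zeros_like(D)
    mu = 1.25 / np.linalg.norm(D, 2)                               # penalty parameter
    rho = 1.5                                                      # growth factor for mu
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)                          # low-rank update
        E = soft_threshold(D - A + Y / mu, lam / mu)               # sparse update
        R = D - A - E                                              # constraint residual
        Y = Y + mu * R                                             # dual ascent on the multiplier
        mu = rho * mu
        if np.linalg.norm(R, "fro") / norm_D < tol:
            break
    return A, E

# Toy test: a rank-2 matrix corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
S = np.zeros((50, 40))
S[rng.random((50, 40)) < 0.05] = 10.0
A, E = rpca_ialm(L + S)
print(np.linalg.norm(A - L, "fro") / np.linalg.norm(L, "fro"))     # small: the low-rank part is recovered
```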

[Figure: comparison of the RPCA solution algorithms]

Source: blog.csdn.net/qq_39751437/article/details/91491514