Principal component analysis
- Linear, unsupervised, global dimensionality reduction algorithm
PCA: maximum variance theory
Starting point: in signal processing, the signal is assumed to have a large variance while the noise has a small variance.
Goal: maximize the variance of the projected data, i.e., find the projection directions along which the data varies the most.
PCA solution procedure:
Center the sample data (subtract the mean).
Compute the sample covariance matrix.
Perform an eigenvalue decomposition of the covariance matrix and sort the eigenvalues in descending order.
Take the eigenvectors \(w_1, w_2, \cdots, w_d\) corresponding to the d largest eigenvalues; the n-dimensional samples are then mapped to d dimensions via
\[x'_i = \begin{bmatrix} w_1^{T}x_i \\ w_2^Tx_i \\ \vdots \\ w_d^Tx_i \end{bmatrix}\]
The d-th dimension of the new vector \(x'_i\) is the projection of \(x_i\) onto the direction of the d-th principal component \(w_d\).
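A minimal NumPy sketch of this procedure (the function name, shapes, and random data are illustrative assumptions, not part of the original notes):

```python
import numpy as np

def pca(X, d):
    """Minimal PCA sketch: X has shape (n_samples, n_features)."""
    # 1. Center the sample data
    Xc = X - X.mean(axis=0)
    # 2. Sample covariance matrix
    cov = np.cov(Xc, rowvar=False)
    # 3. Eigenvalue decomposition (eigh: the covariance matrix is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvalues in descending order, keep the top-d eigenvectors
    order = np.argsort(eigvals)[::-1][:d]
    W = eigvecs[:, order]                    # shape (n_features, d)
    # 5. Project: x'_i = [w_1^T x_i, ..., w_d^T x_i]
    return Xc @ W

# Example usage
X = np.random.randn(200, 5)
print(pca(X, d=2).shape)  # (200, 2)
```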
Limitations:
- Only performs linear dimensionality reduction
- Can be extended to nonlinear mappings via kernel functions, giving kernel PCA (KPCA)
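As a quick illustration of this limitation and the kernel extension, the sketch below contrasts linear PCA with scikit-learn's KernelPCA on data that is not linearly separable (the dataset and parameters are chosen purely for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Concentric circles: the two classes cannot be separated by a linear projection
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                                   # linear PCA
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)    # kernel PCA

print(X_pca.shape, X_kpca.shape)  # (300, 2) (300, 2)
```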
PCA: minimum squared error theory
Starting point: find a d-dimensional hyperplane such that the sum of squared distances from the data points to the hyperplane is minimized.
Optimization objective:
\[\begin{aligned} \mathop{\arg\min}_{w_1, \dots, w_d} \quad & \sum\limits_{k=1}^{n}\|x_k - \tilde{x}_k\|_2^2 \\ \text{s.t.} \quad & w_i^Tw_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \end{aligned}\]
where \(\tilde{x}_k\) is the projection of \(x_k\) onto the hyperplane spanned by \(w_1, \dots, w_d\).
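A one-step derivation (assuming the data are centered and the \(w_i\) are orthonormal, so \(\tilde{x}_k = \sum_{i=1}^{d}(w_i^Tx_k)w_i\)) makes the link to the maximum-variance view explicit:

\[\|x_k - \tilde{x}_k\|_2^2 = \|x_k\|_2^2 - \sum_{i=1}^{d}\left(w_i^Tx_k\right)^2, \qquad \mathop{\arg\min}_{w_1,\dots,w_d}\sum_{k=1}^{n}\|x_k - \tilde{x}_k\|_2^2 = \mathop{\arg\max}_{w_1,\dots,w_d}\sum_{k=1}^{n}\sum_{i=1}^{d}\left(w_i^Tx_k\right)^2\]

Since \(\sum_k \|x_k\|_2^2\) does not depend on the projection directions, minimizing the squared error is equivalent to maximizing the total projected variance, so both perspectives give the same solution.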
Linear discriminant analysis
- Supervised dimensionality reduction method (LDA), originally formulated for binary (two-class) classification
- PCA does not take class labels into account, so its projection may fail to separate the classes
Central idea: maximize the between-class distance while minimizing the within-class distance.
For the two-class case:
Between-class scatter matrix: \(S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T\)
Within-class scatter matrix: \(S_w = \sum\limits_{i=1}^{2}\sum\limits_{x \in C_i}(x - \mu_i)(x - \mu_i)^T\)
Optimization objective:
\[J(w) = \frac{w^T S_B w}{w^T S_w w} = \lambda\]
From \(S_w^{-1}S_Bw = \lambda w\), the maximum of \(J(w)\) equals the largest eigenvalue of \(S_w^{-1}S_B\), and the optimal projection direction is the corresponding eigenvector.
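A minimal NumPy sketch of the two-class case (the function name and synthetic data are assumptions for illustration). Because \(S_B\) has rank one here, the leading eigenvector of \(S_w^{-1}S_B\) is simply proportional to \(S_w^{-1}(\mu_1 - \mu_2)\), which the sketch computes directly:

```python
import numpy as np

def lda_binary_direction(X1, X2):
    """Minimal two-class LDA sketch: returns the projection direction w."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix S_w, summed over both classes
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # Optimal direction: w proportional to S_w^{-1}(mu_1 - mu_2)
    w = np.linalg.solve(Sw, mu1 - mu2)
    return w / np.linalg.norm(w)

# Example usage with two synthetic Gaussian classes
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3, 1], scale=1.0, size=(100, 2))
w = lda_binary_direction(X1, X2)
print(w)          # projection direction
z1, z2 = X1 @ w, X2 @ w   # 1-D projections of the two classes
```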
LDA makes strong assumptions about the data distribution: each class is Gaussian, and all classes share the same covariance.
Advantage: as a linear model, it is relatively robust to noise.
Disadvantage: the model and its distributional assumptions are simple; more complex data distributions can be handled by introducing kernel functions.
LDA for high-dimensional data with multiple class labels:
Compute the mean \(\mu_j\) of each class and the overall mean \(\mu\).
Compute the within-class scatter matrix \(S_w\) and the total scatter matrix \(S_t\); the between-class scatter matrix is then \(S_B = S_t - S_w\).
Perform an eigenvalue decomposition of \(S_w^{-1}S_B\) and sort the eigenvalues in descending order.
Take the eigenvectors \(w_1, w_2, \cdots, w_d\) corresponding to the d largest eigenvalues; the n-dimensional samples are then mapped to d dimensions via
\[x'_i = \begin{bmatrix} w_1^{T}x_i \\ w_2^Tx_i \\ \vdots \\ w_d^Tx_i \end{bmatrix}\]
The d-th dimension of the new vector \(x'_i\) is the projection of \(x_i\) onto the direction of the d-th principal component \(w_d\).
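A minimal NumPy sketch of these multi-class steps (the function name and synthetic data are illustrative assumptions):

```python
import numpy as np

def lda_multiclass(X, y, d):
    """Minimal multi-class LDA sketch: project X (n_samples, n_features) to d dims."""
    mu = X.mean(axis=0)
    St = (X - mu).T @ (X - mu)                 # total scatter matrix S_t
    Sw = np.zeros_like(St)
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)      # within-class scatter matrix S_w
    Sb = St - Sw                               # between-class scatter: S_B = S_t - S_w
    # Eigenvalue decomposition of S_w^{-1} S_B, eigenvalues in descending order
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:d]
    W = eigvecs[:, order].real                 # top-d directions w_1, ..., w_d
    return X @ W

# Example usage: three classes in 4-D, projected to 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in (0, 3, 6)])
y = np.repeat([0, 1, 2], 50)
print(lda_multiclass(X, y, d=2).shape)  # (150, 2)
```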
Differences and connections between PCA and LDA
Connection: their solution procedures are very similar; both reduce to an eigenvalue decomposition followed by projection onto the top eigenvectors.
Differences:
- Mathematical principle
- Optimization objective
- Application scenario: PCA is used for dimensionality reduction in unsupervised tasks, LDA in supervised ones.
- Example: to extract the speech signal from audio and filter out noise, use PCA.
- Example: for voiceprint recognition, use LDA so that each speaker's signal is discriminative.
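The contrast in usage shows up directly in scikit-learn: PCA is fit on the features alone, while LDA additionally requires the labels (the dataset is chosen purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: PCA only looks at X
X_pca = PCA(n_components=2).fit_transform(X)
# Supervised: LDA also uses the labels y to maximize class separability
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```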