Principal component analysis (PCA) and Linear Discriminant Analysis (LDA)

Principal component analysis

  • Linear, unsupervised, global dimensionality reduction algorithm

PCA from the maximum-variance perspective

  • Starting point: in signal processing, the signal is assumed to have large variance while the noise has small variance

  • Goal: maximize the variance of the projected data, so that the principal projection directions capture the largest variance

  • PCA solution procedure (a code sketch follows at the end of this subsection):

    • Center the sample data (subtract the mean)

    • Compute the sample covariance matrix

    • Perform eigenvalue decomposition of the covariance matrix and sort the eigenvalues in descending order

    • Take the eigenvectors \(w_1, w_2, \cdots, w_d\) corresponding to the d largest eigenvalues; the transformation maps the n-dimensional samples to d dimensions:

      \[x^{'}_i = \begin{bmatrix} w_1^{T}x_i \\ w_2^Tx_i \\ \vdots \\ w_d^Tx_i \end{bmatrix}\]

      The d-th component of the new \(x^{'}_i\) is the projection of \(x_i\) onto the direction of the d-th principal component \(w_d\)

  • Limitations:

    • Performs only linear dimensionality reduction
    • Can be extended with kernel mappings to Kernel Principal Component Analysis (KPCA)
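
A minimal NumPy sketch of the solution procedure above; the data matrix `X` (one sample per row) and the target dimension `d` are illustrative inputs, not something fixed by the notes.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X onto the top-d principal directions."""
    X_centered = X - X.mean(axis=0)             # 1. center the samples
    cov = np.cov(X_centered, rowvar=False)      # 2. sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # 3. eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]           #    eigenvalues in descending order
    W = eigvecs[:, order[:d]]                   # 4. columns w_1, ..., w_d
    return X_centered @ W, W                    #    x'_i = [w_1^T x_i, ..., w_d^T x_i]

# Usage with random data
X = np.random.randn(100, 5)
X_proj, W = pca(X, d=2)
print(X_proj.shape)   # (100, 2)
```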

PCA from the least-squares-error perspective

  • Goal: find a d-dimensional hyperplane such that the sum of squared distances from the data points to the hyperplane is minimized

  • Optimization objective:

    \[\begin{aligned} \mathop{\arg\min}_{w_1, \dots, w_d} \sum \limits_{k=1}^{n}||x_k - \tilde{x}_k||_2^2 \\ s.t. \quad w_i^Tw_j = \begin{cases} 1, i = j \\ 0, i \neq j \end{cases} \end{aligned}\]

    where \(\tilde{x}_k\) is the projection of \(x_k\) onto the hyperplane
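
The two perspectives are consistent. As a quick numerical check (a sketch using assumed random data, not part of the original notes), the top-d eigenvectors of the covariance matrix give a reconstruction error no larger than a random orthonormal basis of the same dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Xc = X - X.mean(axis=0)
d = 2

# Top-d principal directions from the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W_pca = eigvecs[:, np.argsort(eigvals)[::-1][:d]]

# A random orthonormal basis of the same dimension, for comparison
Q, _ = np.linalg.qr(rng.normal(size=(6, d)))

def reconstruction_error(Xc, W):
    X_tilde = Xc @ W @ W.T             # projection of each x_k onto span(W)
    return np.sum((Xc - X_tilde) ** 2)

print(reconstruction_error(Xc, W_pca) <= reconstruction_error(Xc, Q))   # True
```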

Linear discriminant analysis

The two-class case

  • LDA is a supervised dimensionality reduction method

  • PCA does not take the class labels into account, so the projected data may not be separable

  • Central idea: maximize the between-class distance while minimizing the within-class distance

  • For the two-class case:

    • Between-class scatter matrix: \(S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T\)

    • Within-class scatter matrix: \(S_w = \sum\limits_{i=1}^{2}\sum\limits_{x \in C_i}(x - \mu_i)(x - \mu_i)^T\)

    • Optimization objective:

      \[J(w) = \frac{w^T S_B w}{w^T S_w w} = \lambda\]

    • From \(S_w^{-1}S_B w = \lambda w\): the maximum of \(J(w)\) is the largest eigenvalue of the matrix \(S_w^{-1}S_B\), and the best projection direction is the corresponding eigenvector (a code sketch follows this list)

  • Makes strong assumptions about the data distribution: each class follows a Gaussian distribution and all classes share the same covariance

  • Advantage: as a linear model, it is fairly robust to noise

  • Disadvantage: the model and its distributional assumptions are simple; more complex data can be handled by introducing kernel functions
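
A minimal sketch of the two-class case above; for two classes the top eigenvector of \(S_w^{-1}S_B\) can be written in closed form as \(w \propto S_w^{-1}(\mu_1 - \mu_2)\), which the code uses. The class samples `X1`, `X2` are hypothetical illustration data.

```python
import numpy as np

def lda_two_class(X1, X2):
    """Best 1-D projection direction for two classes of samples (one row per sample)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix S_w, summed over both classes
    S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # S_w^{-1} S_B has a single nonzero eigenvalue; its eigenvector is S_w^{-1}(mu1 - mu2)
    w = np.linalg.solve(S_w, mu1 - mu2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], size=(50, 2))
w = lda_two_class(X1, X2)
print((X1 @ w).mean(), (X2 @ w).mean())   # the projected class means are well separated
```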

LDA for high-dimensional data with multiple class labels

  • Compute the mean \(\mu_j\) of each class and the overall mean \(\mu\) of the data set

  • Compute the within-class scatter matrix \(S_w\) and the total scatter matrix \(S_t\), then obtain the between-class scatter matrix \(S_B = S_t - S_w\)

  • Perform eigenvalue decomposition of the matrix \(S_w^{-1}S_B\) and sort the eigenvalues in descending order

  • Take the eigenvectors \(w_1, w_2, \cdots, w_d\) corresponding to the d largest eigenvalues; the transformation maps the n-dimensional samples to d dimensions:

    \[x^{'}_i = \begin{bmatrix} w_1^{T}x_i \\ w_2^Tx_i \\ \vdots \\ w_d^Tx_i \end{bmatrix}\]

    The d-th component of the new \(x^{'}_i\) is the projection of \(x_i\) onto the d-th projection direction \(w_d\)
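
A minimal sketch of these multi-class steps; the data matrix `X`, label vector `y`, and target dimension `d` are illustrative names, not fixed by the notes.

```python
import numpy as np

def lda_multiclass(X, y, d):
    """Map n-dimensional samples to d dimensions via the top-d eigenvectors of S_w^{-1} S_B."""
    mu = X.mean(axis=0)                          # overall mean
    S_t = (X - mu).T @ (X - mu)                  # total scatter matrix
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                   # class mean mu_j
        S_w += (Xc - mu_c).T @ (Xc - mu_c)       # within-class scatter
    S_b = S_t - S_w                              # between-class scatter: S_B = S_t - S_w
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]       # eigenvalues in descending order
    W = eigvecs[:, order[:d]].real               # columns w_1, ..., w_d
    return X @ W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(40, 3)) for m in ([0, 0, 0], [4, 0, 0], [0, 4, 0])])
y = np.repeat([0, 1, 2], 40)
print(lda_multiclass(X, y, d=2).shape)           # (120, 2)
```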

Differences and connections between PCA and LDA

  • Connection: the solution procedures are very similar

  • Differences:

    • Mathematical principles
    • Optimization objectives
    • Application scenarios: PCA is used for dimensionality reduction in unsupervised tasks, LDA in supervised ones.
      • To extract a speech signal from audio and filter out noise, use PCA
      • For voiceprint recognition, use LDA so that each speaker's signal is discriminative

