Mathematical Modeling--(10) Dimensionality Reduction Model

Table of contents

Foreword

The role of data dimensionality reduction

1. Principal Component Analysis (PCA)

1. Introduction

2. Algorithm process

3. Explanation of Principal Component Analysis 

2. Factor Analysis (FA)

1. Introduction

2. Algorithm process

3. Comparison of factor analysis and principal component analysis


Foreword

Three dimensionality reduction algorithms are introduced here; their respective characteristics are summarized first.

  1. Principal component analysis reduces the dimensionality of a set of many indicators, keeping only a few composite indicators (the principal components).
  2. Factor analysis serves the same purpose as principal component analysis but is often the better choice, because its factors are usually easier to interpret than principal components, which can be hard to explain.
  3. Canonical correlation analysis is somewhat more limited in scope than the two algorithms above. It is a multivariate statistical method for studying the correlation between two groups of variables and can reveal the internal relationship between them: each group of indicators is represented by a composite indicator built from that group, and the relationship between the two groups is analyzed through these representatives.

The role of data dimensionality reduction

  • Dimensionality reduction retains the most important features of high-dimensional data (data with too many indicators) and removes noise and unimportant features, which speeds up data processing.
  • In practice, dimensionality reduction can save a great deal of time and cost as long as the information loss stays within an acceptable range. It has therefore become a very widely used data preprocessing method.

Dimensionality reduction has several advantages:

  • It makes the dataset easier to use;
  • It reduces the computational overhead of the algorithm;
  • It removes noise;
  • It makes the results easier to understand.

1. Principal Component Analysis (PCA)

1. Introduction

        Principal component analysis is a dimensionality reduction algorithm that converts multiple indicators into a few principal components. These principal components are linear combinations of the original variables, are uncorrelated with each other, and reflect most of the information in the original data. Generally speaking, when a research problem involves many variables that are strongly correlated with one another, principal component analysis can be considered as a way to simplify the data.
        Principal component analysis is a statistical method that condenses multiple variables into a few comprehensive indicators. From a mathematical point of view, it is a dimensionality reduction technique.


2. Algorithm process

  1. Standardize the data
  2. Compute the covariance matrix R of the standardized sample
  3. Calculate the eigenvalues and eigenvectors of R
  4. Calculate the contribution rate and cumulative contribution rate of the principal components
  5. Write out the principal components
  6. Analyze the meaning of each principal component according to its coefficients
  7. Use the principal components in subsequent analysis

 1. Standardize
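A common way to carry out this step is the z-score transformation. The notation below is an assumption made for illustration and is not taken from the original post: there are n samples and p indicators, and x_{ij} is the value of indicator j for sample i.

```latex
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij}-\bar{x}_j}{s_j}
```

After this transformation every indicator has mean 0 and sample standard deviation 1, so indicators measured in different units become comparable.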


2. Calculate the covariance matrix of the standardized sample
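A minimal sketch of this step in NumPy, using a small made-up data matrix (the values and the variable names X, Z and R are illustrative assumptions). It also shows why the result is usually called R: the covariance matrix of the standardized sample is the correlation matrix of the original data.

```python
import numpy as np

# Hypothetical data: 6 samples (rows) by 3 indicators (columns).
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])

# Standardize each column using the sample standard deviation (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized sample.
# rowvar=False tells NumPy that columns, not rows, are the variables.
R = np.cov(Z, rowvar=False)

# The same matrix computed directly as the correlation matrix of X.
R_direct = np.corrcoef(X, rowvar=False)
```

Both routes give a symmetric p x p matrix with ones on the diagonal.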

 


3. Calculate the eigenvalues and eigenvectors of R
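One possible way to do this in NumPy is sketched here. The matrix R is a hypothetical correlation matrix chosen only for illustration. Because R is symmetric, `np.linalg.eigh` is a natural choice; it returns eigenvalues in ascending order, so they are re-sorted from largest to smallest.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix (symmetric, unit diagonal).
R = np.array([
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
])

# eigh is designed for symmetric matrices; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(R)

# Reorder so that the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]  # column i is the unit eigenvector for eigvals[i]
```

The eigenvalues of a p x p correlation matrix always sum to p, which is what makes the contribution rates in the next step well defined.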


 4. Calculate the principal component contribution rate and cumulative contribution rate
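The contribution rate of a principal component is its eigenvalue divided by the sum of all eigenvalues, and the cumulative contribution rate is the running total. The sketch below uses made-up eigenvalues, and the 80% threshold is an assumed cut-off for illustration, not a value from the original post.

```python
import numpy as np

# Hypothetical eigenvalues of a 4x4 correlation matrix, sorted descending.
eigvals = np.array([2.4, 1.0, 0.4, 0.2])

contribution = eigvals / eigvals.sum()  # share of total variance per component
cumulative = np.cumsum(contribution)    # cumulative contribution rate

# Smallest number of components whose cumulative rate reaches the threshold.
threshold = 0.80  # assumed cut-off
m = int(np.searchsorted(cumulative, threshold) + 1)
```

With these numbers the first component alone explains 60% of the variance, so two components are needed to pass the assumed threshold.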


 5. Write down the principal components
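In the usual formulation, the i-th principal component is the linear combination of the standardized indicators whose coefficients are the entries of the i-th unit eigenvector of R. The symbols F_i, a_{ji} and Z_j are notation assumed here for illustration.

```latex
F_i = a_{1i} Z_1 + a_{2i} Z_2 + \cdots + a_{pi} Z_p, \qquad i = 1, \dots, m,
\qquad \operatorname{Var}(F_i) = \lambda_i, \qquad \operatorname{Cov}(F_i, F_k) = 0 \ (i \neq k)
```

Here (a_{1i}, ..., a_{pi}) is the unit eigenvector belonging to the i-th largest eigenvalue \lambda_i, and m is the number of components retained in the previous step.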


6. Analyze the meaning represented by the principal components according to the coefficients


 7. Use the results of the principal components for subsequent analysis
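For subsequent analysis (regression, clustering, ranking and so on), each sample is typically replaced by its principal component scores. The following is a minimal sketch on synthetic data; the data, the seed and the choice m = 2 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples, 4 indicators sharing a common component.
common = rng.normal(size=(50, 1))
X = common + 0.5 * rng.normal(size=(50, 4))

# Standardize, then take the eigen-decomposition of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2  # assumed number of retained components
scores = Z @ eigvecs[:, :m]  # one row per sample, one column per component

# Covariance of the scores: diagonal, with the eigenvalues on the diagonal.
score_cov = np.cov(scores, rowvar=False)
```

The `scores` matrix can then be passed to a downstream method in place of the original four indicators.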


3. Explanation of Principal Component Analysis 

        In principal component analysis, we should first ensure that the cumulative contribution rate of the first few extracted principal components reaches a high level, and second, we must be able to interpret these components in a way that fits the actual background and meaning of the problem.
        The interpretation of a principal component is generally somewhat vague, not as clear and precise as the meaning of the original variables. This is the price that has to be paid for reducing the number of variables. Therefore, the number m of extracted principal components should usually be significantly smaller than the number p of original variables (unless p itself is small); otherwise the benefit of dimensionality reduction may not outweigh the drawback that the principal components are less clear in meaning than the original variables.
        If the original variables are highly correlated, the cumulative contribution rate of the first few principal components usually reaches a high level, so the requirement on the cumulative contribution rate is usually easier to meet in that case.
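        An idealized case makes this concrete. Assume, purely for illustration, that all p standardized variables share the same pairwise correlation ρ > 0. The correlation matrix then has one large eigenvalue and p - 1 equal small ones:

```latex
\lambda_1 = 1 + (p-1)\rho, \qquad
\lambda_2 = \cdots = \lambda_p = 1 - \rho, \qquad
\frac{\lambda_1}{\sum_{k=1}^{p}\lambda_k} = \frac{1 + (p-1)\rho}{p}
```

        With p = 5 and ρ = 0.8, the first principal component alone has a contribution rate of (1 + 3.2) / 5 = 84%, whereas with ρ = 0.2 it is only (1 + 0.8) / 5 = 36%.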
        The main difficulty of principal component analysis lies in giving a good interpretation of the principal components. If even one of the extracted principal components cannot be interpreted, the whole analysis fails.
        Principal component analysis is an important and commonly used method for reducing the number of variables. Simply put, its successful application depends on a reasonable choice of original variables and on "luck".


2. Factor Analysis (FA)

1. Introduction

  • Factor analysis was first proposed by Spearman in 1904 and can, to some extent, be regarded as a generalization and extension of principal component analysis.
  • By studying the correlation coefficient matrix of the variables, factor analysis summarizes the intricate relationships among them into a few comprehensive factors. Because the number of factors is smaller than the number of original variables while the factors still carry the information of the original variables, this process is also a form of dimensionality reduction. Because factors are often easier to interpret than principal components, factor analysis is more likely to succeed than principal component analysis and is therefore more widely applied.
     

2. Algorithm process

  1. KMO and Bartlett's test
  2. Variance explained rate table
  3. Table of factor loading coefficients after rotation
  4. Scree plot
  5. Supplementary note: calculating weights from the factors
  6. Component score coefficient matrix
  7. Loading plot
  8. Linear combination coefficients and weight results


1. KMO and Bartlett's test
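Both checks ask whether the variables are correlated enough for factor analysis to be worthwhile. The sketch below computes the two statistics from a correlation matrix using their standard textbook formulas. The function names `bartlett_sphericity` and `kmo`, the matrix R and the sample size n are hypothetical and chosen only for illustration.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's statistic for H0: the correlation matrix is the identity."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of det(R), numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2            # degrees of freedom
    return chi2, dof

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)        # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)  # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)               # squared simple correlations
    q2 = np.sum(partial[off] ** 2)         # squared partial correlations
    return r2 / (r2 + q2)

# Hypothetical correlation matrix from n = 100 samples.
R = np.array([
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
])
n = 100
chi2, dof = bartlett_sphericity(R, n)
kmo_value = kmo(R)
```

The Bartlett statistic is compared against a chi-square distribution with `dof` degrees of freedom; a small p-value rejects the hypothesis that the variables are uncorrelated. A commonly cited rule of thumb treats a KMO value above roughly 0.6 as acceptable.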


 2. Variance explained rate table


3. Table of factor loading coefficients after rotation
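Rotation redistributes the loadings among the factors so that each one is easier to interpret, without changing how much of each variable the factors explain together. The original post does not say which rotation was used; the sketch below assumes varimax, a widely used orthogonal rotation, and implements the common SVD-based iteration. The function name `varimax` and the loading matrix are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (p x k) loading matrix using the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        col_ss = np.sum(rotated ** 2, axis=0)  # column sums of squares
        target = rotated ** 3 - rotated @ np.diag(col_ss) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_objective = np.sum(s)
        if new_objective < objective * (1 + tol):
            break                              # no meaningful improvement
        objective = new_objective
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 6 variables on 2 factors.
L = np.array([
    [0.7,  0.5],
    [0.8,  0.4],
    [0.6,  0.5],
    [0.5, -0.6],
    [0.6, -0.5],
    [0.4, -0.7],
])
rotated, rotation = varimax(L)

# Communality of each variable (row sum of squared loadings).
communality_before = np.sum(L ** 2, axis=1)
communality_after = np.sum(rotated ** 2, axis=1)
```

Because the rotation matrix is orthogonal, the communalities are the same before and after rotation; only the split between factors changes.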


4. Scree plot


5. Supplementary explanation: factor calculation weight

6. Component Score Coefficient Matrix


7. Loading plot

 8. Linear combination coefficient and weight results


3. Comparison of factor analysis and principal component analysis

Origin blog.csdn.net/qq_58602552/article/details/130432542