Dimensionality reduction basics

Dimensionality reduction means reducing the number of features. A concept that is easy to confuse: "dimension" can also mean the dimension of the array (its number of nesting levels), but here it refers to the number of feature columns.
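A minimal sketch of the terminology, assuming NumPy; the array and its shape below are made up for illustration:

import numpy as np

X = np.zeros((500, 300))  # a made-up matrix: 500 samples, 300 features

print(X.ndim)      # 2   -> the dimension of the array (it is a 2-D table)
print(X.shape[1])  # 300 -> the number of features; this is the "dimension"
                   #        that dimensionality reduction reduces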

1. Feature selection 

What is it? Select a subset of the features to use as the final data for analysis.

Reason: Redundancy: some features are highly correlated with one another, which only adds computational cost.

           Noise: some features have a negative impact on the prediction results.

Main methods: Filter (filter-based)

                  Embedded: regularization, decision trees

                  Wrapper (wrapper-based): not used much in practice

Low-variance filtering: delete features whose values are nearly the same across all samples, since they carry little information; see the sketch below.
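A minimal sketch of the Filter approach using sklearn's VarianceThreshold; the feature matrix and the threshold value are made-up assumptions for illustration:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# 4 samples, 3 features; the middle feature barely varies across samples
X = np.array([
    [0, 2.0, 0],
    [1, 2.0, 4],
    [2, 2.1, 1],
    [3, 2.0, 3],
])

# Drop features whose variance falls below the (arbitrarily chosen) threshold
selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)  # (4, 2): the nearly-constant feature was removed

In practice the threshold is tuned to the scale of the features rather than fixed at 0.1.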

2. Principal component analysis (sklearn)

PCA (Principal Component Analysis): when the number of features reaches the hundreds, consider simplifying the data with PCA.

In the process the data values are transformed (they change) and the number of features decreases, while as much of the original information as possible is retained.

[Figure: a column of values 1-8 under the label "Feature 1".]

A situation like the one above arises when the number of features is large; that is when PCA is needed. A minimal usage sketch follows.
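A minimal PCA sketch with sklearn; the random 300-feature matrix and the 95% variance target are illustrative assumptions, not from the original note:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))  # made-up data: 100 samples, 300 features

# A float n_components keeps enough components to explain that fraction of
# the variance; an integer would fix the exact number of components instead.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # the number of columns shrinks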

Dimensionality reduction case 1:

Instacart

Goal: divide users into several groups according to the categories of items they purchase (a user × item-category table).

Looking at the data: there are only feature values, no target values (an unsupervised problem).

Feature selection vs. PCA: PCA is used when the dimensionality is in the several hundreds; a rough sketch of this case follows.
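A rough sketch of building and reducing such a user × item-category table, assuming the public Instacart CSV files (orders.csv, products.csv, order_products__prior.csv, aisles.csv); the file names, column names, and the 0.95 variance target are assumptions for illustration:

import pandas as pd
from sklearn.decomposition import PCA

# Load the (assumed) Instacart tables
orders = pd.read_csv("orders.csv")                          # order_id, user_id, ...
products = pd.read_csv("products.csv")                      # product_id, aisle_id, ...
order_products = pd.read_csv("order_products__prior.csv")   # order_id, product_id, ...
aisles = pd.read_csv("aisles.csv")                          # aisle_id, aisle

# Link every purchased product to its user and its item category (aisle)
data = (
    order_products
    .merge(orders[["order_id", "user_id"]], on="order_id")
    .merge(products[["product_id", "aisle_id"]], on="product_id")
    .merge(aisles, on="aisle_id")
)

# Cross table: one row per user, one column per item category
table = pd.crosstab(data["user_id"], data["aisle"])

# Reduce the many category columns to a handful of components
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(table)
print(table.shape, "->", reduced.shape)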



Origin: blog.csdn.net/qq_38851184/article/details/108542068