Dimensionality reduction: reducing the number of features. A potentially confusing term: "dimension" can also mean the number of axes of an array; here it refers to the number of features (columns).
1. Feature selection
What is it? Selecting a subset of the features as the final analysis data.
Reasons:
Redundancy: some features are highly correlated, adding computational cost without adding information.
Noise: some features adversely affect the prediction results.
Main methods:
Filter: e.g. low-variance feature filtering, which deletes features whose values are nearly the same across all samples (low variance), since they carry little information.
Embedded: regularization, decision trees.
Wrapper: rarely useful in practice.
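The filter method above can be sketched with sklearn's `VarianceThreshold`; the small array below is made-up illustration data.

```python
# Filter-style feature selection: drop low-variance features.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [0, 2, 0, 3],
    [0, 1, 4, 3],
    [0, 1, 1, 3],
])  # columns 0 and 3 are constant, so their variance is 0

selector = VarianceThreshold(threshold=0.0)  # remove zero-variance features
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # the two constant columns are removed -> (3, 2)
```

Raising `threshold` drops features whose variance falls at or below it, not just constant ones.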
sklearn: Principal Component Analysis
PCA (Principal Component Analysis): when the number of features reaches the hundreds, consider simplifying the data.
The data values change during the transformation, and the number of features decreases.
(Table sketch, garbled in extraction: rows 1 through 8 listed under "Feature 1" and further feature columns, illustrating a dataset with a large number of features.)
The above situation occurs when the number of features is large, and PCA is needed at this time.
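A minimal PCA sketch with sklearn, using made-up data:

```python
# Reduce 4 features to 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([
    [2, 8, 4, 5],
    [6, 3, 0, 8],
    [5, 4, 9, 1],
], dtype=float)

# n_components as an int keeps that many components; as a float in (0, 1)
# it keeps enough components to explain that fraction of the variance.
pca = PCA(n_components=2)
X_new = pca.fit_transform(X)
print(X_new.shape)  # 4 features reduced to 2 -> (3, 2)
```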
Dimensionality reduction case 1: instacart
Goal: divide users into multiple groups based on the categories of items they purchase (user x item-category table).
This is exploratory analysis: the data has only feature values, no target values.
Choosing between feature selection and PCA: use PCA when the dimensionality is in the hundreds.
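The pipeline above can be sketched as follows; the tiny DataFrame is a made-up in-memory stand-in for the real instacart CSV tables, and the aisle names are hypothetical.

```python
# Build a user x item-category table with pandas.crosstab, then reduce it
# with PCA. With the real instacart data the crosstab has hundreds of
# aisle columns, which is exactly when PCA pays off.
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for the merged orders/products/aisles data:
# one row per purchased item, with the user and the item's category.
data = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "aisle":   ["milk", "fruit", "milk", "bread", "fruit",
                "milk", "bread", "fruit", "snacks"],
})

# Cross-tabulate: rows = users, columns = item categories, values = counts.
table = pd.crosstab(data["user_id"], data["aisle"])

# Here we keep 2 components; on real data, n_components=0.95 would keep
# enough components to explain 95% of the variance.
pca = PCA(n_components=2)
reduced = pca.fit_transform(table)
print(reduced.shape)  # one 2-dimensional vector per user -> (3, 2)
```

The reduced matrix can then feed a clustering algorithm (e.g. k-means) to divide users into groups.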