Feature extraction for dimensionality reduction

1. Use principal components for feature dimensionality reduction

Problem description: For a given set of features, reduce the number of features while retaining most of the information in the data

Solution:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

# Load the digits dataset
digits = datasets.load_digits()
# Standardize the feature matrix
features = StandardScaler().fit_transform(digits.data)
# Create a PCA that retains 99% of the information (measured by variance)
pca = PCA(n_components=0.99, whiten=True)
# Apply PCA
features_pca = pca.fit_transform(features)
# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_pca.shape[1])
#output:
Original number of features: 64
Reduced number of features: 54

The output shows that PCA retains 99% of the information (variance) in the feature matrix while reducing the number of features from 64 to 54.

Principal Component Analysis (PCA) is a popular linear dimensionality-reduction method. PCA projects the samples onto the principal components of the feature matrix, a space that retains most of the variance in the data and usually has fewer dimensions. PCA is an unsupervised method: it uses only the feature matrix and does not need the target vector.

In scikit-learn, PCA is implemented by the PCA class in sklearn.decomposition.

  • The parameter n_components has two meanings, depending on its value.
    • If the value is greater than 1, PCA returns that many features.
    • If the value is between 0 and 1, PCA returns the minimum number of features that retain that fraction of the information (variance). Typical values are 0.95 and 0.99.
  • The parameter whiten = True transforms each principal component so that it has zero mean and unit variance.
  • The parameter svd_solver = "randomized" finds the first principal components with a randomized algorithm (a short usage sketch follows this list).
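
The two ways of specifying n_components can be combined with the options above. The following is a minimal sketch reusing the standardized digits features from the solution; the choice of 10 components and the 0.95 threshold are arbitrary examples, not part of the original solution:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

# Standardized digits feature matrix, as in the solution above
features = StandardScaler().fit_transform(datasets.load_digits().data)

# Keep exactly 10 components, using the randomized solver
pca_fixed = PCA(n_components=10, svd_solver="randomized", whiten=True)
print(pca_fixed.fit_transform(features).shape[1])   # 10

# Keep the minimum number of components that retain 95% of the variance
pca_95 = PCA(n_components=0.95, whiten=True)
print(pca_95.fit_transform(features).shape[1])       # fewer than 64
print(pca_95.explained_variance_ratio_.sum())        # at least 0.95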

2. Feature dimensionality reduction for linearly inseparable data

Solution: Use kernel PCA (Kernel Principal Component Analysis, an extension of PCA) for nonlinear dimensionality reduction.

from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles

# Create linearly inseparable data
features, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)
# Apply kernel PCA with a radial basis function (RBF) kernel
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
features_kpca = kpca.fit_transform(features)

# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kpca.shape[1])
#output:
Original number of features: 2
Reduced number of features: 1

For linearly inseparable data (such as the concentric circles created above), standard linear PCA would project both classes onto the same first principal component, where they remain intertwined. Ideally, the transformation should not only reduce the dimensionality of the data but also make it linearly separable.

A kernel (kernel function) implicitly maps linearly inseparable data into a higher-dimensional space in which it becomes linearly separable; this is known as the kernel trick. The most commonly used kernel is the Gaussian radial basis function (rbf); other options include the polynomial kernel (poly) and the sigmoid kernel.
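
To try a different kernel, only the kernel parameter (and its related options, such as degree for the polynomial kernel) needs to change. A small illustrative sketch, not part of the original solution; the degree value here is arbitrary:

from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles

# The same linearly inseparable circles as above
features, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# Kernel PCA with a polynomial kernel of degree 3
kpca_poly = KernelPCA(kernel="poly", degree=3, n_components=1)
features_kpca_poly = kpca_poly.fit_transform(features)
print(features_kpca_poly.shape[1])   # 1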

3. Feature dimensionality reduction by maximizing the separability between classes

Solution: Use Linear Discriminant Analysis (LDA) to project the features onto a component axis that maximizes the separability between the classes.

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Load the Iris flower dataset
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create and run LDA, then use it to transform the features
lda = LinearDiscriminantAnalysis(n_components=1)
features_lda = lda.fit(features, target).transform(features)
# Print the number of features
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_lda.shape[1])
#output:
Original number of features: 4
Reduced number of features: 1
# Amount of variance explained by each component
lda.explained_variance_ratio_
#output:
array([0.9912126])

LDA is a classification technique that is also commonly used for dimensionality reduction. LDA works in a way similar to PCA, mapping the feature space to a lower-dimensional space. In PCA, however, we only care about the component axes that maximize the variance in the data; in LDA, the additional goal is to find the component axes that maximize the separation between classes.
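
The explained_variance_ratio_ attribute can also guide the choice of n_components: fit LDA with n_components = None to keep every possible component (at most the number of classes minus one), then take the smallest number of components whose cumulative ratio passes a threshold. A minimal sketch, assuming an arbitrary 0.99 threshold:

import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
features, target = iris.data, iris.target

# Keep all possible discriminant components
lda_full = LinearDiscriminantAnalysis(n_components=None)
lda_full.fit(features, target)

# Smallest number of components whose cumulative explained variance reaches 0.99
ratios = lda_full.explained_variance_ratio_
n_components = int(np.argmax(np.cumsum(ratios) >= 0.99)) + 1
print(ratios, n_components)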

4. Use matrix factorization for feature dimensionality reduction

Problem description: Reduce the dimensionality of a non-negative feature matrix

Solution: Use Non-Negative Matrix Factorization (NMF) to reduce the dimension of the feature matrix.

from sklearn.decomposition import NMF
from sklearn import datasets
# Load the data
digits = datasets.load_digits()
# Load the feature matrix
features = digits.data
# Create an NMF, fit it, and apply the transformation
nmf = NMF(n_components=10, random_state=1)
features_nmf = nmf.fit_transform(features)
# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_nmf.shape[1])

#output:
Original number of features: 64
Reduced number of features: 10

NMF is an unsupervised linear dimensionality-reduction method that factorizes the feature matrix into several matrices whose product approximates the original, converting the feature matrix into a representation of the latent relationships between samples and features.

To use NMF, the feature matrix must not contain negative values. In addition, unlike PCA and the other techniques discussed above, NMF does not report how much of the information in the original data is retained in the output features. Therefore, the practical way to choose n_components is to try a range of values and keep the one that produces the best-performing model.
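
One way to carry out that search is to score a downstream model for each candidate value of n_components and keep the best one. The sketch below is only an illustration: the candidate list, the logistic-regression classifier, and the cross-validation settings are arbitrary choices, not part of the original solution.

from sklearn import datasets
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
features, target = digits.data, digits.target

best_n, best_score = None, 0.0
for n in [5, 10, 20, 40]:
    # Reduce the features, then score a simple classifier on the reduced data
    reduced = NMF(n_components=n, random_state=1, max_iter=1000).fit_transform(features)
    score = cross_val_score(LogisticRegression(max_iter=1000), reduced, target, cv=5).mean()
    if score > best_score:
        best_n, best_score = n, score
print("Best n_components:", best_n, "accuracy:", best_score)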

5. Feature dimensionality reduction on a sparse matrix

Solution: Use Truncated Singular Value Decomposition (TSVD) method:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np
# Load the data
digits = datasets.load_digits()
# Standardize the feature matrix
features = StandardScaler().fit_transform(digits.data)
# Create a sparse matrix
features_sparse = csr_matrix(features)
# Create a TSVD
tsvd = TruncatedSVD(n_components=10)
# Run TSVD on the sparse matrix
features_sparse_tsvd = tsvd.fit(features_sparse).transform(features_sparse)
# Show results
print("Original number of features:", features_sparse.shape[1])
print("Reduced number of features:", features_sparse_tsvd.shape[1])

#output:
Original number of features: 64
Reduced number of features: 10
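
Unlike NMF, TruncatedSVD exposes explained_variance_ratio_, so one way to sanity-check the result and to choose n_components is to look at the cumulative ratio. A small sketch continuing from the code above; the 0.95 threshold is an arbitrary example:

import numpy as np

# Fraction of the original variance retained by the 10 components above
print(tsvd.explained_variance_ratio_.sum())

# Fit with one fewer component than features, then pick the smallest count that reaches the threshold
tsvd_full = TruncatedSVD(n_components=features_sparse.shape[1] - 1)
tsvd_full.fit(features_sparse)
n_kept = int(np.argmax(np.cumsum(tsvd_full.explained_variance_ratio_) >= 0.95)) + 1
print(n_kept)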
