How to determine the dimensionality of PCA dimensionality reduction

This post mainly implements an idea from the Watermelon Book; it does not cover the PCA principle or the derivation of its formulas. PCA in one sentence: the $m$ points in $\mathbb{R}^{d}$ are mapped into the space $\mathbb{R}^{d'}$, with the guarantee that $d' < d$, where $d'$ is the new dimension.

Expressed as matrices: $Z_{d' \times m} = W^{T}_{d' \times d} \, X_{d \times m}$.
The subscripts give the number of rows and columns of each matrix, which is the opposite of the usual layout in practice (where each row is a sample). $Z$ is the matrix of transformed samples, $X$ is the original sample matrix, and $W$ is the transformation (projection) matrix.
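
As a quick shape check, here is a minimal numpy sketch of the projection above (the dimensions d=4, d'=2, m=10 are made up for illustration):

import numpy as np

d, d_new, m = 4, 2, 10        # original dimension, reduced dimension, sample count
X = np.random.rand(d, m)      # X is d x m: one sample per column, as in the book
W = np.random.rand(d, d_new)  # W is d x d', so W.T is d' x d
Z = W.T @ X                   # Z is d' x m: the projected samples
print(Z.shape)                # (2, 10)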

1. Ideas

(Figure: Teacher Zhou Zhihua, "Machine Learning", P231.) The idea on that page: sort the eigenvalues of the covariance matrix in descending order, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$, and choose the smallest $d'$ whose cumulative eigenvalue ratio reaches a user-set reconstruction threshold $t$, i.e. the smallest $d'$ satisfying $\frac{\sum_{i=1}^{d'} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \ge t$ (e.g. $t = 0.95$).
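
For example, with made-up explained-variance ratios $(0.60, 0.25, 0.10, 0.05)$ and $t = 0.9$, the cumulative sums are $0.60, 0.85, 0.95$, so the smallest sufficient $d'$ is 3.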

2. Code implementation

The implementation relies mainly on sklearn and is fairly straightforward:

import numpy as np
from sklearn.decomposition import PCA

def get_pca(X, threshold):
    # Fit a full PCA to obtain the explained-variance ratio of every component.
    pca = PCA()
    pca.fit(X)
    variance_ratio = pca.explained_variance_ratio_

    # Accumulate the ratios until the threshold t is reached; the loop
    # index then gives the smallest sufficient d'.
    s = 0
    for i in range(len(variance_ratio)):
        s = s + variance_ratio[i]
        if s >= threshold:
            break
    new_dim = i + 1

    # The top d' rows of components_ form the projection matrix W^T.
    change_matrix = pca.components_[:new_dim, :]

    # Center the data and project it: Z = (X - mean) @ W.
    norm_X = X - np.mean(X, axis=0)
    X_pca = np.matmul(norm_X, change_matrix.T)

    return X_pca

Later, I compared the output of this function with PCA(n_components=new_dim).fit_transform(X); the results agree, the only differences being floating-point precision.
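
A minimal sketch of that check, under my own assumptions (synthetic data, threshold 0.9; comparing absolute values because each principal direction is only determined up to sign):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)              # synthetic data, shapes chosen arbitrarily

X_pca = get_pca(X, 0.9)           # the function defined above
new_dim = X_pca.shape[1]          # the d' it selected

X_ref = PCA(n_components=new_dim).fit_transform(X)  # sklearn's own result

# Differences should be at floating-point level.
print(np.allclose(np.abs(X_pca), np.abs(X_ref)))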

3. Supplementary information

For the theory behind PCA, see the blog post Principal Component Analysis (PCA) Principle Detailed Explanation.

In addition, the second chapter of the Flower Book (Deep Learning) and the tenth chapter of the Watermelon Book give more detailed explanations of the principles and derivations of the formulas.


Origin: blog.csdn.net/u012949658/article/details/117295311