This post mainly implements an idea from the watermelon book and does not cover the PCA principle or the derivation of its formulas. To summarize PCA in one sentence: $m$ points in $\mathbb{R}^{d}$ are mapped into the space $\mathbb{R}^{d'}$, with the guarantee that $d' < d$, where $d'$ is the new dimension.
Expressed as a matrix product: $Z_{d' \times m} = W^{T}_{d' \times d} \, X_{d \times m}$
The subscripts give each matrix's number of rows and columns (note that this column-per-sample layout is the opposite of the one used in the code below, where each row is a sample). Z is the matrix of transformed samples, X is the original sample matrix, and W is the transformation (projection) matrix.
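To make the shapes concrete, here is a minimal numpy sketch of that matrix product. The dimensions $d=4$, $d'=2$, $m=5$ and the random W are made up purely to illustrate shapes; a random W is not a real PCA basis.

import numpy as np

d, d_new, m = 4, 2, 5        # original dimension, reduced dimension, sample count
X = np.random.rand(d, m)     # m samples stored one per column
W = np.random.rand(d, d_new) # projection matrix, one axis per column (illustrative only)
Z = W.T @ X                  # Z has shape (d', m): each sample is now d'-dimensional
print(Z.shape)               # (2, 5)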
1. Ideas
See Zhou Zhihua's "Machine Learning" (the watermelon book), p. 231.
2. Code implementation
The implementation relies mainly on sklearn, which keeps it relatively simple:
import numpy as np
from sklearn.decomposition import PCA

def get_pca(X, threshold):
    # Fit a full PCA to get the explained-variance ratio of every component.
    pca = PCA()
    pca.fit(X)
    variance_ratio = pca.explained_variance_ratio_
    # Accumulate ratios until the cumulative explained variance reaches the threshold.
    s = 0
    for i in range(len(variance_ratio)):
        s = s + variance_ratio[i]
        if s >= threshold:
            break
    new_dim = i + 1  # number of principal components to keep
    # The first new_dim rows of components_ form the projection matrix.
    components = pca.components_
    change_matrix = components[0:new_dim, :]
    # Center the data, then project it onto the retained principal axes.
    norm_X = X - np.mean(X, axis=0)
    X_pca = np.matmul(norm_X, change_matrix.T)
    return X_pca
Later, I compared the output of this function against PCA(n_components=new_dim).fit_transform(X); the two differ only in numerical precision, and both results are correct.
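As a sketch of that comparison (the random data and the 0.95 threshold here are made-up values, not from the original post):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)  # 100 samples, 10 features (sklearn layout: rows are samples)
X_pca = get_pca(X, 0.95)     # keep enough components for 95% of the variance
new_dim = X_pca.shape[1]
X_sklearn = PCA(n_components=new_dim).fit_transform(X)
# Expect True: the author reports only precision-level differences between the two.
print(np.allclose(X_pca, X_sklearn))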
3. Supplementary information
For the underlying principle, see the blog post "Principal Component Analysis (PCA) Principle Detailed Explanation".
In addition, Chapter 2 of the Huashu (the "flower book", i.e. Goodfellow et al.'s Deep Learning) and Chapter 10 of the watermelon book give more detailed explanations of the principles and derivations of the formulas.