PCA dimensionality reduction (Python + MATLAB)

Principle

https://baike.baidu.com/item/pca%E6%8A%80%E6%9C%AF/10408698?fr=aladdin
Note: the Python code reads its data from a .csv file, and the MATLAB code reads its data from a .mat file.
I have not uploaded the data to CSDN yet. If you need it, leave a comment.

Python code

After PCA, the first two principal components account for most of the variance in the data, so the code keeps only those two components.

Background:
Loading data: see
https://blog.csdn.net/weixin_42567027/article/details/107302214
x = data.iloc[:,0:-1]: see
https://blog.csdn.net/weixin_42567027/article/details/107227321
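As a quick illustration of the slicing expression, here is a minimal sketch. The three rows are a hypothetical stand-in for iris.csv (four numeric columns followed by one label column), read from an in-memory string so that the example runs without the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: four features, then a label, no header row.
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# header=None: there is no header row, so the columns are numbered 0..4.
data = pd.read_csv(io.StringIO(csv_text), header=None)

# All rows, and every column except the last one.
x = data.iloc[:, 0:-1]
# The last column on its own, as a Series.
labels = data.iloc[:, -1]

print(x.shape)  # (3, 4)
print(list(labels))
```

Because `iloc` is position-based, `0:-1` behaves like ordinary Python slicing: it stops before the last column regardless of how the columns are named.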

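import pandas as pd
from sklearn.decomposition import PCA


if __name__ == '__main__':

    # Load the data (raw string, so the backslashes in the Windows path are kept literally)
    data = pd.read_csv(r'F:\pythonlianxi\iris.csv', header=None)
    # All rows, every column except the last
    x = data.iloc[:, 0:-1]

    # PCA dimensionality reduction
    # The variances printed below are easier to interpret alongside the principle of PCA
    # Keep two components; whiten=True rescales each component to unit variance
    pca = PCA(n_components=2, whiten=True, random_state=0)
    # Fit PCA on x and project x onto the two principal components.
    # PCA centers the data but does not standardize or normalize it.
    x = pca.fit_transform(x)
    print(x)
    # PCA re-projects the data onto two mutually orthogonal directions
    print('Variance along each direction:', pca.explained_variance_)
    print('Proportion of variance explained:', pca.explained_variance_ratio_)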
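To make the two printed quantities less of a black box, the following is a rough NumPy sketch of how they can be derived from the eigenvalues of the covariance matrix, and of what whitening does to the projected data. The data here is synthetic and purely hypothetical (it is not the iris file), and this is an illustration of the idea rather than scikit-learn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 4 features with very different spreads.
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)                    # center each column
C = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Variance along the first two directions, and their share of the total.
explained_variance = eigvals[:2]
explained_variance_ratio = eigvals[:2] / eigvals.sum()

Z = Xc @ eigvecs[:, :2]                    # projection onto two components
Z_white = Z / np.sqrt(explained_variance)  # whitening: unit variance per component

print(explained_variance_ratio)
print(Z_white.var(axis=0, ddof=1))         # each entry is approximately 1
```

With whitening the projected columns all have unit variance, so the relative importance of the components is only visible in the variance figures, not in the transformed data itself.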

MATLAB code

CRate is the cumulative contribution rate, that is, the fraction of the total variance carried by the selected components. A value of at least 0.80 is generally considered adequate. Alternatively, you can skip the threshold and keep a fixed number of components: change the last step to T=V(:,1:2) to keep the first two.

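% y: data matrix loaded from the .mat file, one sample per row
% PCA1: new matrix after dimensionality reduction
% T: transformation matrix
% meanValue: matrix built from the column means of y; y can be approximated
%            from the reduced matrix as PCA1*T'+meanValue
% CRate: contribution rate threshold

CRate=0.95;
% Compute the centered sample matrix
meanValue=ones(size(y,1),1)*mean(y,1);
y1=y-meanValue; % subtract from each dimension the mean of that dimension
C=y1'*y1/(size(y1,1)-1); % covariance matrix

% Eigenvalues (diagonal of D) and eigenvectors (columns of V) of the covariance matrix
[V,D]=eig(C);
% Sort the eigenvalues in descending order
[dummy,order]=sort(diag(-D));
V=V(:,order); % reorder the eigenvectors by descending eigenvalue
d=diag(D); % extract the eigenvalues into a column vector
newd=d(order); % eigenvalues in descending order

% Take the first cols eigenvectors to form the transformation matrix
sumd=sum(newd); % sum of all eigenvalues
for j=1:length(newd)
    rate=sum(newd(1:j,1))/sumd; % contribution rate = sum of the first j eigenvalues / sum of all eigenvalues
    if rate>CRate % stop once the contribution rate exceeds CRate and record how many eigenvalues were needed
        cols=j;
        break;
    end
end
T=V(:,1:cols); % first cols eigenvectors form the transformation matrix T
PCA1=y1*T; % reduce the dimensionality with the transformation matrix T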
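For readers who work mainly in Python, the same steps can be mirrored in NumPy. This is only a sketch: the function name `pca_by_contribution`, the parameter `c_rate`, and the synthetic three-column data are assumptions made for illustration and do not come from the original post. It also shows the reconstruction that the meanValue comment alludes to.

```python
import numpy as np

def pca_by_contribution(y, c_rate=0.95):
    """Keep the fewest components whose cumulative contribution exceeds c_rate."""
    mean_value = y.mean(axis=0)               # column means
    y1 = y - mean_value                       # centered sample matrix
    C = y1.T @ y1 / (y1.shape[0] - 1)         # covariance matrix
    d, V = np.linalg.eigh(C)                  # ascending eigenvalues
    order = np.argsort(d)[::-1]               # descending order
    d, V = d[order], V[:, order]
    contribution = np.cumsum(d) / d.sum()     # cumulative contribution rate
    # Index of the first entry strictly greater than c_rate, as a count.
    cols = int(np.searchsorted(contribution, c_rate, side="right")) + 1
    cols = min(cols, len(d))                  # guard for c_rate >= 1
    T = V[:, :cols]                           # transformation matrix
    return y1 @ T, T, mean_value

# Hypothetical data: the third column is almost the sum of the first two,
# so two components should carry nearly all of the variance.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
noise = 0.01 * rng.normal(size=200)
y = np.column_stack([a[:, 0], a[:, 1], a[:, 0] + a[:, 1] + noise])

PCA1, T, mean_value = pca_by_contribution(y, c_rate=0.95)

# Approximate reconstruction of y from the reduced matrix.
y_approx = PCA1 @ T.T + mean_value
print(PCA1.shape, T.shape)
print(np.abs(y - y_approx).max())
```

The reconstruction is exact only when every component is kept. Here the discarded direction carries almost no variance, so the error stays small.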

Origin blog.csdn.net/weixin_42567027/article/details/107418146