Matlab Principal Component Analysis

This article introduces principal component analysis (PCA), a dimensionality reduction algorithm that converts many indicators into a small number of principal components. The principal components are linear combinations of the original variables, are uncorrelated with each other, and retain most of the information in the original data. Generally speaking, when a research problem involves many variables that are strongly correlated, PCA can be used to simplify the data.

1. Introduction to Principal Component Analysis

Principal component analysis is a mathematical method for dimensionality reduction. It replaces many original variables with a few composite variables that capture as much of the information in the original variables as possible and are uncorrelated with each other. This statistical method, which converts many variables into a few uncorrelated composite variables, is called principal component analysis.

2. The idea of principal component analysis

Mathematical dimensionality reduction
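
The idea can be written compactly. As a sketch, assume p standardized variables X_1, ..., X_p and m retained components with m <= p. The symbols F_i and a_{ji} are notation chosen here for illustration. Each principal component is then a linear combination of the original variables:

```latex
\begin{aligned}
F_1 &= a_{11}X_1 + a_{21}X_2 + \cdots + a_{p1}X_p \\
F_2 &= a_{12}X_1 + a_{22}X_2 + \cdots + a_{p2}X_p \\
    &\;\;\vdots \\
F_m &= a_{1m}X_1 + a_{2m}X_2 + \cdots + a_{pm}X_p
\end{aligned}
\qquad
\begin{aligned}
&\textstyle\sum_{j=1}^{p} a_{ji}^2 = 1, \\
&\operatorname{Cov}(F_i, F_k) = 0 \quad (i \neq k), \\
&\operatorname{Var}(F_1) \ge \operatorname{Var}(F_2) \ge \cdots \ge \operatorname{Var}(F_m).
\end{aligned}
```

Under this notation, the coefficient vector (a_{1i}, ..., a_{pi}) is the i-th eigenvector of the matrix R used in the calculation steps.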

3. Calculation steps of principal component analysis

  1. Standardize the raw data
  2. Calculate the covariance matrix R of the standardized samples
  3. Calculate the eigenvalues and eigenvectors of R
  4. Calculate the contribution rate and cumulative contribution rate of each principal component
  5. Write down the principal component expressions
  6. Interpret the meaning of each principal component from its coefficients (explaining the meaning and role of the principal component variables is very important)
  7. Use the principal component results for subsequent analysis
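
The first five steps can be summarized in formulas. This is a sketch under assumed notation: x_{ij} is the raw value of indicator j for sample i, \bar{x}_j and s_j are the sample mean and sample standard deviation of indicator j, n is the number of samples, and p is the number of indicators. None of these symbols come from the original figures.

```latex
\begin{aligned}
&\text{Step 1:} && X_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \\
&\text{Step 2:} && R = \frac{1}{n-1}\, X^{\mathsf T} X \\
&\text{Step 3:} && R\, a_i = \lambda_i a_i, \qquad
                   \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0 \\
&\text{Step 4:} && \text{contribution rate of } F_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}, \qquad
                   \text{cumulative contribution rate} = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{p} \lambda_k} \\
&\text{Step 5:} && F_i = X a_i = a_{1i}X_1 + a_{2i}X_2 + \cdots + a_{pi}X_p
\end{aligned}
```

Because the data are standardized, R has ones on its diagonal, so the eigenvalues sum to p and the contribution rate of F_i reduces to \lambda_i / p.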

4. Case Analysis

Import the data from Excel or from the provided data file into a matrix x, then standardize x to obtain the matrix X. All of the following steps operate on the standardized matrix.

 R = corrcoef(x); % compute the sample correlation coefficient matrix R
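
The correlation matrix of the raw data is the same as the covariance matrix of the standardized data, which is why a single corrcoef call is enough here. The following is a minimal sketch that checks this numerically. The data are synthetic and made up for illustration, and zscore requires the Statistics and Machine Learning Toolbox.

```matlab
% Minimal sketch with synthetic data (assumption: real data would be loaded from a file)
rng(0);                               % fix the random seed so the run is repeatable
x = randn(50, 4);                     % 50 samples, 4 indicators (made-up data)
x(:, 2) = x(:, 1) + 0.5 * x(:, 2);    % make two indicators correlated

X = zscore(x);                        % standardize each column
R_from_cov  = cov(X);                 % covariance matrix of the standardized data
R_from_corr = corrcoef(x);            % correlation matrix of the raw data

% The two matrices should agree up to floating-point rounding
max_diff = max(abs(R_from_cov(:) - R_from_corr(:)));
fprintf('Largest absolute difference: %.2e\n', max_diff);
```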


Code

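clear; clc
load data1.mat     % data for principal component clustering
% load data2.mat   % data for principal component regression

% Note: descriptive statistics can be computed on the data first
% (descriptive statistics are covered in Lecture 5, Correlation Coefficients)
[n,p] = size(x);  % n is the number of samples, p is the number of indicators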

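%% Step 1: standardize the data x to obtain X
X = zscore(x);   % MATLAB's built-in standardization, (x - mean(x)) ./ std(x), applied column by column

%% Step 2: compute the sample covariance matrix
R = cov(X);

%% Note: R can also be obtained in one step by computing the sample correlation coefficient matrix directly
% (X from Step 1 is still needed later to compute the principal component values)
R = corrcoef(x);
disp('Sample correlation coefficient matrix:')
disp(R)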

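%% Step 3: compute the eigenvalues and eigenvectors of R
% Note: R is positive semidefinite, so its eigenvalues are nonnegative
% R is also symmetric; for a real symmetric matrix, MATLAB's eig returns the eigenvalues in ascending order
% eig returns the eigenvectors and the eigenvalues
[V,D] = eig(R);  % V: matrix whose columns are the eigenvectors, D: diagonal matrix of eigenvalues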


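%% Step 4: compute the contribution rate and cumulative contribution rate of each principal component
lambda = diag(D);  % diag extracts the main diagonal of a matrix (returned as a column vector)
lambda = lambda(end:-1:1);  % lambda is sorted in ascending order, so reverse it
contribution_rate = lambda / sum(lambda);  % contribution rate
cum_contribution_rate = cumsum(lambda)/ sum(lambda);   % cumulative contribution rate; cumsum computes the running sum
disp('Eigenvalues:')
disp(lambda')  % transpose to a row vector for display
disp('Contribution rates:')
disp(contribution_rate')
disp('Cumulative contribution rates:')
disp(cum_contribution_rate')
disp('Eigenvector matrix matching the eigenvalues:')
% Note: each eigenvector must stay paired with its eigenvalue. The eigenvalues were reversed above, so the columns of V must be reversed too
% fliplr reverses the column order of a matrix (same result as rot90(V)')
V = fliplr(V);
disp(V)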


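%% Compute the values of the principal components we need
m = input('Enter the number of principal components to keep:  ');
F = zeros(n,m);  % preallocate the matrix of principal components (each column is one principal component)
for i = 1:m
    ai = V(:,i)';   % take the i-th eigenvector and transpose it into a row vector
    Ai = repmat(ai,n,1);   % repeat this row vector n times to form an n-by-p matrix
    F(:, i) = sum(Ai .* X, 2);  % weight the standardized data by the coefficients, then sum along each row
end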
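
The loop above computes, for each sample, the dot product of its standardized row with each retained eigenvector, so it can also be written as one matrix product. The sketch below checks that equivalence. It also checks two properties of the result: the principal components are uncorrelated, and their variances equal the corresponding eigenvalues. The data, the mixing matrix, and the choice m = 2 are assumptions made for illustration, and the eigenvalues are ordered with an explicit sort rather than by reversing.

```matlab
% Minimal sketch with synthetic data (assumption: not the article's data1.mat)
rng(1);                                   % repeatable random numbers
n = 100; p = 4;
mixing = [1 0.8 0.3 0; 0 1 0.5 0.2; 0 0 1 0.4; 0 0 0 1];  % made-up matrix to correlate the columns
x = randn(n, p) * mixing;

X = zscore(x);                            % standardized data (Statistics and Machine Learning Toolbox)
R = corrcoef(x);                          % sample correlation matrix
[V, D] = eig(R);
[lambda, idx] = sort(diag(D), 'descend'); % eigenvalues from largest to smallest
V = V(:, idx);                            % reorder the eigenvectors to match

m = 2;                                    % number of components kept (illustrative choice)
F = X * V(:, 1:m);                        % vectorized form: one column per principal component

% Loop form, written as in the article, for comparison
F_loop = zeros(n, m);
for i = 1:m
    F_loop(:, i) = sum(repmat(V(:, i)', n, 1) .* X, 2);
end
loop_diff = max(abs(F(:) - F_loop(:)));

% cov(F) should be diagonal with the first m eigenvalues on the diagonal
C = cov(F);
var_diff = max(abs(diag(C) - lambda(1:m)));
offdiag  = abs(C(1, 2));
fprintf('loop vs product: %.2e, variance error: %.2e, covariance: %.2e\n', ...
        loop_diff, var_diff, offdiag);
```

The matrix product is shorter and avoids building the n-by-p temporary matrix on every iteration, while the loop makes the weighting of each variable more explicit.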

Origin blog.csdn.net/weixin_43599390/article/details/131358314