Mathematical modeling: principal component analysis

First, the definition
Principal component analysis (PCA) is a statistical method. It uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables; the converted variables are called principal components.
In practical problems, many variables (or factors) related to the subject are often introduced in order to analyze it fully, because each variable reflects some information about the subject to a varying degree.
Principal component analysis was first introduced by Karl Pearson for non-random variables; H. Hotelling later extended the method to random vectors. The amount of information is usually measured by the sum of squared deviations or the variance.
- "Principal Component Analysis", Baidu Encyclopedia
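
The definition above can be written compactly. What follows is a standard textbook formulation rather than notation taken from the source; the symbols X, R, v_k, \lambda_k, F_k, c_k and C_p are chosen here purely for illustration, and X is assumed to be already standardized.

```latex
% X: n-by-m standardized data matrix (each column has mean 0 and sample variance 1)
\[
R = \frac{1}{n-1} X^{\top} X, \qquad
R v_k = \lambda_k v_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0, \qquad
v_j^{\top} v_k = \delta_{jk}
\]
% k-th principal component (an n-vector of scores) and its sample variance
\[
F_k = X v_k, \qquad \operatorname{Var}(F_k) = v_k^{\top} R v_k = \lambda_k
\]
% Contribution rate of the k-th component and cumulative rate of the first p components
\[
c_k = \frac{\lambda_k}{\sum_{j=1}^{m} \lambda_j}, \qquad
C_p = \sum_{k=1}^{p} c_k
\]
```

Because the variance of each component equals its eigenvalue, keeping the first p components retains the fraction C_p of the total variance, which is the sense in which variance measures "information" here.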

Second, applications
As a basic method of mathematical analysis, principal component analysis has very wide practical application. It is used in demography, quantitative geography, molecular dynamics simulation, mathematical modeling, mathematical analysis, lottery studies, and portfolio and risk management, and it is a commonly used multivariate analysis method.

Principal component analysis is often combined with cluster analysis, discriminant analysis, and regression analysis to solve practical problems. In mathematical modeling it is also commonly combined with a BP neural network to solve some practical problems.

Third, a principal component analysis program in MATLAB

function PCA()
% originalData=[83.0 89.7 81.7 86.0 82.7 89.0 95.0 86.7 83.5;
% 81.3 81.7 90.0 86.5 72.7 86.5 80.0 80.0 76.0;
% 90.7 89.7 93.7 87.5 89.3 96.0 80.0 84.0 81.5 ;
% 86.0 89.7 90.0 89.5 87.0 94.5 95.0 82.8 79.5 ;
% 81.7 90.3 93.0 90.5 90.5 89.5 95.0 88.5 84.5 ;
% 86.0 85.7 87.0 90.5 84.8 92.5 95.0 82.9 78.5 ;
% 82.7 81.7 90.0 80.2 91.0 84.0 90.0 81.8 81.5 ;
% 84.0 85.7 67.0 82.2 90.0 93.5 90.0 85.3 77.0 ;
% 81.3 79.7 94.0 89.0 88.0 89.0 85.0 80.7 71.0 ;
% 80.0 88.7 82.0 91.5 92.3 94.5 85.0 85.8 73.0 ;
% 86.0 84.0 92.0 92.0 87.0 93.5 80.0 84.2 74.0 ;
% 84.5 89.3 82.5 75.0 81.5 93.0 95.0 76.2 77.5 ;
% 78.7 82.8 90.0 92.0 84.0 88.5 95.0 90.0 75.3 ;
% 86.3 88.2 93.5 91.5 87.5 93.5 95.0 84.8 79.8 ;
% 86.5 87.0 89.5 88.0 90.3 93.5 85.0 82.0 86.0 ;
% 83.3 89.7 83.0 81.0 84.0 96.0 90.0 77.8 76.0 ;
% 79.5 63.0 67.0 62.0 89.5 74.0 95.0 83.3 69.0 ;
% 67.0 76.0 85.0 69.0 86.3 71.5 85.0 84.0 69.5 ;
% 86.0 78.3 91.5 89.5 85.0 84.0 90.0 78.5 82.0 ;
% 87.0 83.3 97.0 90.5 81.7 94.0 95.0 86.8 78.0 ;
% 78.3 68.3 88.5 80.0 85.7 72.5 85.0 84.8 72.5 ;];

originalData=[3523.16,5437.47,8443.84,9649.7;
    795.15,1698.91,3067.12,4138.25;
    45.82,103.56,160.91,256.43;
    22.62,51.02,68.99,65.07;
    214,221,229,223;
    7465,16000,18663,18825;];
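originalData=originalData'; % transpose so that each column holds one indicator and each row one item

[rowCount,columnCount]=size(originalData); % number of items (rows) and indicators (columns)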

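% Standardize the data: standardData is the standardized data; mu is the mean of each column; sigma is the standard deviation of each column
[standardData,mu,sigma]=zscore(originalData);
standardData
xiefangcha=cov(standardData); % covariance matrix of the standardized data
[V,B]=eig(xiefangcha)   % eigenvalues and eigenvectors: the diagonal of B holds the eigenvalues, the columns of V the corresponding eigenvectors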

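% PCA dimensionality reduction by calling princomp (or pca; princomp is no longer supported in MATLAB 2018a; the computation can also be done without a function call)
% coeff is the coefficient matrix
% score holds the transformed data: the original data expressed in the new coordinate system
% latent is a column vector of the principal component variances (eigenvalues of the covariance matrix)
[coeff,score,latent] = pca(standardData);
latent=100*latent/sum(latent) % convert to the percentage of variance explained by each component
A=length(latent);
percent_threshold=85;           % percentage threshold used to decide how many principal components to keep
percents=0;                     % cumulative percentage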
for n=1:A
    percents=percents+latent(n);
    if percents>percent_threshold
        break;
    end
end
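c_rate=cumsum(latent)./sum(latent)  % cumulative contribution rate of the principal components
%coeff=coeff(:,1:n)                 % coefficient matrix for the components that reach the cumulative contribution threshold
%score=score(:,1:n)                 % principal component scores that reach the cumulative contribution threshold
% Note: this MATLAB code still has a few minor issues.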

function y=PCA2(originalData)
[n,m]=size(originalData); % n samples (rows), m indicators (columns)

% Preprocess the sample data by standardizing it; equivalent to standardData=zscore(originalData);
standardData=(originalData-ones(n,m)*diag(mean(originalData)))*diag(1./std(originalData));

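% Compute the correlation matrix
R=corrcoef(standardData); % m-by-m square matrix; for standardized data this equals R=cov(standardData);
[COEFF,latent,explained]=pcacov(R); % explained is the percentage contributed by each eigenvalue, explained=100*latent/sum(latent);
%f=explained./100;
% Eigenvalues and eigenvectors of R: D holds the eigenvalues, C the eigenvectors; similar to [COEFF,latent,explained] = pcacov(R);
[C,D]=eig(R);
[lambda,order]=sort(diag(D),'descend'); % eig does not guarantee descending order, so sort explicitly
C=C(:,order);
% m equals the sum of all eigenvalues of a correlation matrix (its trace)
f=lambda/m; % weight of each principal component
f=f'
% F=X*C gives the principal component scores
F=standardData*C;
% Compute the evaluation score of each sample (weighted sum of its component scores)
y=sum((F*diag(f))')';
y=sortrows([(1:n)',y],2); % sort samples by score, ascending
y=y(n:-1:1,:);            % reverse to descending order
y=[(1:n)',y]              % columns: rank, sample index, score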
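% Compute the cumulative contribution rate of the eigenvalues
percent_threshold=85; % percentage threshold used to decide how many principal components to keep
percents=0; % cumulative percentage
fDesc=sort(f,'descend'); % largest contribution first
for i=1:m
    percents=percents+100*fDesc(i); % f holds fractions, so convert to percent before comparing
    if percents>percent_threshold
        break;
    end
end
c_rate=cumsum(fDesc)./sum(fDesc)  % cumulative contribution rate of the principal components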
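
The steps used by the two functions above (standardize, form the correlation matrix, eigendecompose, accumulate contribution rates until the threshold is passed) can also be sketched with base MATLAB functions only, which avoids the toolbox functions zscore, pca and pcacov. This is a minimal illustration, not the author's program: the data matrix X and all variable names are made up for the example.

```matlab
% Minimal sketch: PCA by eigendecomposition using only base MATLAB functions.
% The small data matrix X and the variable names are illustrative assumptions.
X = [2.5 2.4 1.2;
     0.5 0.7 0.3;
     2.2 2.9 1.1;
     1.9 2.2 0.8;
     3.1 3.0 1.6;
     2.3 2.7 0.9];                       % rows = samples, columns = indicators
[n, m] = size(X);

% Standardize each column (zero mean, unit sample standard deviation)
Z = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);

% Correlation matrix of X, computed as the covariance matrix of Z
R = (Z' * Z) / (n - 1);

% Eigendecomposition; sort explicitly so the largest eigenvalue comes first
[C, D] = eig((R + R') / 2);              % symmetrize to suppress round-off asymmetry
[lambda, order] = sort(diag(D), 'descend');
C = C(:, order);

explained = 100 * lambda / sum(lambda);  % contribution rate of each component, in percent
cumExplained = cumsum(explained);        % cumulative contribution rate, in percent

% Smallest number of components whose cumulative rate exceeds the threshold
percentThreshold = 85;
numComponents = find(cumExplained > percentThreshold, 1);

scores = Z * C(:, 1:numComponents);      % data projected onto the retained components
```

Using find on the cumulative sum replaces the explicit for loop, and keeping everything in percent avoids comparing a fraction between 0 and 1 against a threshold of 85.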

Fourth, Links
(1) Implementing principal component analysis (PCA) in MATLAB
https://www.cnblogs.com/lutaitou/p/5535171.html

(2) Introduction to principal component analysis
https://www.cnblogs.com/SCUJIN/p/5965946.html

(3) Principal component analysis code using the MATLAB function pcacov
https://blog.csdn.net/huangzhywin/article/details/89315143

(4) Introduction to the MATLAB principal component analysis function princomp
https://blog.csdn.net/fireguard/article/details/38701211

(5) Normalization and denormalization in MATLAB: zscore
https://blog.csdn.net/xiaopihaierletian/article/details/54138232

Origin blog.csdn.net/JxufeCarol/article/details/94475833