PCA-based fault diagnosis method (MATLAB)

1. PCA principle analysis

PCA works mainly by reducing the dimensionality of the original data. For the underlying mathematics, see: CodingLabs - Mathematical Principles of PCA

2. Data preprocessing

The training data set (positive, i.e. normal-operation, samples only) is a matrix X of dimension n*m: there are n samples, and each sample has m features.

\begin{pmatrix} x_{11} &x_{12} & \cdots & x_{1m}\\ x_{21}& x_{22} &\cdots &x_{2m} \\ \vdots&\vdots &\ddots &\vdots \\ x_{n1}&x_{n2} &\cdots &x_{nm} \end{pmatrix}

2.1 Data normalization

Normalize each feature of X to have a mean of 0 and a standard deviation of 1.

x_{ji}^*=\frac{x_{ji}-\bar{x}_i}{\sqrt{s_{i}}}

where j indexes the samples, i indexes the features, and:

\bar{x}_i=\frac{1}{n}\sum_{j=1}^n{x_{ji}}, i=1,2,...,m

s_{i}=\frac{1}{n-1}\sum_{j=1}^n{(x_{ji}-\bar{x}_i)^2}, i=1,2,...,m
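A minimal MATLAB sketch of this normalization, assuming a small made-up matrix X (the values are illustrative only and are not part of the original data):

```matlab
% Hypothetical 4-by-2 training matrix: n = 4 samples, m = 2 features.
X = [1 10; 2 20; 3 30; 4 40];
n = size(X, 1);

x_mean = mean(X, 1);      % 1-by-m row of per-feature means
x_std  = std(X, 0, 1);    % per-feature standard deviation, 1/(n-1) normalization

% Subtract the mean and divide by the standard deviation, feature by feature.
Xn = (X - repmat(x_mean, n, 1)) ./ repmat(x_std, n, 1);

disp(mean(Xn, 1));        % column means, zero up to round-off
disp(std(Xn, 0, 1));      % column standard deviations, one up to round-off
```

The same x_mean and x_std have to be kept, because test samples are later normalized with the training statistics.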

3. PCA dimensionality reduction

3.1 First find the covariance matrix

With X being the normalized data, the covariance matrix is:

R=\frac{1}{n-1}X^TX

The resulting covariance matrix R is an m*m matrix.
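As a quick check, the sketch below compares this formula with MATLAB's built-in cov on hypothetical random data. The data, the seed, and the variable names are assumptions made for illustration.

```matlab
rng(0);                              % fixed seed, only for repeatability
X = randn(50, 3);                    % hypothetical data: n = 50 samples, m = 3 features
n = size(X, 1);

% Normalize each column to zero mean and unit standard deviation.
X = (X - repmat(mean(X, 1), n, 1)) ./ repmat(std(X, 0, 1), n, 1);

R_formula = (X' * X) / (n - 1);      % R = X^T X / (n-1)
R_builtin = cov(X);                  % MATLAB sample covariance

disp(max(abs(R_formula(:) - R_builtin(:))));  % difference at round-off level
disp(size(R_formula));                         % m-by-m
```

Because the columns are already centered, the two results agree; with unit-variance columns R is also the correlation matrix, so its diagonal entries are 1.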

3.2 Find eigenvalues and eigenvectors

Find the eigenvalues and eigenvectors of the covariance matrix R, and arrange the eigenvalues from largest to smallest:

\lambda_1\geq\lambda_2\geq\lambda_3\cdots\geq\lambda_m

After rearranging the eigenvectors according to their eigenvalues, we get:

P_{mm}=[p_1,p_2,\cdots,p_m]
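One way to do this reordering in MATLAB is an explicit sort, which avoids relying on the order in which eig returns the eigenvalues. The 2-by-2 matrix below is a hypothetical example, not data from this article.

```matlab
R = [2 1; 1 2];                          % hypothetical symmetric matrix
[V, L] = eig(R);                         % columns of V: eigenvectors, diag(L): eigenvalues

% Sort the eigenvalues in descending order and reorder the eigenvectors to match.
[lambda, idx] = sort(diag(L), 'descend');
P = V(:, idx);

disp(lambda');                           % eigenvalues, largest first
disp(norm(R * P - P * diag(lambda)));    % residual of R*p_i = lambda_i*p_i
```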

3.3 Select appropriate k features for PCA dimensionality reduction

Choose the smallest k for which the first k eigenvalues account for at least 85% of the sum of all eigenvalues:

\frac{\sum_{i=1}^k{\lambda_i}}{\sum_{i=1}^m{\lambda_i}}\geq 0.85

Let the k largest eigenvalues form the diagonal matrix S_{kk}, and let the k corresponding eigenvectors form the matrix P_{mk}. That is:

S_{kk}=diag(\lambda_1,\lambda_2,\cdots,\lambda_k)

P_{mk}=[p_1,p_2,\cdots,p_k]
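The selection of k can be sketched with cumsum and find. The eigenvalues below are made up for illustration and are assumed to be sorted in descending order already.

```matlab
lambda = [2.5; 1.0; 0.4; 0.1];              % hypothetical eigenvalues, largest first

ratio = cumsum(lambda) / sum(lambda);       % cumulative share of the first k eigenvalues
k     = find(ratio >= 0.85, 1, 'first');    % smallest k that reaches 85%

S = diag(lambda(1:k));                      % S_kk: diagonal matrix of the k largest eigenvalues
disp(ratio');
disp(k);
```

Here the cumulative shares are 0.625, 0.875, 0.975 and 1, so the first value that reaches 0.85 is at k = 2.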

After PCA dimensionality reduction there are still n samples, but the number of features becomes k. The dimensionality reduction formula is:

\tilde{X}_{nk}=X_{nm}*P_{mk}

The data reconstructed in the original m-dimensional space from the reduced matrix is:

X'=\tilde{X}_{nk}P_{mk}^T=X_{nm}P_{mk}P_{mk}^T
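The projection and the reconstruction can be sketched as follows, assuming hypothetical random data and a fixed k = 2 chosen only for the example:

```matlab
rng(1);                                  % fixed seed, only for repeatability
X = randn(30, 4);                        % hypothetical data: n = 30, m = 4
n = size(X, 1);
X = (X - repmat(mean(X, 1), n, 1)) ./ repmat(std(X, 0, 1), n, 1);

[V, L]   = eig(cov(X));
[~, idx] = sort(diag(L), 'descend');
k = 2;                                   % assumed number of retained components
P = V(:, idx(1:k));                      % P_mk: m-by-k loading matrix

Xt = X * P;                              % reduced data, n-by-k
Xr = Xt * P';                            % reconstruction in the original space, n-by-m
E  = X - Xr;                             % residual that the k components do not explain

disp(size(Xt));
disp(norm(E * P));                       % residual is orthogonal to the retained directions
```

The residual E is the part of the data that the SPE statistic in section 4.2 measures.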

4. Find the statistical limit

4.1 T^2 statistic

4.1.1 Control limit of the T^2 statistic:

T_{\alpha}=\frac{k(n^2-1)}{n(n-k)}F_{\alpha}(k,n-k)

where 1-\alpha is the confidence level and F_{\alpha}(k,n-k) is the critical value of the F distribution with first degree of freedom k and second degree of freedom n-k; \alpha is usually taken as 0.01.

Note that n is the number of samples in the training data set, and k is the number of principal components retained after PCA.
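A minimal sketch of this limit in MATLAB, assuming illustrative values n = 100 and k = 2. finv comes from the Statistics and Machine Learning Toolbox, and the upper critical value F_{\alpha}(k,n-k) is taken here as the (1-\alpha) quantile.

```matlab
n     = 100;     % assumed number of training samples
k     = 2;       % assumed number of retained principal components
alpha = 0.01;    % significance level, i.e. 99% confidence

F_crit  = finv(1 - alpha, k, n - k);                 % (1-alpha) quantile of F(k, n-k)
T_alpha = k * (n^2 - 1) / (n * (n - k)) * F_crit;    % T^2 control limit

disp(T_alpha);
```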

4.1.2 Calculate the T^2 statistic of the test data

Calculate the T^2 value of each sample of the test data. Let x_{new} be one test sample of dimension 1*m. It is normalized with the mean and variance of the training data (note: the training mean and variance, not values recomputed from the test data). Its T^2 is calculated as:

T^2=x_{new(1m)}*P_{mk}*S_{kk}^{-1}*P_{mk}^T*x_{new(1m)}^T

In addition, the T^2 formula can be simplified to:

T^2=||S_{kk}^{-1/2}*P_{mk}^T*x_{new(1m)}^T||_2^2

where S_{kk}^{-1/2} means that each diagonal element of S_{kk} is raised to the power -1/2, and ||\cdot ||_2^2 is the squared 2-norm.
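The equivalence of the two T^2 formulas can be checked numerically. The sketch below uses hypothetical random data, and x_new stands for a sample that is assumed to be normalized already.

```matlab
rng(2);                                   % fixed seed, only for repeatability
m = 4; k = 2;                             % assumed dimensions
A = randn(20, m);                         % hypothetical training data

[V, L]        = eig(cov(A));
[lambda, idx] = sort(diag(L), 'descend');
P = V(:, idx(1:k));                       % P_mk
S = diag(lambda(1:k));                    % S_kk

x_new = randn(1, m);                      % stands in for a normalized 1-by-m sample

T2_a = x_new * P * inv(S) * P' * x_new';          % quadratic form
S_inv_half = diag(lambda(1:k) .^ (-1/2));         % each diagonal element to the power -1/2
T2_b = norm(S_inv_half * P' * x_new')^2;          % squared 2-norm form

disp([T2_a, T2_b]);
```

The second form shows that T^2 is the sum of the squared scores, each divided by its eigenvalue.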

4.1.3 Fault determination

If the system is running normally, the sample's T^2 value should satisfy T^2<T_{\alpha}; otherwise a fault is considered to have occurred.

4.2 SPE statistics (also called Q statistics)

4.2.1 Calculation of the SPE control limit

Q_{\alpha}=\theta_1[\frac{c_{\alpha}h_0\sqrt{2\theta_2}}{\theta_1}+1+\frac{\theta_2h_0(h_0-1)}{\theta_1^2}]^{1/h_0}

where:

\theta_r=\sum_{j=k+1}^m{\lambda_j^r}, r=1,2,3

h_0=1-\frac{2\theta_1\theta_3}{3\theta_2^2}

c_{\alpha} is the 1-\alpha quantile of the standard normal distribution.
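A minimal sketch of the SPE control limit, assuming made-up eigenvalues sorted in descending order and k = 2. norminv comes from the Statistics and Machine Learning Toolbox.

```matlab
lambda = [2.5; 1.0; 0.4; 0.1];    % hypothetical eigenvalues, largest first
k      = 2;                       % assumed number of retained components
alpha  = 0.01;                    % significance level, i.e. 99% confidence

res   = lambda(k+1:end);                        % discarded eigenvalues lambda_{k+1..m}
theta = [sum(res), sum(res.^2), sum(res.^3)];   % theta_1, theta_2, theta_3
h0    = 1 - 2 * theta(1) * theta(3) / (3 * theta(2)^2);
c_a   = norminv(1 - alpha);                     % standard normal quantile

Q_alpha = theta(1) * (c_a * h0 * sqrt(2 * theta(2)) / theta(1) ...
          + 1 + theta(2) * h0 * (h0 - 1) / theta(1)^2)^(1 / h0);

disp(Q_alpha);
```

Only the discarded eigenvalues enter the limit, so it changes whenever k changes.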

4.2.2 Calculate SPE value of test data

The test data is the same sample x_{new(1m)} used for the T^2 calculation, and it goes through the same normalization.

Q=x_{new(1m)}*(I_{mm}-P_{mk}P_{mk}^T)x_{new(1m)}^T
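Since I_{mm}-P_{mk}P_{mk}^T is a symmetric projection matrix, Q equals the squared length of the residual of x_new. The sketch below checks this on hypothetical random data; all names and values are assumptions for illustration.

```matlab
rng(3);                                   % fixed seed, only for repeatability
m = 4; k = 2;                             % assumed dimensions
A = randn(20, m);                         % hypothetical training data

[V, L]   = eig(cov(A));
[~, idx] = sort(diag(L), 'descend');
P = V(:, idx(1:k));                       % P_mk

x_new = randn(1, m);                      % stands in for a normalized 1-by-m sample

Q_a = x_new * (eye(m) - P * P') * x_new'; % formula as written above
e   = x_new - x_new * P * P';             % residual after projecting onto the k components
Q_b = sum(e.^2);                          % squared 2-norm of the residual

disp([Q_a, Q_b]);
```

The per-variable terms e.^2 are what a contribution chart for SPE displays.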

4.2.3 Determine whether a fault occurs

If the system is running normally, the SPE value of the sample should satisfy Q<Q_{\alpha}; otherwise a fault is considered to have occurred.
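Both checks can be applied to a whole test set at once. The statistics and limits below are made-up numbers, and flagging a sample when either statistic reaches its limit is an assumption of this sketch rather than a rule stated above.

```matlab
T2      = [1.2; 3.4; 15.0; 20.1];   % hypothetical T^2 values of four test samples
Q       = [0.5; 4.0; 0.7; 9.3];     % hypothetical SPE values of the same samples
T_alpha = 9.85;                     % assumed T^2 control limit
Q_alpha = 2.9;                      % assumed SPE control limit

fault_T2 = T2 >= T_alpha;           % samples violating T^2 < T_alpha
fault_Q  = Q  >= Q_alpha;           % samples violating Q < Q_alpha
fault    = fault_T2 | fault_Q;      % assumed rule: either violation flags the sample

first_alarm = find(fault, 1, 'first');
disp(fault');
disp(first_alarm);
```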

5. MATLAB implementation

clc;clear;
%% 1. Import data
% Generate training data
num_sample=100;
a=10*randn(num_sample,1);
x1=a+randn(num_sample,1);
x2=1*sin(a)+randn(num_sample,1);
x3=5*cos(5*a)+randn(num_sample,1);
x4=0.8*x2+0.1*x3+randn(num_sample,1);
xx_train=[x1,x2,x3,x4];

% Generate test data
a=10*randn(num_sample,1);
x1=a+randn(num_sample,1);
x2=1*sin(a)+randn(num_sample,1);
x3=5*cos(5*a)+randn(num_sample,1);
x4=0.8*x2+0.1*x3+randn(num_sample,1);
xx_test=[x1,x2,x3,x4];
xx_test(51:100,2)=xx_test(51:100,2)+15*ones(50,1); % inject a fault: offset of 15 on x2 for samples 51 to 100

%% 2. Data preprocessing
Xtrain=xx_train;
Xtest=xx_test;
X_mean=mean(Xtrain);
X_std=std(Xtrain);
[X_row, X_col]=size(Xtrain);
Xtrain=(Xtrain-repmat(X_mean,X_row,1))./repmat(X_std,X_row,1); % standardize

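%% 3. PCA dimensionality reduction
SXtrain = cov(Xtrain); % covariance matrix
[T,lm]=eig(SXtrain); % eigenvalues and eigenvectors; the eigenvalues are in ascending order
D=flipud(diag(lm)); % eigenvalues from largest to smallest
% Determine the number of components to keep
num=1;
while sum(D(1:num))/sum(D)<0.85
    num = num+1;
end
P = T(:,X_col-num+1:X_col); % eigenvectors of the num largest eigenvalues
P_=fliplr(P); % eigenvectors ordered from largest to smallest eigenvalue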


%% 4. Compute the T2 and Q control limits
% T2 control limit at 99% confidence: T=k*(n^2-1)/(n*(n-k))*F(k,n-k)
% where k corresponds to num and n corresponds to X_row
T2UCL1=num*(X_row-1)*(X_row+1)*finv(0.99,num,X_row - num)/(X_row*(X_row - num)); % T2 control limit at 99% confidence

% Q control limit at 99% confidence
for i = 1:3
    th(i) = sum((D(num+1:X_col)).^i);
end
h0 = 1 - 2*th(1)*th(3)/(3*th(2)^2);
ca = norminv(0.99,0,1);
QU = th(1)*(h0*ca*sqrt(2*th(2))/th(1) + 1 + th(2)*h0*(h0 - 1)/th(1)^2)^(1/h0); % Q control limit at 99% confidence

%% 5. Model testing
n = size(Xtest,1);
Xtest=(Xtest-repmat(X_mean,n,1))./repmat(X_std,n,1); % standardize with the training mean and standard deviation
% Compute the T2 and Q statistics
[r,y] = size(P*P');
I = eye(r,y);
T2 = zeros(n,1);
Q = zeros(n,1);
lm_=fliplr(flipud(lm)); % eigenvalue matrix reordered from largest to smallest
% T2 formula: x*P_*inv(S)*P_'*x', where x is one row of Xtest
for i = 1:n
    T2(i)=Xtest(i,:)*P_*inv(lm_(1:num,1:num))*P_'*Xtest(i,:)';
    Q(i) = Xtest(i,:)*(I - P*P')*Xtest(i,:)';
end

%% 6. Plot the T2 and SPE charts
figure('Name','PCA');
subplot(2,1,1);
plot(1:n,T2,'k'); % T2 of every test sample
title('Statistic over samples');
xlabel('Sample number');
ylabel('T2');
hold on;
line([0,n],[T2UCL1,T2UCL1],'LineStyle','--','Color','r'); % T2 control limit

subplot(2,1,2);
plot(1:n,Q,'k'); % SPE of every test sample
title('Statistic over samples');
xlabel('Sample number');
ylabel('SPE');
hold on;
line([0,n],[QU,QU],'LineStyle','--','Color','r'); % SPE control limit

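%% 7. Plot the contribution charts
% 7.1 Find the scores responsible for the out-of-control state
S = Xtest(51,:)*P_; % scores of sample 51, ordered from largest to smallest eigenvalue
r = [ ];
for i = 1:num
    if S(i)^2/D(i) > T2UCL1/num
        r = cat(2,r,i);
    end
end
% 7.2 Contribution of each variable to each out-of-control score
cont = zeros(length(r),X_col);
for i = 1:length(r)
    for j = 1:X_col
        cont(i,j) = abs(S(r(i))/D(r(i))*P_(j,r(i))*Xtest(51,j));
    end
end
% 7.3 Total contribution of each variable
CONTJ = zeros(X_col,1);
for j = 1:X_col
    CONTJ(j) = sum(cont(:,j));
end
% 7.4 Contribution of each variable to Q
e = Xtest(51,:)*(I - P*P'); % sample 51 is used to find which variable is faulty
contq = e.^2;
% 7.5 Plot the contribution charts
figure
subplot(2,1,1);
bar(contq,'g');
xlabel('Variable number');
ylabel('SPE contribution');
hold on;
subplot(2,1,2);
bar(CONTJ,'r');
xlabel('Variable number');
ylabel('T^2 contribution');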

The training data x1, x2, x3 and x4 are synthetic, where x4 depends on x2 and x3. The test data are generated in the same way as the training data, except that a fault (an offset of 15) is added to x2 from sample 51 onward.

The result obtained is as follows:

From the figure above it can be clearly seen that, starting from sample 51 of the test data, both the T2 and SPE values exceed their limits, which indicates that a fault has occurred.

Contribution chart analysis shows that variable 2 is where the fault occurs, which is consistent with the actual situation. There are finer-grained methods for locating faults, which are not covered here.

Reference: Fault diagnosis method for linear supervised classification based on PCA: calculation of T2 and SPE statistics (And_ZJ's blog, CSDN)

Origin blog.csdn.net/tangxianyu/article/details/124135933