Feature Dimensionality Reduction with the Mean Impact Value (MIV) Algorithm: Free MATLAB Code, Using the Case Western Reserve University Bearing Data as an Example

1. Principle overview    

Commonly used feature dimensionality reduction methods include principal component analysis, factor analysis, and the mean impact value method. The mean impact value (MIV) algorithm, also translated as the mean influence value algorithm, is regarded as one of the better methods for reducing the number of input variables of a neural network.

In practical applications of neural network models, there is no clear theory for choosing the input variables, so it is difficult to decide what the input neurons of the network should be. If unimportant independent variables are mixed into the inputs, they not only increase the training time but can also reduce the accuracy of the model. It is therefore worthwhile to screen for the input variables that have a large influence on the network output.

The mean impact value (MIV) is an index that measures how strongly an input (independent) variable influences the output variable. The sign of the MIV indicates the direction of the influence, and its absolute value indicates the magnitude. To compute it, first train the network on the original samples. Then, for one input variable, build two new sample sets by increasing and decreasing that variable by 10% in every training sample while leaving the other variables unchanged. Feed both sets to the trained network, take the difference between the two outputs as the impact value (IV) of each sample, and average the IVs over all samples to obtain the MIV of that variable. Repeat these steps for every input variable, then rank the variables by the absolute value of their MIVs.
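
The procedure above can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original post: f is the trained network, x_k is the k-th of N training samples, and delta = 0.1 is the perturbation ratio.

```latex
% x_k^{(i,\pm)} is the sample x_k with only its i-th component scaled by (1 \pm \delta)
\mathrm{IV}_{i,k} = f\!\left(x_k^{(i,+)}\right) - f\!\left(x_k^{(i,-)}\right),
\qquad
\mathrm{MIV}_i = \frac{1}{N}\sum_{k=1}^{N} \mathrm{IV}_{i,k},
\qquad \delta = 0.1
```

Variables are then ranked by |MIV_i|, and the sign of MIV_i gives the direction of the influence.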

2. Experimental part

Taking the Case Western Reserve University (CWRU) bearing data as an example, nine indicators are computed for each sample: mean, variance, peak value, kurtosis, RMS (effective) value, crest factor, impulse factor, shape factor, and margin factor. These nine indicators serve as the input neurons of the neural network. The MIV calculation then selects the most important indicators, which reduces the feature dimensionality and is intended to improve diagnostic accuracy.

The code does several things: sliding-window segmentation of the raw data, feature calculation, data preparation for the neural network, MIV calculation, and diagnosis after feature screening. Take whatever parts you need. If the code feels bloated, leave a message and I can split it up across a few separate posts for reference.

First, the results of running the code:

    MIV value calculation results:

[Figure: bar chart of the MIV values of the nine features]

The four most important features after screening are used as the inputs of the neural network. The diagnosis results are as follows:

[Figure: confusion matrix of the diagnosis results using the four selected features]

Now for the main course, the code:

The first part is the sliding-window segmentation code. There are 10 states in total (one normal state and nine fault states), all recorded at a motor speed of 1797 rpm. Each state provides 120 samples of size 1×2048. All samples are collected in the variable data.

clear
clc
addpath(genpath(pwd));
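% DE is drive-end data, FE is fan-end data, BA is base accelerometer data; choose one of them
load 97.mat   % normal
load 105.mat  % 0.007-inch fault diameter, 1797 rpm, inner race fault
load 118.mat  % 0.007-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 130.mat  % 0.007-inch fault diameter, 1797 rpm, outer race fault
load 169.mat  % 0.014-inch fault diameter, 1797 rpm, inner race fault
load 185.mat  % 0.014-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 197.mat  % 0.014-inch fault diameter, 1797 rpm, outer race fault
load 209.mat  % 0.021-inch fault diameter, 1797 rpm, inner race fault
load 222.mat  % 0.021-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 234.mat  % 0.021-inch fault diameter, 1797 rpm, outer race fault
% 10 states in total, 120 samples per state, each sample of size 1x2048
w=1000;                  % w is the step (stride) of the sliding window: 1000 points
s=2048;                  % each sample (window) contains 2048 points, so adjacent windows overlap by 1048 points
m = 120;            % 120 samples per state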
D0=[];
for i =1:m
    D0 = [D0,X097_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D0 = D0';
D1=[];
for i =1:m
    D1 = [D1,X105_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D1 = D1';

D2=[];
for i =1:m
    D2 = [D2,X118_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D2 = D2';
D3=[];
for i =1:m
    D3 = [D3,X130_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D3 = D3';
D4=[];
for i =1:m
    D4 = [D4,X169_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D4 = D4';
D5=[];
for i =1:m
    D5 = [D5,X185_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D5 = D5';
D6=[];
for i =1:m
    D6 = [D6,X197_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D6 = D6';
D7=[];
for i =1:m
    D7 = [D7,X209_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D7 = D7';
D8=[];
for i =1:m
    D8 = [D8,X222_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D8 = D8';
D9=[];
for i =1:m
    D9 = [D9,X234_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D9 = D9';
data = [D0;D1;D2;D3;D4;D5;D6;D7;D8;D9];
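
The ten loops above all perform the same windowing, so they can be replaced by a single index matrix. The following is a minimal sketch on a toy signal, not the author's code: the variable names x, step, winLen, and numSeg are assumptions chosen for illustration, and x stands in for a loaded record such as X097_DE_time.

```matlab
% Toy column signal standing in for a loaded record (assumption)
x = (1:20)';
step   = 3;   % stride between window starts (w in the listing above)
winLen = 8;   % points per window (s in the listing above)
numSeg = 5;   % number of windows (m in the listing above)

% Start index of every window: 1, 1+step, 1+2*step, ...
starts = 1 + step*(0:numSeg-1);

% Guard against reading past the end of the record
assert(starts(end) + winLen - 1 <= numel(x), 'Record is too short');

% numSeg-by-winLen matrix of indices; row i indexes the i-th window
idx = bsxfun(@plus, starts', 0:winLen-1);

% Indexing a vector with a matrix returns a matrix of the same shape,
% so each row of segments is one window
segments = x(idx);
```

With the settings used in this post (step 1000, window 2048, 120 windows), each record needs at least 119×1000 + 2048 = 121048 points, which is worth checking before slicing.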

Next, compute the mean, variance, peak value, kurtosis, RMS value, crest factor, impulse factor, shape factor, and margin factor of each sample in data, and store them in the variable new_data.

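new_data = zeros(size(data,1),9);       % preallocate: one row per sample, nine features
for i = 1:size(data,1)
    xdata = data(i,:);
    junzhi=mean(xdata);                 % mean
    fangcha=mean((xdata-junzhi).^2);    % variance (population form, divides by N)
    p=max(xdata)-min(xdata);            % peak value (computed here as peak-to-peak)
    k=kurtosis(xdata);                  % kurtosis
    r=rms(xdata);                       % RMS (effective) value
    c=p/r;                              % crest factor
    v=p/mean(abs(xdata));               % impulse factor
    sf=r/mean(abs(xdata));              % shape factor (named sf so the window length s is not overwritten)
    ma=p/mean(sqrt(abs(xdata)))^2;      % margin factor
    new_data(i,:) = [junzhi,fangcha,p,k,r,c,v,sf,ma];
end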
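
Depending on the MATLAB release, kurtosis and rms may require additional toolboxes. As a hedged sketch, the same statistics can be computed from their definitions with base functions only. The toy signal and the variable names below are assumptions for illustration, and the kurtosis shown is the plain fourth-moment ratio without bias correction.

```matlab
% Toy signal (assumption), small enough to check by hand
x = [1 -1 1 -1 2 -2];

mu       = mean(x);                      % mean
variance = mean((x - mu).^2);            % population variance (divides by N)
rmsVal   = sqrt(mean(x.^2));             % root mean square
kurt     = mean((x - mu).^4)/variance^2; % fourth central moment over squared variance
pk2pk    = max(x) - min(x);              % peak-to-peak, used as the peak value in this post
shapeF   = rmsVal/mean(abs(x));          % shape factor
```

For this signal the mean is 0, the variance is 2, the RMS is sqrt(2), and the kurtosis is 6/4 = 1.5, which makes it easy to verify a feature routine before running it on the bearing data.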

The next step prepares the data for the neural network: each class of data is labeled and the features are normalized.

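%% Import data
bv = 120;    % 120 samples per state
% Append the class label as the last column
hhh = size(new_data,2); 
for i=1:size(new_data,1)/bv
    new_data(1+bv*(i-1):bv*i,hhh+1)=i;
end
% Inputs and outputs
input=new_data(:,1:end-1);    % columns 1 to second-to-last are the inputs (features)
output=new_data(:,end);       % the last column (class label) is the output
[inputn,inputps]=mapminmax(input',0,1);
[outputn,outputps]=mapminmax(output');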

The code that computes the MIV values:

p=inputn;
t=outputn;
p=p';
[m,n]=size(p);
yy_temp=p;
% p_increaseN holds the samples with feature N increased by 10%; p_decreaseN holds them with feature N decreased by 10%
for i=1:n
    p=yy_temp;
    pX=p(:,i);
    pa=pX*1.1;
    p(:,i)=pa;
    aa=['p_increase'  int2str(i) '=p;'];
    eval(aa);
end

for i=1:n
    p=yy_temp;
    pX=p(:,i);
    pa=pX*0.9;
    p(:,i)=pa;
    aa=['p_decrease' int2str(i) '=p;'];
    eval(aa);
end
%% Neural network for feature importance
nntwarn off;
p=yy_temp;
p=p';
% Build the BP network
net=newff(minmax(p),[8,1],{'tansig','purelin'},'traingdm');
% Initialize the network
net=init(net);
% Set the training parameters
net.trainParam.show=50;
net.trainParam.lr=0.05;
net.trainParam.mc=0.7;
net.trainParam.epochs=2000;
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = false;
% Train the network
net=train(net,p,t);
%% Variable importance calculation
% Transpose before calling sim
for i=1:n
    eval(['p_increase',num2str(i),'=transpose(p_increase',num2str(i),');'])
end
for i=1:n
    eval(['p_decrease',num2str(i),'=transpose(p_decrease',num2str(i),');'])
end
% result_inN is the network output after the 10% increase; result_deN is the output after the 10% decrease
for i=1:n
    eval(['result_in',num2str(i),'=sim(net,','p_increase',num2str(i),');'])
end
for i=1:n
    eval(['result_de',num2str(i),'=sim(net,','p_decrease',num2str(i),');'])
end
for i=1:n
    eval(['result_in',num2str(i),'=transpose(result_in',num2str(i),');'])
end
for i=1:n
    eval(['result_de',num2str(i),'=transpose(result_de',num2str(i),');'])
end
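%  MIV values
%  MIV is regarded as one of the better indices for evaluating variable relevance in a neural network.
%  Its sign gives the direction of the influence and its absolute value gives the relative importance.
for i=1:n
    IV= ['result_in',num2str(i), '-result_de',num2str(i)];
    eval(['MIV_',num2str(i) ,'=mean(',IV,')*(1e7)',';']) ;   % the 1e7 factor only rescales the values for display
    eval(['MIVX=', 'MIV_',num2str(i),';']);
    MIV(i,:)=MIVX;
end
% Rank by absolute value, since the sign only encodes the direction of influence
[MB,iranked] = sort(abs(MIV),'descend');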
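
The same calculation can be written without eval by looping over the feature columns directly. The sketch below is an illustration under stated assumptions, not the author's implementation: a fixed linear function model stands in for the trained network, and X, w, and delta are made-up names and values chosen so the result can be checked by hand.

```matlab
% Toy data (assumption): 4 samples in rows, 3 features in columns
X = [1 2 3; 2 1 0; 0 1 2; 1 1 1];

% Stand-in for the trained network (assumption): a fixed linear model
w = [2; -1; 0];
model = @(A) A*w;

delta = 0.1;                 % perturb each feature by +/-10%
nFeat = size(X,2);
MIV   = zeros(nFeat,1);

for i = 1:nFeat
    Xup = X;  Xup(:,i) = X(:,i)*(1 + delta);   % feature i increased by 10%
    Xdn = X;  Xdn(:,i) = X(:,i)*(1 - delta);   % feature i decreased by 10%
    IV  = model(Xup) - model(Xdn);             % impact value of every sample
    MIV(i) = mean(IV);                         % mean impact value of feature i
end

% Rank by magnitude; the sign only tells the direction of the influence
[~, iranked] = sort(abs(MIV), 'descend');
```

Here MIV comes out as [0.4; -0.25; 0], so the ranking is 1, 2, 3. To use the network trained above instead of the toy model, one option would be model = @(A) sim(net, A')', since sim expects one sample per column.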

Data visualization and plotting:

%% Data visualization
%-------------------------------------------------------------------------------------
figure()
barh(MIV(iranked),'g');
xlabel('Variable Importance','FontSize',12,'Interpreter','latex');
ylabel('Variable Rank','FontSize',12,'Interpreter','latex');
title('Feature variable importance','FontSize',12)
hold on
barh(MIV(iranked(1:5)),'y');
hold on
barh(MIV(iranked(1:3)),'r');
grid on 
xt = get(gca,'XTick');    
xt_spacing=unique(diff(xt));
xt_spacing=xt_spacing(1);    
yt = get(gca,'YTick');    
% Label each bar with its feature index
for ii=1:length(MIV)
    text(...
        max([0 MIV(iranked(ii))+0.02*max(MIV)]),ii,...
        ['P ' num2str(iranked(ii))],'Interpreter','latex','FontSize',12);
end
set(gca,'FontSize',12)
set(gca,'YTick',yt);
set(gca,'TickDir','out');
set(gca, 'ydir', 'reverse' )
set(gca,'LineWidth',2);
drawnow

The next step is to pass the obtained iranked values into the neural network. The code below produces the diagnosis results.


function [outputt,predict_label]  = BP(iranked)
addpath(genpath(pwd));
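% DE is drive-end data, FE is fan-end data, BA is base accelerometer data; choose one of them
load 97.mat   % normal
load 105.mat  % 0.007-inch fault diameter, 1797 rpm, inner race fault
load 118.mat  % 0.007-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 130.mat  % 0.007-inch fault diameter, 1797 rpm, outer race fault
load 169.mat  % 0.014-inch fault diameter, 1797 rpm, inner race fault
load 185.mat  % 0.014-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 197.mat  % 0.014-inch fault diameter, 1797 rpm, outer race fault
load 209.mat  % 0.021-inch fault diameter, 1797 rpm, inner race fault
load 222.mat  % 0.021-inch fault diameter, 1797 rpm, ball (rolling element) fault
load 234.mat  % 0.021-inch fault diameter, 1797 rpm, outer race fault
% 10 states in total, 120 samples per state, each sample of size 1x2048
w=1000;                  % w is the step (stride) of the sliding window: 1000 points
s=2048;                  % each sample (window) contains 2048 points
m = 120;            % 120 samples per state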
D0=[];
for i =1:m
    D0 = [D0,X097_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D0 = D0';
D1=[];
for i =1:m
    D1 = [D1,X105_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D1 = D1';

D2=[];
for i =1:m
    D2 = [D2,X118_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D2 = D2';
D3=[];
for i =1:m
    D3 = [D3,X130_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D3 = D3';
D4=[];
for i =1:m
    D4 = [D4,X169_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D4 = D4';
D5=[];
for i =1:m
    D5 = [D5,X185_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D5 = D5';
D6=[];
for i =1:m
    D6 = [D6,X197_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D6 = D6';
D7=[];
for i =1:m
    D7 = [D7,X209_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D7 = D7';
D8=[];
for i =1:m
    D8 = [D8,X222_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D8 = D8';
D9=[];
for i =1:m
    D9 = [D9,X234_DE_time(1+w*(i-1):w*(i-1)+s)];
end
D9 = D9';
data = [D0;D1;D2;D3;D4;D5;D6;D7;D8;D9];
for i = 1:size(data,1)
    xdata = data(i,:);
    junzhi=mean(xdata);
    fangcha=mean((xdata-junzhi).^2);   
    p=max(xdata)-min(xdata); 
    k=kurtosis(xdata);
    r=rms(xdata); 
    c=p/r; 
    v=p/mean(abs(xdata));
    s=r/mean(abs(xdata));
    ma=p/mean(sqrt(abs(xdata)))^2;
    new_data1(i,:) = [junzhi,fangcha,p,k,r,c,v,s,ma];
end

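% A simple check (assumed intent): if iranked is just 1:9, i.e. no ranking was applied, keep all nine features
iranked = iranked(:)';
if isequal(iranked,1:9)
    new_data = new_data1;
else
    ir = iranked(1:4);  % use the 4 most important features as the network inputs
    new_data = new_data1(:,ir);
end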

rng('default')
%% Import data
bv = 120;    % 120 samples per state
% Append the class label as the last column
hhh = size(new_data,2); 
for i=1:size(new_data,1)/bv
    new_data(1+bv*(i-1):bv*i,hhh+1)=i;
end
new_data=new_data(randperm(size(new_data,1)),:);    % shuffle the samples so the training and test sets are drawn at random
input=new_data(:,1:end-1);
output1 =new_data(:,end);
% One-hot encode the labels: row i gets a 1 in the column of its class (1 to 10)
output=zeros(size(new_data,1),10);
for i=1:size(new_data,1)
    output(i,output1(i))=1;
end
m=fix(size(new_data,1)*0.7);    % number of training samples (70% of the data)
input_train=input(1:m,:)';
output_train=output(1:m,:)';
input_test=input(m+1:end,:)';
output_test=output(m+1:end,:)';
%% Normalization
[inputn,inputps]=mapminmax(input_train,0,1);
% [outputn,outputps]=mapminmax(output_train);
inputn_test=mapminmax('apply',input_test,inputps);   % apply the training-set scaling to the test set
 
hiddennum_best = 30;
%% Build the BP neural network with the chosen number of hidden nodes
disp(' ')
disp('Standard BP neural network:')
net0=newff(inputn,output_train,hiddennum_best,{'tansig','purelin'},'trainlm'); % create the model
% Network parameters
net0.trainParam.epochs=1000;            % maximum number of training epochs: 1000
net0.trainParam.lr=0.01;                % learning rate: 0.01
net0.trainParam.goal=0.000001;          % training goal (target error): 1e-6
net0.trainParam.show=25;                % display interval: every 25 epochs
% net0.trainParam.mc=0.01;                % momentum factor
net0.trainParam.min_grad=1e-6;          % minimum performance gradient
net0.trainParam.max_fail=6;             % maximum number of validation failures
 
% Train
net0=train(net0,inputn,output_train);
% Predict
an0=sim(net0,inputn_test); % simulate with the trained model
% The predicted class is the row index of the largest output; max returns the first index on ties
[~,predict_label]=max(an0,[],1);
% Recover the true class index from the one-hot test labels
[~,outputt]=max(output_test,[],1);

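%% Plot the confusion matrix
confMat = confusionmat(outputt,predict_label);  % outputt holds the true labels
figure;
set(gcf,'unit','centimeters','position',[15 5 13 9])
plotConfMat(confMat.');  % plotConfMat is not a built-in MATLAB function; it must be available on the path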
xlabel('Predicted label')
ylabel('Real label')
hold off
end
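
The function returns the true labels outputt and the predicted labels predict_label, so the overall accuracy and the per-class recall can be computed directly from them. The sketch below uses made-up label vectors in place of the real outputs, purely for illustration. The names accuracy, recall, and numClasses are assumptions.

```matlab
% Made-up labels standing in for the outputs of BP(iranked) (assumption)
outputt       = [1 2 3 3 2 1];   % true classes
predict_label = [1 2 3 2 2 1];   % predicted classes

% Overall accuracy: fraction of samples whose prediction matches the truth
accuracy = mean(predict_label == outputt);
fprintf('Test accuracy: %.2f%%\n', 100*accuracy);

% Per-class recall: among samples of class c, the fraction predicted as c
numClasses = max(outputt);
recall = zeros(1,numClasses);
for c = 1:numClasses
    recall(c) = mean(predict_label(outputt == c) == c);
end
```

Comparing the accuracy obtained with all nine features against the accuracy with only the top four is a simple way to check whether the MIV screening actually helps on this data.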

Welcome everyone to leave a message in the comment area!


Origin blog.csdn.net/woaipythonmeme/article/details/131208086