Huawei Cup Mathematical Modeling Competition Experience Sharing, Part 2: The Programmer

The programmer plays an especially important role in mathematical modeling competitions. They must not only implement the code that realizes the modeler's ideas, but also communicate with the writer to analyze and present the results. Programmers therefore need to prepare differently at different stages; here I split the discussion into before the competition and during the competition.

1. Before the competition

It is difficult for most people to learn new code and apply it within just four days, so programmers should practice beforehand across the common problem types: prediction, evaluation, optimization, and mechanism models. Only by accumulating steadily in ordinary times can you finish programming tasks quickly during the competition. This is the basic preparation expected of a programmer.

For prediction problems, preparation means not only using prediction models (BP neural networks, SVM, LSTM, etc.) but also preprocessing the raw data. The data given in mathematical modeling competitions often contain outliers, missing values, and so on, so programmers should study, or at least understand, code implementations of outlier-removal and missing-value-imputation methods before the competition. I will not detail the individual methods here; you can search Baidu for data-cleaning methods. For optimization problems, prepare the commonly used solvers such as CPLEX and Gurobi, and become familiar with their modeling interfaces. For mechanism models, past competition problems have included infectious-disease models; we can borrow the ideas behind this family of models when modeling similar problems. In short, programmers who want to win prizes in mathematical modeling competitions must keep accumulating.
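As a minimal illustration of the kind of cleaning code worth preparing in advance, here is a Python/NumPy sketch. The median/MAD outlier rule and linear interpolation are just one common choice among the methods mentioned above, not the only option:

```python
import numpy as np

def clean_series(x, k=3.0):
    """Mark outliers (beyond k scaled-MAD units from the median) as missing,
    then fill all missing values by linear interpolation over valid points."""
    x = np.asarray(x, dtype=float).copy()
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))     # median absolute deviation, robust to outliers
    if mad > 0:
        x[np.abs(x - med) > k * 1.4826 * mad] = np.nan  # treat outliers as missing
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])  # fill the gaps
    return x

raw = [1.0, 1.2, np.nan, 1.1, 50.0, 1.3, 1.2]  # one missing value, one outlier
print(clean_series(raw))
```

A median-based threshold is used rather than mean/standard deviation because a large outlier inflates the standard deviation and can hide itself from a plain 3-sigma rule.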

2. During the competition

During the competition, the programmer writes code following the modeler's ideas. Sometimes, because of time pressure or limited programming ability, the model as designed cannot be fully implemented; in that case, communicate with the modeler to simplify the model, then complete the implementation. The programmer also needs to hand the code's final data and the preliminary result plots to the writer for further processing. If time allows, I suggest programmers try additional methods. For example, if the prediction model we finally choose is an LSTM, we can run conventional models such as BP and SVM as comparisons, which is also a plus in grading. Note that if you compare methods, the paper must analyze the results of each one, that is, tell the reviewing experts why the chosen method is better. The answer can usually be laid out when introducing the model's principles and then confirmed in the result analysis. Likewise, the method comparison should appear in the technical flow chart that follows the problem analysis. This makes your overall idea clearer and the experimental results more credible.
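As a toy illustration of such a method comparison, here is a Python/NumPy sketch on synthetic data. The two models (a persistence baseline and a least-squares AR(2) fit) and the synthetic series are placeholders standing in for whatever models the actual problem calls for:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
series = np.sin(0.5 * t) + 0.05 * rng.standard_normal(200)  # synthetic signal

train, test = series[:150], series[150:]

# Method 1: persistence baseline -- predict each value as the previous one.
pred_persist = np.concatenate(([train[-1]], test[:-1]))

# Method 2: AR(2) model fitted by least squares on the training part.
X = np.column_stack((train[1:-1], train[:-2]))  # lag-1 and lag-2 regressors
y = train[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
hist = np.concatenate((train[-2:], test))       # history needed for one-step forecasts
pred_ar = np.array([coef[0] * hist[i + 1] + coef[1] * hist[i]
                    for i in range(len(test))])

rmse = lambda p: np.sqrt(np.mean((p - test) ** 2))
print(f"persistence RMSE: {rmse(pred_persist):.4f}")
print(f"AR(2) RMSE:       {rmse(pred_ar):.4f}")
```

The point of the comparison table or figure in the paper is exactly this: the same metric, computed the same way, for every candidate method.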

Here is an example, using the classification of ECG signals: we compare a 1D CNN against a support vector machine. The corresponding code is as follows:

%% 1D CNN
clear; clc;
%% Load data
fprintf('Loading data...\n');
tic;
load('N_dat.mat');
load('L_dat.mat');
load('R_dat.mat');
load('V_dat.mat');
fprintf('Finished!\n');
toc;
fprintf('=============================================================\n');
%% Keep 5000 samples per class and generate one-hot labels
fprintf('Data preprocessing...\n');
tic;
Nb = Nb(1:5000,:); Label1 = repmat([1;0;0;0],1,5000);
Vb = Vb(1:5000,:); Label2 = repmat([0;1;0;0],1,5000);
Rb = Rb(1:5000,:); Label3 = repmat([0;0;1;0],1,5000);
Lb = Lb(1:5000,:); Label4 = repmat([0;0;0;1],1,5000);
Data = [Nb; Vb; Rb; Lb];
Label = [Label1, Label2, Label3, Label4];
clear Nb Label1 Rb Label2 Lb Label3 Vb Label4;
Data = Data - repmat(mean(Data,2),1,250); % zero-mean each signal to remove the baseline
fprintf('Finished!\n');
toc;
fprintf('=============================================================\n');
%% Split the data, then train and test the model
fprintf('Model training and testing...\n');
Nums = randperm(20000); % shuffle sample order to pick random train/test sets
train_x = Data(Nums(1:10000),:);
test_x  = Data(Nums(10001:end),:);
train_y = Label(:,Nums(1:10000));
test_y  = Label(:,Nums(10001:end));
train_x = train_x';
test_x  = test_x';
cnn.layers = {
    struct('type', 'i')                                                    % input layer
    struct('type', 'c', 'outputmaps', 4, 'kernelsize', 31, 'actv', 'relu') % convolution layer
    struct('type', 's', 'scale', 5, 'pool', 'mean')                        % subsampling layer
    struct('type', 'c', 'outputmaps', 8, 'kernelsize', 6, 'actv', 'relu')  % convolution layer
    struct('type', 's', 'scale', 3, 'pool', 'mean')                        % subsampling layer
};
cnn.output = 'softmax';  % CNN structure
% hyperparameters
opts.alpha = 0.01;       % learning rate
opts.batchsize = 16;     % mini-batch size
opts.numepochs = 30;     % number of training epochs
cnn = cnnsetup1d(cnn, train_x, train_y);         % build the 1D CNN
cnn = cnntrain1d(cnn, train_x, train_y, opts);   % train the 1D CNN
[er, bad, out] = cnntest1d(cnn, test_x, test_y); % test the 1D CNN
[~, ptest]   = max(out, [], 1);
[~, test_yt] = max(test_y, [], 1);
Correct_Predict = zeros(1,4); % per-class accuracy counters
Class_Num = zeros(1,4);       % and the confusion matrix
Conf_Mat = zeros(4);
for i = 1:10000
    Class_Num(test_yt(i)) = Class_Num(test_yt(i)) + 1;
    Conf_Mat(test_yt(i), ptest(i)) = Conf_Mat(test_yt(i), ptest(i)) + 1;
    if ptest(i) == test_yt(i)
        Correct_Predict(test_yt(i)) = Correct_Predict(test_yt(i)) + 1;
    end
end
ACCs = Correct_Predict ./ Class_Num;
fprintf('Accuracy = %.2f%%\n', (1-er)*100);
fprintf('Accuracy_N = %.2f%%\n', ACCs(1)*100);
fprintf('Accuracy_V = %.2f%%\n', ACCs(2)*100);
fprintf('Accuracy_R = %.2f%%\n', ACCs(3)*100);
fprintf('Accuracy_L = %.2f%%\n', ACCs(4)*100);
figure(1)
confusionchart(test_yt, ptest) % use class indices, not the one-hot label matrix
%% Support vector machine
clear; clc;
%% Load data
fprintf('Loading data...\n');
tic;
load('N_dat.mat');
load('L_dat.mat');
load('R_dat.mat');
load('V_dat.mat');
fprintf('Finished!\n');
toc;
fprintf('=============================================================\n');
%% Keep 5000 samples per class and generate integer labels
fprintf('Data preprocessing...\n');
tic;
Nb = Nb(1:5000,:); Label1 = ones(1,5000);
Vb = Vb(1:5000,:); Label2 = ones(1,5000)*2;
Rb = Rb(1:5000,:); Label3 = ones(1,5000)*3;
Lb = Lb(1:5000,:); Label4 = ones(1,5000)*4;
Data = [Nb; Vb; Rb; Lb];
Label = [Label1, Label2, Label3, Label4];
Label = Label';
clear Nb Label1 Rb Label2 Lb Label3 Vb Label4;
Data = Data - repmat(mean(Data,2),1,250); % zero-mean each signal to remove the baseline
fprintf('Finished!\n');
toc;
fprintf('=============================================================\n');
%% Extract wavelet-coefficient features, then split train/test sets
fprintf('Feature extracting and normalizing...\n');
tic;
Feature = [];
for i = 1:size(Data,1)
    [C, L] = wavedec(Data(i,:), 5, 'db6'); % 5-level db6 wavelet decomposition
    Feature = [Feature; C];                % collect the coefficients of sample i
end
Nums = randperm(20000); % shuffle sample order to pick random train/test sets
train_x = Feature(Nums(1:10000),:);
test_x  = Feature(Nums(10001:end),:);
train_y = Label(Nums(1:10000));
test_y  = Label(Nums(10001:end));
[train_x, ps] = mapminmax(train_x', 0, 1);  % normalize features to [0,1] with mapminmax
test_x = mapminmax('apply', test_x', ps);
train_x = train_x';
test_x  = test_x';
fprintf('Finished!\n');
toc;
fprintf('=============================================================\n');
%% Train the SVM and test it
fprintf('SVM training and testing...\n');
tic;
model = libsvmtrain(train_y, train_x, '-c 2 -g 1');   % train the model
[ptest, ~, ~] = libsvmpredict(test_y, test_x, model); % predict
Correct_Predict = zeros(1,4); % per-class accuracy counters
Class_Num = zeros(1,4);
Conf_Mat = zeros(4);
for i = 1:10000
    Class_Num(test_y(i)) = Class_Num(test_y(i)) + 1;
    Conf_Mat(test_y(i), ptest(i)) = Conf_Mat(test_y(i), ptest(i)) + 1;
    if ptest(i) == test_y(i)
        Correct_Predict(test_y(i)) = Correct_Predict(test_y(i)) + 1;
    end
end
ACCs = Correct_Predict ./ Class_Num;
fprintf('Accuracy_N = %.2f%%\n', ACCs(1)*100);
fprintf('Accuracy_V = %.2f%%\n', ACCs(2)*100);
fprintf('Accuracy_R = %.2f%%\n', ACCs(3)*100);
fprintf('Accuracy_L = %.2f%%\n', ACCs(4)*100);
toc;
figure(1)
confusionchart(test_y, ptest)
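The per-class accuracy and confusion-matrix bookkeeping at the end of both scripts is language-agnostic. A compact NumPy equivalent of that counting loop (assuming integer class labels 1 through 4, as in the SVM script) might look like:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes=4):
    """Confusion matrix (rows = true class, columns = predicted class)
    and per-class accuracy, mirroring the MATLAB counting loop."""
    conf = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        conf[t - 1, p - 1] += 1          # labels are 1-based
    class_num = conf.sum(axis=1)         # samples per true class
    correct = np.diag(conf)              # correctly predicted per class
    return conf, correct / class_num

y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 3, 4, 3]
conf, accs = per_class_accuracy(y_true, y_pred)
print(conf)
print(accs)  # per-class accuracy
```

Reporting per-class accuracy alongside the confusion matrix, as the scripts above do, shows reviewers not just how often a model is right but which classes it confuses.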

Comparative Results:

1DCNN results:


SVM results:


Comparing the results of the two models shows that the 1D CNN clearly outperforms the SVM. Why is the 1D CNN so much better? We can analyze this from the perspective of the algorithms' principles; I will not go into the details here.


Origin blog.csdn.net/qq_45013535/article/details/133146689