Design and Optimization of MATLAB BP Neural Network


Foreword

  • A BP (Back Propagation) neural network is a feed-forward neural network that uses error backpropagation to update its weights and thresholds (biases); it is often used to solve fitting or classification problems.

Many excellent bloggers have already written detailed descriptions and derivations of the computing principles of neural networks and BP neural networks, so they are not repeated here.

The main purpose of this article is to summarize my experience applying BP neural networks while studying and working on projects. The content covers MATLAB-based BP neural network design steps and related optimizations. If you already roughly understand the computing principles and are working through material for a paper or thesis, this article may speed up your progress.

1. Data set division

To avoid overfitting the model during training, the data set can be divided in different ways depending on its size.

  • When the data set is large enough, it can be divided into a training set, a test set, and a validation set; the toolbox's default split ratios are 0.7, 0.15, and 0.15.

  • Alternatively, the data can be divided into just a training set and a test set, generally at a ratio of 0.8 to 0.2 (a minimal split sketch follows this list).

  • When the data set is small, leave-one-out cross-validation is used.

  • When the data set is small, the data set can also be augmented, for example by adding white noise to the data, or by rotating, shrinking, and enlarging images.
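
As a quick illustration of the 0.8/0.2 split, here is a minimal sketch (my own example; X and T are hypothetical matrices holding one sample per column):

n = size(X, 2);                    % number of samples (one per column)
idx = randperm(n);                 % shuffle the sample indices
nTrain = round(0.8 * n);           % 80% of the samples for training
X_train = X(:, idx(1:nTrain));     % training inputs
T_train = T(:, idx(1:nTrain));     % training targets
X_test = X(:, idx(nTrain+1:end));  % test inputs
T_test = T(:, idx(nTrain+1:end));  % test targets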

2. Determination of network topology

2.1 Input layer and output layer

Determine the inputs and outputs of the model. For example, when studying the influence of x, y, z on t, there are 3 inputs and 1 output. What I want to stress here is the process of choosing the research factors x, y, z: in a simulation or experiment, the candidate factors may include all recordable data, and if there are many of them you can consider reducing the dimensionality of the data before training (a PCA sketch follows the list below). For example:

  • In machine learning there are two major groups of dimensionality-reduction methods: linear mapping and nonlinear mapping.
  • Linear mapping: PCA, LDA
  • Nonlinear mapping: kernel methods (kernel + linear), two-dimensionalization and tensorization (2D + linear), manifold learning (Isomap, LLE, LPP)
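
As a hedged example of the linear route, here is a minimal PCA sketch (it assumes the pca function from the Statistics and Machine Learning Toolbox and a hypothetical matrix X holding one sample per row):

[coeff, score, ~, ~, explained] = pca(X); % principal component analysis
k = find(cumsum(explained) >= 95, 1);     % components covering 95% of the variance
X_reduced = score(:, 1:k);                % reduced inputs for the network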

2.2 Number of hidden layers and number of units

(1) Determination of the number of hidden layers

In many places we can see this sentence:

  • A three-layer BP neural network can approximate any given continuous function with arbitrary precision (the universal approximation theorem).

Of course, many years of exam-oriented education taught me to believe such statements. So I generally try a 3-layer network for simulation experiments first, and only change the number of layers as a last resort.

(2) Determination of the number of hidden layer units

There are two methods:

  1. Given an approximate range, loop over the number of units, using the training effect as the judgment criterion (see the sketch after this list)
  2. Refer to empirical formulas in the existing literature to determine the number of units, for example:
    n = log2(number of input units);
    n = sqrt(number of input units + number of output units) + a (a is a constant in [0, 10]).
    The second empirical formula here can be combined with method 1
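
Here is a minimal sketch of method 1 for the single-output case (it assumes p and t are the normalized inputs and targets, as in the example at the end of this article, and takes the candidate range from the second empirical formula):

bestR2 = -inf; bestH = 0;
for h = 2:12                               % range from sqrt(in+out)+a, a in [0,10]
    net = newff(p, t, h, {'tansig','purelin'}, 'trainlm');
    net.trainParam.showWindow = false;     % suppress the training GUI
    net = train(net, p, t);
    t_sim = sim(net, p);
    r2 = 1 - sum((t_sim - t).^2) / sum((t - mean(t)).^2); % training R^2
    if r2 > bestR2; bestR2 = r2; bestH = h; end
end
fprintf('Best hidden-unit count: %d (R^2 = %.4f)\n', bestH, bestR2);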

Here I want to mention two things. First, in theory the number of hidden layers and the number of units can be selected simply and crudely with a loop; the premise, of course, is that the division of the training set stays consistent and that you have enough time. Since the initial thresholds and weights also affect the network, the process becomes considerably more complicated, which brings me to the second point: could the impact of the initial thresholds and weights on a specific network's training be quantified in some simplified way? If so, an optimization algorithm might be able to optimize the entire topology. Of course, this is just my own speculation, and its feasibility and complexity are genuinely hard to judge.

2.3 Transfer function, learning function and performance function

Function type         Function name   Purpose
Transfer function     logsig          Log-sigmoid transfer function
Transfer function     dlogsig         Derivative of the log-sigmoid function
Transfer function     tansig          Hyperbolic tangent sigmoid transfer function
Transfer function     dtansig         Derivative of the tansig function
Transfer function     purelin         Pure linear transfer function
Transfer function     dpurelin        Derivative of the pure linear function
Learning function     learngd         Gradient-descent learning function
Learning function     learngdm        Gradient-descent-with-momentum learning function
Performance function  mse             Mean squared error function
Performance function  msereg          Mean squared error with regularization
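
These transfer functions are plain elementwise maps (e.g. logsig(n) = 1./(1+exp(-n)) and tansig(n) = 2./(1+exp(-2*n))-1), which you can evaluate directly when reproducing a network by hand, as in Section 6. A small illustrative sketch:

n = -5:0.1:5;                                    % sample the input range
plot(n, logsig(n), n, tansig(n), n, purelin(n)); % compare the three shapes
legend('logsig', 'tansig', 'purelin'); grid on;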

3. Determination of initial weight and threshold

3.1 Given randomly [-1 1]

In the MATLAB toolbox, the default is to give the initial weights and thresholds randomly within [-1, 1].
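
If you want to force such an initialization explicitly, here is a minimal sketch (my own example; getwb and setwb pack and unpack all of a network's weights and thresholds as a single vector):

net = newff(p, t, 4, {'tansig','purelin'}, 'trainlm'); % as in the example at the end
wb0 = 2 * rand(size(getwb(net))) - 1;  % uniform random values in [-1, 1]
net = setwb(net, wb0);                 % write them back as the starting point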

3.2 Optimization Algorithm Optimizing Initial Weights and Thresholds

In today's papers it is common, once the topology of the neural network has been determined, to optimize the initial weights and thresholds before training so that training starts from a relatively good point. Commonly used combinations include:

  • GA-BP
  • PSO-BP
  • GSO-BP
  • ACO-BP
  • BFO-BP
  • ABC-BP

This is also a common place for papers to contribute on the algorithm side. The optimizer here can be any of various swarm-intelligence algorithms, even an improved one, applied to the initial values of the BP neural network; this approach is very effective for quickly putting a thesis together. The key is how to rebuild the network from a candidate set of thresholds and weights, so that the optimization algorithm's fitness function is evaluated correctly (a sketch follows).
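
As a hedged sketch of the GA-BP idea for the single-output case (it assumes the ga function from the Global Optimization Toolbox): the chromosome is the packed weight/threshold vector, and the fitness is the training error of a network rebuilt from that vector with setwb.

net = newff(p, t, 4, {'tansig','purelin'}, 'trainlm');
nvars = length(getwb(net));                             % weights + thresholds
fitness = @(wb) mean((t - sim(setwb(net, wb'), p)).^2); % rebuild the net, score its MSE
opts = optimoptions('ga', 'PopulationSize', 30, 'MaxGenerations', 50);
wbBest = ga(fitness, nvars, [], [], [], [], -ones(1,nvars), ones(1,nvars), [], opts);
net = setwb(net, wbBest');  % optimized starting point...
net = train(net, p, t);     % ...followed by ordinary BP training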

4. Training parameter setting

The training parameter settings affect the training result, and they should be adjusted for different models.

Parameter                       Property
Maximum number of iterations    net.trainParam.epochs
Training goal (target error)    net.trainParam.goal
Maximum training time           net.trainParam.time
Minimum performance gradient    net.trainParam.min_grad
Maximum validation failures     net.trainParam.max_fail
Line search routine to use      net.trainParam.searchFcn
Learning rate                   net.trainParam.lr
Momentum factor                 net.trainParam.mc

5. Training and training effect evaluation

After training the network with the train function, you can view the training process and results in the toolbox's built-in interface.
[Screenshots omitted: the toolbox training window and the Regression plot]
Regarding the Regression plot here: I have seen people ask about it on Zhihu and Baidu. The horizontal and vertical axes are simply the normalized actual and predicted values, and the plot shows the quality of the regression.

Of course, to make it easier to judge the prediction quality of the trained model, you can use R2 (the coefficient of determination, also translated as goodness of fit in some textbooks) to evaluate the correlation between predicted and actual data; it is also convenient as a loop-termination condition in a program.

Here are the R2 calculation formula I have used and a function from a CSDN blog, both of which can serve as references:

% y1 is the predicted value, y is the actual value
R2 = 1 - (sum((y1 - y).^2) / sum((y - mean(y)).^2))
function [r2 rmse] = rsquare(y,f,varargin)
% Compute coefficient of determination of data fit model and RMSE
%
% [r2 rmse] = rsquare(y,f)
% [r2 rmse] = rsquare(y,f,c)
%
% RSQUARE computes the coefficient of determination (R-square) value from
% actual data Y and model data F. The code uses a general version of 
% R-square, based on comparing the variability of the estimation errors 
% with the variability of the original values. RSQUARE also outputs the
% root mean squared error (RMSE) for the user's convenience.
%
% Note: RSQUARE ignores comparisons involving NaN values.
% 
% INPUTS
%   Y       : Actual data
%   F       : Model fit
%
% OPTION
%   C       : Constant term in model
%             R-square may be a questionable measure of fit when no
%             constant term is included in the model.
%   [DEFAULT] TRUE : Use traditional R-square computation
%            FALSE : Uses alternate R-square computation for model
%                    without constant term [R2 = 1 - NORM(Y-F)/NORM(Y)]
%
% OUTPUT 
%   R2      : Coefficient of determination
%   RMSE    : Root mean squared error
%
% EXAMPLE
%   x = 0:0.1:10;
%   y = 2.*x + 1 + randn(size(x));
%   p = polyfit(x,y,1);
%   f = polyval(p,x);
%   [r2 rmse] = rsquare(y,f);
%   figure; plot(x,y,'b-');
%   hold on; plot(x,f,'r-');
%   title(strcat(['R2 = ' num2str(r2) '; RMSE = ' num2str(rmse)]))
%   
% Jered R Wells
% 11/17/11
% jered [dot] wells [at] duke [dot] edu
%
% v1.2 (02/14/2012)
%
% Thanks to John D'Errico for useful comments and insight which has helped
% to improve this code. His code POLYFITN was consulted in the inclusion of
% the C-option (REF. File ID: #34765).

if isempty(varargin); c = true;
elseif length(varargin)>1; error 'Too many input arguments';
elseif ~islogical(varargin{1}); error 'C must be logical (TRUE||FALSE)'
else c = varargin{1};
end

% Compare inputs
if ~all(size(y)==size(f)); error 'Y and F must be the same size'; end

% Check for NaN
tmp = ~or(isnan(y),isnan(f));
y = y(tmp);
f = f(tmp);

if c; r2 = max(0,1 - sum((y(:)-f(:)).^2)/sum((y(:)-mean(y(:))).^2));
else r2 = 1 - sum((y(:)-f(:)).^2)/sum((y(:)).^2);
    if r2<0
    % http://web.maths.unsw.edu.au/~adelle/Garvan/Assays/GoodnessOfFit.html
        warning('Consider adding a constant term to your model') %#ok<WNTAG>
        r2 = 0;
    end
end

rmse = sqrt(mean((y(:) - f(:)).^2));

6. Training results

After training the model on the training set and testing it on the test set, we can consider the current model usable and able to carry out numerical-prediction and classification tasks.
Of course, if you need to reproduce this process without the model object, remember to save the corresponding weights, thresholds, normalization parameters, and denormalization parameters, and to record the activation functions used in the network structure; the network can then be reproduced with the activation functions and matrix operations. In my experience, normalizing to the [-1, 1] range allows the network to be reproduced exactly.

ps_input  % normalization settings for the inputs
ps_output % denormalization settings for the outputs
W1 = net.IW{1,1}; % input-to-hidden weights
W2 = net.LW{2,1}; % hidden-to-output weights
B1 = net.b{1};    % hidden-layer thresholds (biases)
B2 = net.b{2};    % output-layer thresholds (biases)

For the specific operation, here is a minimal sketch of my own (it assumes the single-hidden-layer tansig/purelin network from the example below; adding B1 and B2 relies on implicit expansion, so it needs MATLAB R2016b or newer):
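% x_new: hypothetical matrix of new inputs, one sample per column
xn = mapminmax('apply', x_new, ps_input);      % normalize with the saved settings
yn = purelin(W2 * tansig(W1 * xn + B1) + B2);  % manual forward pass
y_pred = mapminmax('reverse', yn, ps_output);  % back to the original units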


7. Comparison between traditional BP training and adaptive learning rate + momentum learning training

Take fitting the function f(x) = cos(x) as an example (only the training methods are compared here; the data set is not divided):

  • Training is done respectively with traditional BP training (traingd), adaptive learning rate + momentum training (traingdx), and Levenberg-Marquardt (trainlm).

Traditional BP training: traingd (long time, slow convergence)
Adaptive learning rate + momentum training: traingdx (shorter time, faster convergence)
Levenberg-Marquardt: trainlm (short time, fast convergence)
Source code:

clc;
clear;
x=0:0.1:2*pi;
y=cos(x);

%% I. Read the data
p_train=x;
t_train=y;
%% II. Normalize the data
[P_train, ps_input] = mapminmax(p_train,-1,1);
[T_train, ps_output] = mapminmax(t_train,-1,1);
p = P_train;
t = T_train;
%% III. Build the BP neural network
% 1. Create the network
net = newff(p,t,4,{'tansig','purelin'},'trainlm');
% net = newff(p,t,4,{'tansig','purelin'},'traingd');
% net = newff(p,t,4,{'tansig','purelin'},'traingdx');
% 2. Set the training parameters
net.trainParam.epochs = 2000; % maximum number of iterations
net.trainParam.goal = 1e-3;   % target error
net.trainParam.lr = 0.035;    % learning rate
net.trainParam.mc = 0.85;     % momentum factor
net.divideFcn = ''; % disable automatic division into training/validation/test sets
%%
% 3. Train the network
net = train(net,p,t);
%%
% 4. Simulation test
t_sim = sim(net,p);
%%
% 5. Denormalize the data
T_sim = mapminmax('reverse',t_sim,ps_output);
%% IV. Fit evaluation
r2_bp = 1 - (sum((T_sim'-  t_train').^2) ./ sum(( t_train' - mean(t_train')).^2))

%% V. Plot the training comparison
figure;
subplot(1,2,1)
plot(x,y);
xlabel('X');
ylabel('Y');
title('y=cos(x)');
subplot(1,2,2)
plot(1:length(T_sim),T_sim,'r-o',1:length(T_sim),t_train,'k-');
legend('BP network prediction','Actual value');
xlabel('X');
ylabel('Y');
string = {'Comparison of BP network predictions on the training set';['R^2=' num2str(r2_bp)]};
title(string);


Summary

The main purpose of this article was to summarize my experience applying BP neural networks while studying and working on projects. The content covered MATLAB-based BP neural network design steps and related optimizations, along with some of my own miscellaneous ideas. If you have any ideas or questions, feel free to send a private message or leave a comment!


At the same time, if you have related needs in model building or model optimization, you can search for the user Man Xiaojie on Xianyu; you are welcome to get in touch.


Origin: blog.csdn.net/ONERYJHHH/article/details/118607243