Let's take a look at how the BP neural network is implemented inside the matlab toolbox

Table of contents

1. Source code reproduction effect

2. The main process of training

3. Analysis of the source of effect difference

4. Comparison of the effects of different training methods

5. Related Articles


Original article; when reprinting, please credit "Old Cake Explanation - BP Neural Network", bp.bbbdata.com

If we implement the gradient descent method for a BP neural network ourselves,

the result is often not as good as the matlab toolbox's.

This problem troubled me for a long time,

so let's dig into the toolbox source code and see how matlab implements the BP neural network,

and why our self-written training is not as good as the toolbox's.

1. Source code reproduction effect


Pull out the source code of the matlab toolbox's gradient descent training function traingd and sort out its procedure.

The weights of a 2-hidden-layer BP neural network obtained by the self-written code,

and the weights obtained by calling the toolbox's newff:

It can be seen that the two results are identical, which shows that the training logic of the toolbox's BP neural network has been fully understood and reproduced.
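
For reference, a minimal sketch of the toolbox-side call used for this kind of comparison (the variable names X, y, hnn, goal, maxStep are assumed, matching the self-written code below) might look like this:

net = newff(X, y, hnn);              % build a BP network with hnn hidden nodes
net.trainFcn          = 'traingd';   % plain gradient descent, to match the self-written code
net.trainParam.lr     = 0.01;        % same learning rate
net.trainParam.goal   = goal;        % same error goal
net.trainParam.epochs = maxStep;     % same maximum number of iterations
net = train(net, X, y);              % then compare net.IW, net.LW and net.b with the self-written W, B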


2. The main process of training


The main flow of the BP neural network gradient descent method is as follows.

First, initialize the weights and thresholds.

Then iteratively update the weights and thresholds along the gradient,

and exit training once a termination condition is reached.

The termination conditions are: the error has met the goal, the gradient is too small, or the maximum number of iterations is reached.


The code is as follows:


function [W,B] = traingdBPNet(X,y,hnn,goal,maxStep)
%------ Pre-computation and parameter settings -----------
lr        = 0.01;            % learning rate
min_grad  = 1.0e-5;          % minimum gradient

%--------- Initialize W and B -------------------
[W,B] = initWB(X,y,hnn);  % initialize the weights W and thresholds B

%--------- Start training --------------------
for t = 1:maxStep
    % Compute the current gradient
    [py,layerVal] = predictBpNet(W,B,X);         % network predictions
    [E2,E]        = calMSE(y,py);              % compute the error
    [gW,gB]       = calGrad(W,layerVal,E);    % compute the gradient
    
    %------- Check whether an exit condition is met ----------
    gradLen = calGradLen(gW,gB);              % compute the gradient magnitude
    % Exit training if the error meets the goal or the gradient is too small
    if E2 < goal   || gradLen <=min_grad
        break;                 
    end
    
    %---- Update the weights and thresholds -----
    for i = 1:size(W,2)-1
        W{i,i+1} = W{i,i+1} + lr * gW{i,i+1};% update the weights
        B{i+1} = B{i+1} + lr * gB{i+1};% update the thresholds
    end
end
end

(The code here leaves out the operations common to both implementations, namely normalization and generalization (validation) checking.)
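
The helper functions (initWB, predictBpNet, calGrad, etc.) are given in the reproduction article linked at the end. As a rough sketch only, and assuming the error is defined as E = y - py (which is why the update above adds lr*gW rather than subtracting), two of the simpler helpers could look like this:

function [E2,E] = calMSE(y,py)
% Sketch: per-sample error and mean squared error (sign convention assumed)
E  = y - py;                 % error matrix
E2 = mean(E(:).^2);          % mean squared error compared against goal
end

function gradLen = calGradLen(gW,gB)
% Sketch: overall gradient magnitude used in the stopping test
g = [];
for i = 1:size(gW,2)-1
    g = [g; gW{i,i+1}(:); gB{i+1}(:)];   % stack all gradient entries into one vector
end
gradLen = sqrt(sum(g.^2));               % Euclidean norm of the full gradient
end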

3. Analysis of the source of effect difference


Source of Performance Difference


The main flow is no different from a regular algorithm tutorial,

so why are matlab's results better?

The reason lies mainly in the initialization:

many tutorials suggest random initialization,

but the matlab toolbox actually uses the Nguyen-Widrow method for initialization.


The Nguyen-Widrow method


The idea behind Nguyen-Widrow initialization is as follows.

Taking a single-input network as an example, it initializes the network into the following form:

Its purpose is to distribute the hidden nodes evenly over the range of the input data.

The reasoning is that if every neuron of the BP neural network is to be fully used in the end, the trained network should look roughly like this distribution (full coverage of the input range, full use of every neuron).

Rather than initializing randomly and then slowly adjusting towards it, it is better to start from such an initialization in the first place.

The original paper describing the method is:

Derrick Nguyen and Bernard Widrow, "Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights".
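
To make the idea concrete, here is a rough sketch of the Nguyen-Widrow rule for the single-input case, assuming the inputs have been normalized to [-1, 1] and a tansig hidden layer (in the toolbox this is handled by initnw):

function [w,b] = nwInit(hiddenNum)
% Sketch of Nguyen-Widrow initialization for a single-input hidden layer
beta = 0.7 * hiddenNum;                               % scale factor (0.7*S^(1/R) with R = 1 input)
w    = 2*rand(hiddenNum,1) - 1;                       % random signs in [-1, 1]
w    = beta * sign(w);                                % rescale each weight to magnitude beta
b    = beta * linspace(-1,1,hiddenNum)' .* sign(w);   % spread the active regions evenly over the input range
end

Each hidden node's active region is then centred at a different point of the input range, which is exactly the even coverage described above.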


4. Comparison of the effects of different training methods


Effect comparison


The author also compared the effects of traingd, traingda, and trainlm,

and found the same pattern:

problems that traingd cannot train, traingda can train,

and problems that traingda cannot train, trainlm can still train.

That is, in terms of training effect:

traingd < traingda < trainlm
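
This comparison is easy to reproduce with the toolbox itself; a sketch (X, y and hnn are assumed variables) is:

for fcn = {'traingd','traingda','trainlm'}
    net = newff(X, y, hnn);          % same architecture each time
    net.trainFcn = fcn{1};           % select the training algorithm
    net = train(net, X, y);
    err = y - sim(net, X);           % prediction error on the training data
    fprintf('%-8s mse = %g\n', fcn{1}, mean(err(:).^2));   % compare the final errors
end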


So if we directly use a self-written gradient descent algorithm,

it is still not as good as using the matlab toolbox,

because matlab's BP neural network uses the trainlm algorithm by default.


Reasons in brief


So why is traingda better than traingd, and why is trainlm better than traingda?

After digging through the source code, it is mainly because traingda adds an adaptive learning rate,

while trainlm uses second-derivative information to make learning faster.
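
As a rough sketch, the adaptive learning-rate rule in traingda works like this (placed after computing the new error inside the training loop; the factors 1.04, 0.7 and 1.05 are the documented toolbox defaults max_perf_inc, lr_dec and lr_inc; E2new, E2old, Wold and Bold are assumed names):

if E2new > 1.04 * E2old        % the error grew too much: reject the step
    W  = Wold;  B = Bold;      % roll back the weights and thresholds
    lr = 0.7 * lr;             % and shrink the learning rate
elseif E2new < E2old           % the error decreased: keep the step
    lr = 1.05 * lr;            % and grow the learning rate slightly
end

trainlm, on the other hand, builds an approximate second-order (Levenberg-Marquardt) update from the Jacobian, roughly dW = (J'*J + mu*I) \ (J'*E), which is why it usually converges in far fewer iterations.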


5. Related Articles


For the complete reproduction code, see:

"Rewriting the traingd code (gradient descent method)"

For the initialization method, see:

"BP Neural Network Initialization"


That is the algorithm logic of the gradient descent method in the matlab neural network toolbox. So simple!

Origin: blog.csdn.net/dbat2015/article/details/125638331