Matlab neural network learning summary

In neural-network filtering and signal processing, the traditional sigmoid activation function gives global approximation ability, while the radial basis function (RBF) gives better local approximation ability; using a completely orthogonal basis function as the activation function is even more advantageous. This is the wavelet neural network, which has a stronger ability to approximate fine detail.

Characteristics of the BP network
① The network essentially realizes a mapping from input to output, and mathematical theory has proved that it can realize any complex nonlinear mapping. This makes it particularly suitable for problems with complex internal mechanisms: we do not need to build a model or understand the internal process, we only supply inputs and read off outputs. With a well-chosen BP network structure, problems with fewer than about 20 input features can usually converge to the minimum error within 50,000 learning iterations. In theory a three-layer neural network can approximate a given function with arbitrary precision, which is a very tempting prospect.
② The network can automatically extract "reasonable" solving rules by learning from a set of examples with correct answers, i.e., it has self-learning ability.
③ The network has a certain generalization ability.
The main applications of BP are
regression prediction (fitting, data processing and analysis, forecasting, control, etc.) and classification/recognition (class division, pattern recognition, etc.); example programs are given in the sections that follow.
No matter which network or method is used, the accuracy of the solution can never reach 100%, but this does not prevent its use: for many complex real-world problems an exact explanation is meaningless, and any meaningful analysis necessarily loses some precision.
Issues to note with BP
1. The learning speed of the BP algorithm is slow. The main reasons are:
a. The BP algorithm is essentially a gradient descent method, and the objective function being optimized is very complex, so a "sawtooth phenomenon" inevitably appears, which makes the algorithm inefficient.
b. A paralysis phenomenon occurs: because the objective function is very complex, flat regions appear when neuron outputs approach 0 or 1; in these regions the weight error changes very little, so training almost stops.
c. To implement the BP algorithm, a traditional one-dimensional line search cannot be used to find the step size at each iteration; the step-size update rule must be given to the network in advance, which also makes the algorithm inefficient.

A note on initial weights and the learning rate: when the initial weights are chosen more appropriately, training time is shortened and the error converges noticeably faster, so the selection of initial weights matters a great deal for training a network. When training a BP network with the most basic BP algorithm, the settings of the learning rate, mean square error goal, weights, and thresholds all affect training, and choosing reasonable values for them together helps the network train. In the most basic BP algorithm the learning rate stays constant throughout training: if it is too large, the algorithm may oscillate and become unstable; if it is too small, convergence is slow and training time becomes long. It is unrealistic to choose the optimal learning rate before training, so the BP algorithm with a variable learning rate (variable learning rate backpropagation, VLBP) came into being; a later section introduces training a network with VLBP.
2. The probability that network training fails is relatively high, for the following reasons:
a. Mathematically, the BP algorithm is a local search optimization method, but the problem it is asked to solve is finding the global extremum of a complex nonlinear function, so the algorithm can easily fall into a local extremum and training fails.
b. The approximation and generalization abilities of the network are closely related to how typical the learning samples are, and selecting typical samples from the problem to form a training set is a very difficult problem.
3. Selection of the network structure:
There is no unified and complete theoretical guidance; in general the structure can only be chosen by experience, which is why some people call the structure selection of neural networks an art. The structure of the network directly affects its approximation ability and generalization properties, so how to choose a suitable network structure is an important issue in applications (see the sketch after this list).
4. Newly added samples affect a network that has already learned successfully, and the number of features describing each input sample must be the same.
5. With an S-shaped (sigmoid) activation function, the ideal output of each output-layer neuron can only approach 1 or 0 but never reach it, so when setting the desired output component Tkp of each training sample, it should not be set to 1 or 0; 0.9 or 0.1 is more appropriate.
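Since the structure usually has to be chosen by experience (point 3 above), the following is a minimal sketch of picking the hidden-layer size by comparing validation error. It is only an illustration: it assumes normalized inputs xg and targets yg, and uses the older toolbox functions newff/dividevec/train that appear in the examples later in these notes.

[trainV,valV,testV] = dividevec(xg,yg,0.2,0.2);   % 20% validation, 20% test
bestMse = Inf;
for h = 5:5:30                                     % candidate hidden-layer sizes
    net = newff(minmax(xg),[h 1],{'tansig','purelin'},'trainscg');
    net = train(net,trainV.P,trainV.T,[],[],valV); % early stopping on the validation set
    e = mse(sim(net,valV.P) - valV.T);             % validation error for this size
    if e < bestMse
        bestMse = e; bestH = h; bestNet = net;
    end
end
bestH                                              % chosen hidden-layer size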

One: Understanding BP

The content of this section: several concepts and problems that are easy to confuse in BP are explained, namely: What is the generalization ability of a network? What is overfitting and how is it handled? What does the learning rate do? What are the weights and thresholds of a neural network? When using BP to approximate a nonlinear function, how can training accuracy be improved?
What is the generalization ability of a network?
Whether a neural network is good is judged differently from a traditional least-squares fit (which mainly looks at residuals, goodness of fit, etc.): what matters is not its ability to fit the existing data but its subsequent predictive ability, that is, its generalization ability.
There is a contradiction between the predictive ability of the network (also called generalization or extrapolation ability) and its training ability (also called approximation or learning ability). In general, when the training ability is poor, the predictive ability is also poor, and up to a point the predictive ability improves as the training ability improves. But this trend has a limit: once it is reached, the predictive ability decreases as the training ability keeps increasing, and the so-called "overfitting" phenomenon appears. At that point the network has learned too many sample details and can no longer reflect the regularities contained in the samples.
What is overfitting and how is it handled?
Neural network training should not blindly pursue the minimum training error, which easily leads to overfitting. If the change of the error is monitored in real time, the optimal number of training iterations can be determined, for example around 15,000 learning iterations. Without monitoring, setting it to 500,000 iterations not only takes a long time to run, but the final result will almost certainly be disappointing.
One way to avoid overfitting is to split the input data into training data, validation data, and test data; how to perform this split is described in the following sections.
Among them, the validation data plays the role of preventing overfitting during network training.
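As a sketch of how the validation data stops training before overfitting sets in (assuming data p and t that have already been normalized; max_fail is the standard trainParam setting for how many consecutive validation-error increases are tolerated):

[trainV,valV,testV] = dividevec(p,t,0.2,0.2);        % split off validation and test data
net = newff(minmax(p),[15 1],{'tansig','purelin'},'trainlm');
net.trainParam.epochs   = 50000;                     % only an upper bound
net.trainParam.max_fail = 6;                         % stop after 6 consecutive rises in validation error
net = train(net,trainV.P,trainV.T,[],[],valV,testV); % passing valV enables early stopping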
What does the learning rate do?
The learning rate controls the step size taken on the error (energy) function. If it is set to adjust automatically, the learning rate can be reduced after the error drops rapidly, which increases the stability of the BP network.
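A small sketch of the two settings (fixed versus automatically adjusted learning rate), assuming data p and t; the parameter names lr, lr_inc, and lr_dec are those of the standard traingd/traingda training functions:

net = newff(minmax(p),[10 1],{'tansig','purelin'},'traingd');
net.trainParam.lr = 0.05;        % fixed step size for plain gradient descent

net.trainFcn = 'traingda';       % switch to variable-learning-rate training (VLBP)
net.trainParam.lr     = 0.05;    % initial learning rate
net.trainParam.lr_inc = 1.05;    % grow the rate while the error keeps falling
net.trainParam.lr_dec = 0.7;     % shrink the rate when the error rises
net = train(net,p,t);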
In some of the training below, the Bayesian regularization algorithm is also used to improve the generalization ability of the BP network.

Two: Main functions

1. The prepca function performs principal component analysis on normalized sample data, eliminating redundant components and reducing the dimensionality of the data:
[ptrans,transMat] = prepca(pn,0.001);

2. Data normalization: mapminmax(x,ymin,ymax), where ymin and ymax give the target range; values are commonly normalized to 0.1-0.9 for comparison. Alternatively, mapstd can be used (zero mean, unit variance).

3. Denormalization: y1=mapminmax('reverse',y,ps)
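A short sketch of the mapminmax round trip (normalize, reuse the same settings on new data, denormalize); the values here are only illustrative:

x = 1:10;
[xn,ps] = mapminmax(x,0.1,0.9);      % map x into [0.1, 0.9] and return the settings ps
x2n = mapminmax('apply',11:15,ps);   % apply the same mapping to new data
x0  = mapminmax('reverse',xn,ps);    % denormalize: recovers the original x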
4. The dividevec() function shuffles the data and splits it into training, validation, and test sets:
[trainV,valV,testV] = dividevec(p,t,valPercent,testPercent)

p = rands(3,1000);t = [p(1,:).*p(2,:); p(2,:).*p(3,:)];

[trainV,valV,testV] = dividevec(p,t,0.20,0.20);

 net = newff(minmax(p),[10 size(t,1)]);

[trainV,valV,testV,trainInd,valInd,testInd] = divideblock(allV,trainRatio,valRatio,testRatio)

i.e., [training data, validation data, test data, training indices, validation indices, test indices] = divideblock(all data, fraction of training data, fraction of validation data, fraction of test data)

In practice, the difference between dividevec and the four divide* functions below is that dividevec is usually called directly in MATLAB code,
while the latter four are used by setting the network's divideFcn property, for example net.divideFcn='divideblock'; this does not mean they cannot also be called directly in code like dividevec.
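A sketch of the divideFcn route, assuming a toolbox version whose network object exposes divideFcn/divideParam, and data p and t:

net = newff(minmax(p),[10 1],{'tansig','purelin'},'trainlm');
net.divideFcn = 'divideblock';        % contiguous blocks; 'dividerand' etc. are set the same way
net.divideParam.trainRatio = 0.6;
net.divideParam.valRatio   = 0.2;
net.divideParam.testRatio  = 0.2;
net = train(net,p,t);                 % train performs the split itself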
 

Training data is used for normal training.
Validation data mainly serves to prevent overfitting during training.
Test data is used to check the training result.

 net = train(net,trainV.P,trainV.T,[],[],valV,testV);

sim(net,validateSamples.P,[],[],validateSamples.T)

2. Avoiding the same random numbers on every run:

Generate random numbers tied to the time, i.e., seed the generator with the current clock:

rand('state',sum(100*clock))

that is: rand('state',sum(100*clock)); rand(10)

As long as rand('state',sum(100*clock)) is executed and the current computer time does not repeat, the generated random values do not repeat; conversely, if the time is the same, the generated random numbers are the same. If your computer is fast enough, try running:
rand('state',sum(100*clock)); A=rand(5,5); rand('state',sum(100*clock)); B=rand(5,5)

5. Commonly used BP training functions

traingdm  % momentum gradient descent algorithm

traingda  % variable-learning-rate gradient descent algorithm

traingdx  % variable-learning-rate gradient descent with momentum

trainrp   % RPROP (resilient BP) algorithm, a large-network training method with minimal memory requirements

% (conjugate gradient algorithms)

traincgf  % Fletcher-Reeves update
traincgp  % Polak-Ribiere update, memory requirement slightly larger than Fletcher-Reeves
traincgb  % Powell-Beale restarts, memory requirement slightly larger than Polak-Ribiere
% (algorithms of choice for large networks)

trainscg  % scaled conjugate gradient, memory requirement the same as Fletcher-Reeves, computation much smaller than the three algorithms above
trainbfg  % quasi-Newton BFGS algorithm, computation and memory both larger than the conjugate gradient methods, but faster convergence
trainoss  % one-step secant algorithm, computation and memory smaller than BFGS, slightly larger than conjugate gradient

% (preferred algorithms for medium-sized networks)
trainlm   % Levenberg-Marquardt algorithm, largest memory requirement, fastest convergence
trainbr   % Bayesian regularization algorithm
6. trainb trains a network's thresholds and weights in batch mode:
[net,tr] = trainb(net,tr,trainV,valV,testV)
It is not called directly; it is invoked by train when net.trainFcn is set to 'trainb'.
The trained weights and thresholds can be read out as follows:
Input-to-hidden-layer weights: w1=net.iw{1,1}

Hidden-layer thresholds: theta1=net.b{1}

Hidden-to-output-layer weights: w2=net.lw{2,1};

Output-layer thresholds: theta2=net.b{2}

net.IW holds the input weights, net.LW the layer weights, and net.b the thresholds (biases). The same expressions can be used to assign these values directly; see the help for details.

Saving a trained BP network:

Because the network is initialized randomly each time and the final training error is never exactly the same, the trained weights and thresholds differ slightly between runs, so the results of each training run also differ slightly.
After a good result is found, save the network with the command save filename net_name so that the prediction no longer changes, and load it when needed with load filename.
To find a good result automatically, you can test the error inside a loop and save the network when it is small enough; for concrete usage see the optimized BP traffic-prediction example.
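As a sketch of that idea (retrain several times and keep whichever network has the smallest error; p, t and the file name best_bp_net are placeholders):

bestMse = Inf;
for k = 1:10
    net = newff(minmax(p),[15 1],{'tansig','purelin'},'trainlm');
    net = train(net,p,t);
    e = mse(t - sim(net,p));
    if e < bestMse
        bestMse = e;
        save best_bp_net net          % reload later with: load best_bp_net
    end
end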

7. Maximum learning rate

lr=0.99*maxlinlr(p,1); A=purelin(W*P,B); e=T-A; [dW,dB]=learnwh(P,e,lr); B=B+dB;

Here W is the weight, B is the threshold, and e is the error.
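The fragment above uses the older purelin/learnwh calling style; a self-contained sketch of the same idea with maxlinlr and newlin (the data P and T here are made up for illustration) might look like this:

P = [1 2 -1 0.5];                 % example inputs
T = 2*P + 1;                      % example targets
lr = 0.99*maxlinlr(P,'bias');     % largest stable learning rate for a linear layer with a bias
net = newlin(minmax(P),1,[0],lr);
net.trainParam.epochs = 200;
net = train(net,P,T);
w = net.iw{1,1}, b = net.b{1}     % learned weight and threshold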

Neural network examples

Example 1

This example practices BP function fitting with training, validation, and test data.

x=1:1:100; y=2*sin(x*pi/10)+0.5*randn(1,100); plot(x,y);      % input and output
[xg,ps]=mapminmax(x,.1,.9); [yg,pt]=mapminmax(y,.1,.9);       % normalize to 0.1-0.9
[trainV,valV,testV] = dividevec(xg,yg,0.2,0.2);               % randomly split into training, validation, test data
net=newff(minmax(xg),[15,1],{'tansig','purelin'},'trainscg'); % build the BP network
net=train(net,trainV.P,trainV.T,[],[],valV,testV);            % train
xtrain=sim(net,trainV.P,[],[],trainV.T);                      % simulate on the training set
xvalidation=sim(net,valV.P,[],[],valV.T);                     % simulate on the validation set
xtest=sim(net,testV.P,[],[],testV.T);                         % simulate on the test set
ytrainr=mapminmax('reverse',xtrain,pt);                       % denormalize the simulation outputs
yvalidationr=mapminmax('reverse',xvalidation,pt);
ytestr=mapminmax('reverse',xtest,pt);
trainr=mapminmax('reverse',trainV.T,pt);                      % denormalize the targets
validationr=mapminmax('reverse',valV.T,pt);
testr=mapminmax('reverse',testV.T,pt);
msetrain=mse(trainr-ytrainr);                                 % compute the errors
msevalidation=mse(yvalidationr-validationr);
msetest=mse(testr-ytestr);
Example 2: using the Bayesian regularization algorithm to improve the generalization ability of the BP network.

  1. In this example, two training methods, the LM optimization algorithm (trainlm) and the Bayesian regularization algorithm (trainbr), are used to train a BP network so that it fits sinusoidal sample data with added white noise. The sample data can be generated with the following MATLAB statements:
    input vector: P = [-1:0.05:1];
    target vector: randn('seed',78341223);
                   T = sin(2*pi*P)+0.1*randn(size(P));
    Solution: the MATLAB program for this example is as follows:

    close all
    clear
    echo on
    clc
    % NEWFF——to generate a new forward neural network
    % TRAIN——to train the BP neural network
    % SIM——to BP neural network simulation
    pause        
    % Press any key to start
    clc
    % Define the training sample vector
    % P is the input vector
    P = [-1:0.05:1];
    % T is the target vector
    randn('seed',78341223); T = sin(2*pi*P)+0.1*randn(size(P));
    % Draw sample data points
    plot(P,T,'+');
    echo off
    hold on;
    plot(P,sin(2*pi*P),':');
    % Draw a sine curve without noise
    echo on
    clc
    pause
    clc
    % Create a new feedforward neural network
    net=newff(minmax(P),[20,1],{'tansig','purelin'});
    pause
    clc
    echo off
    clc
    disp('1. LM optimization algorithm TRAINLM');
    disp('2. Bayesian regularization algorithm TRAINBR');
    choice=input('Please select the training algorithm (1,2):');
    figure(gcf);
    if(choice==1)                 
         echo on         
         clc         
         % use LM optimization algorithm TRAINLM
         net.trainFcn='trainlm';         
         pause         
         clc         
         % set training parameters         
         net.trainParam.epochs = 500;         
         net.trainParam.goal = 1e-6 ;         
         net=init(net);        
         % Reinitialize           
         pause         
         clc
    elseif(choice==2)         
         echo on         
         clc         
         % Use Bayesian regularization algorithm TRAINBR         
         net.trainFcn='trainbr';         
         pause         
         clc         
         % Set training parameters         
         net.trainParam.epochs = 500;         
         randn('seed',192736547);         
         net = init(net);        
         % Reinitialize           
         pause         
         clc         
    end
    % Call corresponding algorithm to train BP network
    [net,tr]=train(net,P,T);
    pause
    clc
    % Simulate the BP network
    A = sim(net,P);
    % calculate simulation error
    E = T - A;
    MSE=mse(E)
    pause
    clc
    % draw matching result curve
    close all;
    plot(P,A,P,T,'+',P,sin(2*pi*P),':');
    pause;
    clc         
    echo off
    Comparison of RBF and BP networks:
    • The output of an RBF network is a linear weighted sum of the hidden-unit outputs, which speeds up learning
    • A BP network uses the sigmoid function as its activation function, so each neuron responds over a large region of the input space
    • A radial basis network uses a radial basis function (usually a Gaussian) as its activation function, so each neuron responds over a small region of the input space, and more radial basis neurons are therefore needed

Two: RBF radial basis networks

[net,tr] = newrb(P,T,goal,spread,MN,DF)

spread is the spread (dispersion) coefficient; the default is 1, and the distribution density adjusts with it. It is worth trying several values: the larger the spread, the smoother the function, but the less precisely it approaches the target. MN is the maximum number of neurons and DF is the number of neurons to add between progress displays. (A spread-sweep sketch follows the basic example below.)

    Here we design a radial basis network given inputs P and targets T:

    P = [1 2 3];
    T = [2.0 4.1 5.9];
    net = newrb(P,T);
    P = 1.5;
    Y = sim(net,P)
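Since the spread value is worth trying several times, here is a minimal spread-sweep sketch; the training data (P,T) and held-out data (Ptest,Ttest) are assumed to exist, and goal/MN/DF are just example settings:

goal = 0.01; MN = 20; DF = 5;
bestErr = Inf;
for spread = [0.1 0.5 1 2 4]
    net = newrb(P,T,goal,spread,MN,DF);
    err = mse(Ttest - sim(net,Ptest));     % error on held-out data
    if err < bestErr
        bestErr = err; bestSpread = spread;
    end
end
bestSpread                                 % spread with the smallest held-out error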

As is well known, when a BP network is used for function approximation, the weights are adjusted by negative gradient descent. This way of adjusting the weights has limitations, namely slow convergence and local minima. The radial basis function (RBF) network is superior to the BP network in approximation ability, classification ability, and learning speed.

MATLAB provides four functions related to radial basis networks. They all create a two-layer network: the first layer is a radial basis layer, and the second layer is a linear layer or a competitive layer. The main differences are the weight and threshold functions they use and whether a threshold is present.
Note: a radial basis network does not need a separate training call; it is designed (trained) automatically when it is created.
1. net = newrbe(P,T,spread)
The newrbe() function quickly designs an exact radial basis network with zero design error. The number of neurons in the first (radial basis) layer equals the number of input vectors, with weighted input function dist and net input function netprod; the number of neurons in the second (linear) layer is determined by the output vector T, with weighted input function dotprod and net input function netsum. Both layers have thresholds.
The initial weights of the first layer are P', and the initial threshold is 0.8326/spread, so that the output of a radial basis neuron is 0.5 when the weighted input is ±spread; the threshold thus determines the region of input space to which each radial basis neuron responds.
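The 0.8326/spread value can be checked directly: radbas(n) = exp(-n^2), so an input of 0.8326 gives an output of about 0.5:

radbas(0.8326)          % approximately 0.5
spread = 2;             % any example spread
b = 0.8326/spread;
radbas(b*spread)        % a weighted input of +/-spread still gives about 0.5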
2. [net,tr] = newrb(P,T,goal,spread,MN,DF)
This function is similar to newrbe, but it adds hidden-layer neurons one at a time until the mean square error meets the goal or the maximum number of neurons is reached.
  Example 1

P=-1:0.1:1;
T=sin(P);
spread=1;
mse=0.02;
net=newrb(P,T,mse,spread);
t=sim(net,P);
plot(P,T,'r*',P,t)

Example 2

P=-1:0.1:1;
T=[-0.9602 -0.5770 -0.0729 0.3771 0.6405 0.6600 0.4609 0.1336 -0.2013 -0.4344 -0.5000 -0.3930 -0.1647 0.0988 0.3072 0.3960 0.3449 0.1816 -0.0312 -0.2189 -0.3201];

% The newrb() function quickly builds a radial basis network that is adjusted automatically according to the input vectors and
% target values for function approximation; the mean square error goal is preset as eg and the spread constant as sc.
eg=0.02;
sc=1;
net=newrb(P,T,eg,sc);

figure;
plot(P,T,'+');
xlabel('input');
X=-1:0.1:1;
Y=sim(net,X);
hold on;
plot(X,Y);
hold off ;
legend('target','output')

3. net = newgrnn(P,T,spread): generalized regression neural network (GRNN)

A generalized regression network is mainly used for function approximation. Its structure is the same as that of newrbe, with the following differences (everything not mentioned is the same):
(1) the initial weights of the second layer are T;
(2) the second layer has no threshold;
(3) the weight function of the second layer is normprod, and its net input function is netsum.
>> P=0:1:20;
>> T=exp(P).*sin(P);
>> net=newgrnn(P,T,0.7);
>> p=0:0.1:20;
>> t=sim(net,p);
>> plot(P,T,'*r',p,t)
4. net = newpnn(P,T,spread): probabilistic neural network (mainly used for classification problems)
    The biggest difference from the previous three functions is that the second layer is a competitive layer rather than a linear layer, and the competitive layer has no threshold, so the PNN network is mainly used for classification problems. PNN classifies as follows:
After an input vector is presented to the network, the radial basis layer first computes the distance ||dist|| between the input vector and each sample input vector, producing a distance vector; the competitive layer takes this distance vector as input, computes the probability of each pattern, and through the competitive transfer function outputs 1 for the element with the highest probability and 0 otherwise.

Note: since the second layer is a competitive layer, the target/output vectors must be converted with the ind2vec/vec2ind functions, i.e., indices converted to vectors or vectors converted to indices. After conversion, the number of rows equals the largest value in the index vector (the number of classes) and the number of columns equals the number of indices (samples).
>> P=[1 2; 2 2; 1 1]'
P =
     1     2     1
     2     2     1
>> Tc=[1 2 3];   % class indices
>> T=ind2vec(Tc)
T =
   (1,1)        1
   (2,2)        1
   (3,3)        1

>> spread=1;
>> net=newpnn(P,T,spread);
>> t=sim(net,P)
t =
   (1,1)        1
   (2,2)        1
   (3,3)        1

>> tc=vec2ind(t)

tc =
     1     2     3
% As can be seen, the PNN has classified P correctly.

Example 4: reading wav files

name=input('file name','s');
fname=['F:\Matlab program for digital speech recognition based on probabilistic neural network\newpnn\wav\name' int2str(i-1) '.wav'];

% General radial basis function networks --
% superior to BP networks in approximation ability, classification ability, and learning speed.
% In a radial basis network, the choice of the spread constant of the radial basis layer is key:
% a larger spread gives a smoother function, but accuracy can drop accordingly; the default spread is 1.
% net=newrbe(P,T,spread) generates a network with zero error on the design data;
% net=newrb(P,T,goal,spread) adds neurons one at a time until the training goal is reached
% or the maximum number of neurons is used.

Example 1  % GRNN network: quickly generate a generalized regression neural network (GRNN)

P=[4 5 6];
T=[1.5 3.6 6.7];
net=newgrnn(P,T);
% simulation check
p=4.5;
v=sim(net,p)

Example 2  % PNN network, probabilistic neural network

P=[0 0;1 1;0 3;1 4;3 1;4 1;4 3]';
Tc=[1 1 2 2 3 3 3];
% convert the desired output with ind2vec(), then design and verify the network
T=ind2vec(Tc);
net=newpnn(P,T);
Y=sim(net,P);
Yc=vec2ind(Y)
% verify the network with other input vectors
P2=[1 4;0 1;5 2]';
Y=sim(net,P2);
Yc=vec2ind(Y)

Example 4  % plot the radial basis transfer function of a hidden-layer neuron

p=-3:.1:3;
a=radbas(p);
plot(p,a)
title('radial basis transfer function')
xlabel('input vector p')
% The weights and thresholds of hidden-layer neurons determine the position and width of the radial basis
% functions; with the right number of neurons, weights, and thresholds, any function can be approximated.
% For example:
a2=radbas(p-1.5);
a3=radbas(p+2);
a4=a+a2*1.5+a3*0.5;
plot(p,a,'b',p,a2,'g',p,a3,'r',p,a4,'m--')
title('weighted sum of radial basis transfer functions')
xlabel('input p');
ylabel('output a');
% When using the newrb() function to build a radial basis network, the mean square error goal eg and the spread constant sc can be preset
eg=0.02;
sc=1;    % this value has a large effect on the final network: too small causes overfitting, too large causes the basis functions to overlap
net=newrb(P,T,eg,sc);
% network test
plot(P,T,'*')
xlabel('input');
X=-1:.01:1;
Y=sim(net,X);
hold on
plot(X,Y);
hold off
legend('target','output')
% Apply a GRNN to function approximation
P=[1 2 3 4 5 6 7 8];
T=[0 1 2 3 2 1 2 1];
plot(P,T,'.','markersize',30)
axis([0 9 -1 4])
title('function to be approximated')
xlabel('P')
ylabel('T')
% network design
% for discrete data points, the spread constant is chosen slightly smaller than the distance between input vectors
spread=0.7;
net=newgrnn(P,T,spread);
% network test
A=sim(net,P);
hold on
outputline=plot(P,A,'o','markersize',10,'color',[1 0 0]);
title('network test')
xlabel('P')
ylabel('T and A')
% Apply a PNN to classify vectors
P=[1 2;2 2;1 1]';     % input vectors (one column per sample)
Tc=[1 2 3];           % the three desired classes corresponding to the columns of P
% plot the input vectors and their classes
plot(P(1,:),P(2,:),'.','markersize',30)
for i=1:3
    text(P(1,i)+0.1,P(2,i),sprintf('class %g',Tc(i)))
end
axis([0 3 0 3]);
title('three vectors and their classes')
xlabel('P(1,:)')
ylabel('P(2,:)')
% network design (a PNN, since the targets are class indices)
T=ind2vec(Tc);
spread=1;
net=newpnn(P,T,spread);
% network test
A=sim(net,P);
Ac=vec2ind(A);
% plot the input vectors and the network's output classes
plot(P(1,:),P(2,:),'.','markersize',30)
for i=1:3
    text(P(1,i)+0.1,P(2,i),sprintf('class %g',Ac(i)))
end
axis([0 3 0 3]);
title('network test results')
xlabel('P(1,:)')
ylabel('P(2,:)')

MATLAB source code for three RBF network design algorithms

1. Clustering-based RBF network design algorithm
SamNum = 100; % total sample number
TestSamNum = 101; % test sample number
InDim = 1; % sample input dimension
ClusterNum = 10; % hidden node number, that is, cluster sample number
Overlap = 1.0; % Hidden node overlap coefficient
% Obtain sample input and output according to the objective function
rand('state',sum(100*clock))
NoiseVar = 0.1;
Noise = NoiseVar*randn(1,SamNum);
SamIn = 8*rand( 1,SamNum)-4;
SamOutNoNoise = 1.1*(1-SamIn+2*SamIn.^2).*exp(-SamIn.^2/2);
SamOut = SamOutNoNoise + Noise;
TestSamIn = -4:0.08:4 ;
TestSamOut = 1.1*(1-TestSamIn+2*TestSamIn.^2).*exp(-TestSamIn.^2/2);

figure
hold on
grid
plot(SamIn,SamOut,'k+')
plot(TestSamIn,TestSamOut, 'k--')
xlabel('Input x');
ylabel('Output y');

Centers = SamIn(:,1:ClusterNum);

NumberInClusters = zeros(ClusterNum,1); % number of samples in each category, initialized to zero
IndexInClusters = zeros(ClusterNum,SamNum); % each The index number of the samples contained in the class
while 1,
NumberInClusters = zeros(ClusterNum,1); % The number of samples in each class, initialized to zero
IndexInClusters = zeros(ClusterNum,SamNum); % The index number of the samples contained in each class

% classify all samples according to the minimum-distance principle
for i = 1:SamNum
AllDistance = dist(Centers',SamIn(:,i));
[MinDist,Pos] = min(AllDistance);
NumberInClusters(Pos) = NumberInClusters(Pos) + 1;
IndexInClusters(Pos,NumberInClusters(Pos)) = i;
end
% save the old cluster centers
OldCenters = Centers;

for i = 1:ClusterNum
Index = IndexInClusters(i,1:NumberInClusters(i));
Centers(:,i) = mean(SamIn(:,Index)')';
end
% if the old and new cluster centers are identical, clustering has converged; stop
EqualNum = sum(sum(Centers==OldCenters));
if EqualNum == InDim*ClusterNum,
break,
end
end

% Calculate the expansion constant (width) of each hidden node
AllDistances = dist(Centers',Centers); % Calculate the distance between data centers of hidden nodes (matrix)
Maximum = max(max(AllDistances));  % find the largest distance
for i = 1:ClusterNum               % replace the 0 on the diagonal with a larger value
AllDistances(i,i) = Maximum+1;
end
Spreads = Overlap*min(AllDistances)'; % take the minimum distance between hidden nodes as the expansion constant

% Calculate the output weight of each hidden node
Distance = dist(Centers',SamIn); % Calculate the distance between each sample input and each data center
SpreadsMat = repmat(Spreads,1 ,SamNum);
HiddenUnitOut = radbas(Distance./SpreadsMat); % Calculate hidden node output matrix
HiddenUnitOutEx = [HiddenUnitOut' ones(SamNum,1)]'; % Consider offset
W2Ex = SamOut*pinv(HiddenUnitOutEx); % Find generalized output weight
W2 = W2Ex(:,1:ClusterNum); % Output weight
B2 = W2Ex(:,ClusterNum+1); % Offset

% Test
TestDistance = dist(Centers',TestSamIn);
TestSpreadsMat = repmat(Spreads,1,TestSamNum);
TestHiddenUnitOut = radbas(TestDistance./TestSpreadsMat);
TestNNOut = W2*TestHiddenUnitOut +B2;
plot(TestSamIn,TestNNOut,'k-')
W2
B2

2. RBF network design algorithm based on gradient method

SamNum = 100; % training sample number
TargetSamNum = 101; % test sample number
InDim = 1; % sample input dimension Number
UnitNum = 10; % Hidden node number
MaxEpoch = 5000; % Maximum training times
E0 = 0.9; % Target error
% Obtain sample input and output according to the objective function
rand('state',sum(100*clock))
NoiseVar = 0.1;
Noise = NoiseVar*randn(1,SamNum);
SamIn = 8*rand(1,SamNum)-4;
SamOutNoNoise = 1.1*(1-SamIn+ 2*SamIn.^2).*exp(-SamIn.^2/2);
SamOut = SamOutNoNoise + Noise;
TargetIn = -4:0.08:4;
TargetOut = 1.1*(1-TargetIn+2*TargetIn.^2 ).*exp(-TargetIn.^2/2);
figure
hold on
grid
plot(SamIn,SamOut,'k+')
plot(TargetIn,TargetOut,'k--')
xlabel('Input x');
ylabel( 'Output y');
Center = 8*rand(InDim,UnitNum)-4;
SP = 0.2*rand(1,UnitNum)+0.1;
W = 0.2*rand(1,UnitNum)-0.1;
lrCent = 0.001; % Hidden node data center learning coefficient
lrSP = 0.001; % Hidden node expansion constant learning coefficient
lrW = 0.001; % Hidden node output weight learning coefficient
ErrHistory = []; % is used to record the training error after each parameter adjustment
for epoch = 1:MaxEpoch
AllDist = dist(Center',SamIn);
SPMat = repmat(SP',1,SamNum);
UnitOut = radbas(AllDist ./SPMat);
NetOut = W*UnitOut;
Error = SamOut-NetOut;
%Stop learning judgment
SSE = sumsqr(Error)
% Record the training error after each weight adjustment
ErrHistory = [ErrHistory SSE];
if SSE<E0, break, end
for i = 1:UnitNum
CentGrad = (SamIn-repmat(Center(:,i),1,SamNum))...
    *(Error.*UnitOut(i,:)*W(i)/(SP(i)^2))';
SPGrad = AllDist(i,:).^2*(Error.*UnitOut(i,:)*W(i)/(SP(i)^3))';
WGrad = Error*UnitOut(i,:)';
Center(:,i) = Center(:,i) + lrCent*CentGrad;
SP(i) = SP(i) + lrSP*SPGrad;
W(i) = W(i) + lrW*WGrad;
end
end
% Test
TestDistance = dist(Center',TargetIn);
TestSpreadsMat = repmat(SP',1,TargetSamNum);
TestHiddenUnitOut = radbas(TestDistance./TestSpreadsMat);
TestNNOut = W*TestHiddenUnitOut;
plot(TargetIn,TestNNOut,'k -')
% Draw learning error curve
figure
hold on
grid
[xx,Num] = size(ErrHistory);
plot(1:Num,ErrHistory,'k-');

3. RBF network design algorithm based on OLS
SamNum = 100; % Number of training samples
TestSamNum = 101; % Number of test samples
SP = 0.6; % Hidden node expansion constant
ErrorLimit = 0.9; % Target error
% Obtain sample input and output according to the objective function
rand('state',sum(100*clock))
NoiseVar = 0.1;
Noise = NoiseVar*randn(1,SamNum);
SamIn = 8*rand(1,SamNum)-4;
SamOutNoNoise = 1.1*(1-SamIn+2*SamIn.^2).*exp(-SamIn.^2/2);
SamOut = SamOutNoNoise + Noise;
TestSamIn = -4:0.08:4;
TestSamOut = 1.1*(1 -TestSamIn+2*TestSamIn.^2).*exp(-TestSamIn.^2/2);
figure
hold on
grid
plot(SamIn,SamOut,'k+')
plot(TestSamIn,TestSamOut,'k--')
xlabel ('Input x');
ylabel('Output y');
[InDim,MaxUnitNum] = size(SamIn); % sample input dimension and maximum allowable number of hidden nodes
% Calculate hidden node output matrix
Distance = dist(SamIn', SamIn);
HiddenUnitOut = radbas(Distance/SP);
PosSelected = [];
VectorsSelected = [];
HiddenUnitOutSelected = [];
ErrHistory = []; % Used to record the training error after each hidden node is added
VectorsSelectFrom = HiddenUnitOut;
dd = sum((SamOut.*SamOut)')';
for k = 1 : MaxUnitNum
% Calculate the square value of the angle between the output vector of each hidden node and the target output vector
PP = sum(VectorsSelectFrom.*VectorsSelectFrom)';
Denominator = dd * PP';
[xxx,SelectedNum] = size(PosSelected);
if SelectedNum>0,
[lin,xxx] = size(Denominator);
Denominator(:,PosSelected) = ones(lin,1);
end
Angle = (( SamOut*VectorsSelectFrom) .^ 2) ./ Denominator;
% Select the vector with the maximum projection to get the corresponding data center
[value,pos] = max(Angle);
PosSelected = [PosSelected pos];
% Calculate the RBF network training error
HiddenUnitOutSelected = [HiddenUnitOutSelected; HiddenUnitOut(pos,:)];
HiddenUnitOutEx = [HiddenUnitOutSelected; ones(1,SamNum)];
W2Ex = SamOut*pinv(HiddenUnitOutEx); % Use generalized inverse to obtain generalized output weight
W2 = W2Ex(:,1:k); % get output weight
B2 = W2Ex(:,k+1); % get offset
NNOut = W2*HiddenUnitOutSelected+B2; % calculate RBF net output
SSE = sumsqr(SamOut- NNOut)
% Record the training error after each hidden node is added
ErrHistory = [ErrHistory SSE];
if SSE < ErrorLimit, break, end
% Perform Gram-Schmidt orthogonalization
NewVector = VectorsSelectFrom(:,pos);
ProjectionLen = NewVector' * VectorsSelectFrom / (NewVector'*NewVector);
VectorsSelectFrom = VectorsSelectFrom - NewVector * ProjectionLen;
end
UnitCenters = SamIn(PosSelected);
% Test
TestDistance = dist(UnitCenters',TestSamIn);
TestHiddenUnitOut = radbas(TestDistance/SP);
TestNNOut = W2*TestHiddenUnitOut+B2;
plot(TestSamIn,TestNNOut,'k-')
k
UnitCenters
W2
B2

Three: Elman neural networks

For Elman networks, large-step training methods such as trainlm and trainrp are generally not recommended.

Elman neural network is a two-layer BP network structure with feedback, and its feedback connection is from the output of the hidden layer to its input. This type of feedback enables Elman networks to detect and identify time-varying patterns.

    The hidden layer is also called the feedback layer, the transfer function of the neuron is tansig, the output layer is a linear layer, and the transfer function is purelin. This special two-layer network can approximate any function with arbitrary precision, and the only requirement is that its hidden layer must have a sufficient number of neurons. The more neurons in the hidden layer, the higher the accuracy of approximating complex functions.

      The difference between the Elman network and the traditional two-layer neural network is that the first layer has a feedback connection, which can store the previous value and apply it to this calculation. Different feedback states will result in different output results. Because the network can store information, it can not only store spatial patterns, but also learn temporal patterns.

     Build the Elman network:

      The application function newelm() can construct a two-layer or multi-layer Elman network. The hidden layer function usually uses the tansig transfer function, and the output layer usually uses the purelin transfer function.

     The default BP training function is trainbfg; trainlm() can also be used, but because it takes large steps and trains very fast it is not well suited to the Elman network. The default BP learning rule is learngdm, and the default performance function is mse.

      The network can be initialized by using the initnw() function, and the network weights and thresholds can be initialized by the Nguyen-Widrow method. Assume that the statement to construct the Elman network is as follows:

                 net=newelm([0 1], [5 1], {'tansig', 'logsig'})

This means the hidden layer has 5 neurons with transfer function tansig, the output layer has one neuron with transfer function logsig, and the input range is [0 1].

      

       Network emulation:

       Randomly generate an 8-element binary sequence as the input vector:

      P=round(rand(1,8));

      Convert it to cell-array (sequence) form:

     Pseq=con2seq(P);

    Apply the initialization network to the input vector simulation calculation.

    Y=sim(net, Pseq);

   Then convert the output sequence back to concurrent (matrix) form:

    z=seq2con(Y);

   The output vector is z{1,1}

     Elman neural network training:

     To train the Elman neural network, you can use train() or adapt(). The difference between the two is that the train() function corrects the weights with a backpropagation training function, usually traingdx; the adapt() function corrects the weights with a learning-rule function, usually learngdm.

       Elman neural networks are less reliable than some other kinds of networks because both training and adaptation use an approximation of the error gradient. For this reason, to reach a given precision, the hidden layer of an Elman network needs relatively more neurons than other network structures.

      

% The MATLAB program for training is as follows.
% Input and output samples:

P=round(rand(1,8));

T=[0 (P(1:end-1)+P(2:end)==2)];

% The input is a randomly generated binary sequence; the output is 1 only when two consecutive inputs are 1, and 0 otherwise.

% Construct the Elman network:

net=newelm([0 1], [5 1], {'tansig', 'logsig'}, 'trainbfg');

% Convert to sequence form, train, and simulate (the steps described above):
Pseq=con2seq(P);
Tseq=con2seq(T);
net=train(net,Pseq,Tseq);
Y=sim(net,Pseq);
z=seq2con(Y);

diff1=T-z{1,1}

Four: Linear neural networks

Linear layers are generally used for adaptive filters in signal processing and prediction.

newlin: net=newlin([-1 1],s,td,0.01), where s is the number of outputs (neurons), td is the input delay vector (default [0]), and the last argument is the learning rate.

Example: net=newlin([-1 1],1,[0 1],0.01); p=[0 -1 1 1 0 -1 1 0 0 1]; y=sim(net,p); y

[net,y,e,pf]=adapt(net,p,t)

net=newlind(p,t) designs the linear layer directly (by least squares) given inputs p and targets t.
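A minimal newlind sketch (data taken from the small newrb example earlier; newlind solves for the weights directly, so no iterative training is needed):

P = [1 2 3];
T = [2.0 4.1 5.9];
net = newlind(P,T);
Y = sim(net,P)          % should be close to T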

Source: blog.csdn.net/qq_51533426/article/details/130156412