Mathematical Modeling Algorithms and Applications [BP Neural Network Algorithm]

Neural networks can be used for evaluation, prediction, and classification problems. Convolutional neural networks are suited to large samples. Deep learning includes many types of networks, such as convolutional neural networks and generative adversarial networks, and can work with both large and small samples.

Artificial Neural Network (ANN)

In the fields of machine learning and cognitive science, an artificial neural network (ANN), often simply called a neural network (NN), is a mathematical or computational model that imitates the structure and function of biological neural networks (the central nervous system of animals, in particular the brain) and is used to estimate or approximate functions.
What can an ANN do? As noted above, typical uses are evaluation, prediction, and classification.
The main research topics in neural networks:
· Neuron model
· Activation function
· Network structure
· Working state
· Learning method

A typical neural network has the following three parts:
1. Structure (Architecture): the structure specifies the variables in the network and their topological relationships, for example the weights of the neuron connections and the activation values (activities) of the neurons.
2. Activity Rule: most neural network models have short-time-scale dynamics, rules that define how a neuron changes its activation value based on the activity of the other neurons. The activation function generally depends on the weights in the network (i.e., the parameters of that network).
3. Learning Rule: the learning rule specifies how the weights in the network are adjusted over time; this is generally regarded as long-time-scale dynamics. In general, the learning rule depends on the activation values of the neurons; it may also depend on the target values provided by a supervisor and on the current values of the weights.
The figure below is a three-layer neural network. The input layer has d nodes, the hidden layer has q nodes, and the output layer has l nodes. Except for the input layer, the nodes of each layer apply a nonlinear transformation.


[Figure: a three-layer neural network with d input nodes, q hidden nodes, and l output nodes]

Artificial Neurons

Assume that the information coming from the other processing units (neurons) i is xi, that the corresponding connection weights are ωi, i = 0, 1, ..., n−1, and that the threshold inside the processing unit is θ.
The processing unit first forms the weighted sum of its inputs, subtracts the threshold, and passes the result through the activation function f to produce the output:

y = f( Σᵢ ωᵢ xᵢ − θ ),  i = 0, 1, …, n−1
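As a concrete illustration, here is a minimal MATLAB sketch of this neuron model; the input values, weights, and threshold below are made up for demonstration:

% Minimal sketch of the neuron model above (x, w, and theta are made-up values)
x = [0.5; 0.8; 0.2];           % information x_i coming from other neurons
w = [0.4; -0.3; 0.9];          % connection weights omega_i
theta = 0.1;                   % internal threshold of the processing unit
f = @(s) 1./(1 + exp(-s));     % S-shaped (sigmoid) activation function
y = f(w.' * x - theta)         % neuron output y = f(sum(w_i*x_i) - theta)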

Activation Function

Commonly used activation functions include the S-shaped (sigmoid) function f(x) = 1/(1 + e^(−ax)), whose range is (0,1); the hyperbolic-tangent-shaped function (tansig in MATLAB), whose range is (−1,1); and the linear function (purelin). The S-shaped function and its parameter a appear again in the normalization discussion below.
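To make the comparison concrete, the following short MATLAB sketch plots the three functions side by side; the anonymous functions are the standard definitions, written out so the snippet runs without the toolbox:

% Plot the three activation functions mentioned in this article
x = -5:0.1:5;
logsig_f  = @(x) 1./(1 + exp(-x));        % S-shaped, range (0,1)
tansig_f  = @(x) 2./(1 + exp(-2*x)) - 1;  % hyperbolic-tangent-shaped, range (-1,1)
purelin_f = @(x) x;                       % linear
plot(x, logsig_f(x), x, tansig_f(x), x, purelin_f(x))
legend('logsig', 'tansig', 'purelin')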

Network Models

According to how the neurons in the network are interconnected, network models are divided into:
Feedforward neural networks: feedback signals exist only during the training process; during the classification (working) process, data is only transmitted forward until it reaches the output layer, and there are no feedback connections between layers.
Feedback neural networks: neural networks with feedback connections from the output back to the input; their structure is much more complex than that of feedforward networks.
Self-organizing networks: networks that adapt by automatically discovering the inherent laws and essential attributes in the samples, changing their parameters and structure accordingly.

Working state
The working state of a neural network is divided into two phases: learning and working. During learning, a learning algorithm adjusts the connection weights between neurons so that the network output agrees better with reality. During working, the connection weights between neurons remain unchanged, and the network can be used as a classifier or to predict data.

Learning method
Learning is divided into supervised learning (with a tutor) and unsupervised learning (without a tutor). Supervised learning presents a training set to the network and adjusts the connection weights according to the difference between the actual output of the network and the expected output; an example is the BP algorithm. Unsupervised learning extracts the statistical characteristics contained in the sample set and stores them in the network in the form of connection weights between neurons; an example is the Hebb learning rule.











BP Neural Network (Back Propagation)

The learning process of a BP neural network consists of two stages: forward propagation of the signal and back propagation of the error. During forward propagation, an input sample enters at the input layer, is processed layer by layer by the hidden layer(s), and is passed on to the output layer. If the actual output of the output layer does not match the expected output, the process turns to the error back-propagation stage: the output error is propagated backward, layer by layer through the hidden layers toward the input layer, and is apportioned to all the units in each layer. The error signal obtained for each unit then serves as the basis for correcting that unit's weights.
The BP (Back Propagation) neural network model is one of the most widely used artificial neural network models. A BP network can learn and store a large number of input-output pattern mappings without requiring the mathematical equations describing these mappings to be given in advance. Its learning rule is gradient descent: back propagation is used to continuously adjust the weights and thresholds of the network so as to minimize the network's sum of squared errors. The topology of the BP neural network model consists of an input layer, hidden layer(s), and an output layer.
The input layer has multiple input nodes; the output layer has one or more output nodes; the number of hidden-layer nodes in between is set according to actual needs. Adjacent layers are fully connected, that is, every node in one layer is connected to every node in the next layer, while there are no connections between nodes within the same layer.

BP Algorithm

The standard derivation (gradient descent on the squared error, as stated above) gives the following key formulas. For a training sample with targets tₖ and network outputs yₖ, the error to be minimized is

E = (1/2) Σₖ (tₖ − yₖ)²

Each weight w is corrected along the negative gradient of E with learning rate η:

Δw = −η ∂E/∂w

Applying the chain rule layer by layer yields the error signals: for an output unit, δₖ = (tₖ − yₖ) f′(netₖ); for a hidden unit, δ_h = f′(net_h) Σₖ w_hk δₖ. Every weight correction then has the form Δw = η · δ · (the input carried by that connection).
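The following from-scratch MATLAB sketch performs one forward pass and one error back-propagation step for a small d-q-l network with sigmoid activations, directly implementing the update equations above; the layer sizes, sample, and initial weights are made up, and thresholds are omitted for brevity:

% One BP iteration for a 3-5-1 network (illustrative values only)
d = 3; q = 5; l = 1; eta = 0.1;            % layer sizes and learning rate
x = rand(d,1); t = rand(l,1);              % one training sample (made up)
V = rand(q,d) - 0.5;                       % input-to-hidden weights
W = rand(l,q) - 0.5;                       % hidden-to-output weights
f = @(s) 1./(1 + exp(-s));                 % sigmoid, f'(s) = f(s)*(1 - f(s))
% forward propagation of the signal
b = f(V*x);                                % hidden-layer outputs
y = f(W*b);                                % network outputs
% back propagation of the error
delta_out = (t - y) .* y .* (1 - y);             % output-layer error signals
delta_hid = (W.' * delta_out) .* b .* (1 - b);   % hidden-layer error signals
% gradient-descent weight corrections
W = W + eta * delta_out * b.';
V = V + eta * delta_hid * x.';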

Data preprocessing

1. Data preprocessing
It is generally necessary to preprocess the data before training a neural network. An important preprocessing method is normalization. The following briefly introduces the principles and methods of normalization.
(1) What is normalization?
Data normalization maps the data into the interval [0,1] or [−1,1], or into an even smaller interval such as (0.1, 0.9).
(2) Why is normalization necessary?
<1> The input variables have different units, and the range of some variables may be particularly large, which leads to slow convergence of the neural network and long training times.
<2> Inputs with a large data range may play an overly large role in pattern classification, while inputs with a small data range may play an overly small role.
<3> Since the range of the activation function of the output layer is limited, the target data for network training must be mapped into the range of that activation function. For example, if the output layer of a neural network uses an S-shaped activation function, whose range is limited to (0,1), then the output of the network can only lie in (0,1), so the training targets must be normalized to the [0,1] interval.
<4> The S-shaped activation function is very flat for inputs far from 0, so the discrimination there is too small. For example, with parameter a = 1 in the S-shaped function f(x), the difference between f(100) and f(5) is only about 0.0067.
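Point <4> is easy to verify numerically with a one-line MATLAB check, taking a = 1:

f = @(x) 1./(1 + exp(-x));   % S-shaped function with a = 1
f(100) - f(5)                % ans = 0.0067 (approximately)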


The corresponding formulas are: to map x into [0,1],

x′ = (x − x_min) / (x_max − x_min)

and to map x into [−1,1],

x′ = 2 (x − x_min) / (x_max − x_min) − 1

where x_min and x_max are the minimum and maximum values of the variable.

Related Commands and Code

Data preprocessing

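As a minimal sketch, normalization can be done with the Neural Network Toolbox function mapminmax (older code uses premnmx/tramnmx/postmnmx instead); the matrix below is made up for illustration, and mapminmax works row by row, one row per input variable:

% Map each input variable (row) into [0,1] and keep the settings for reuse
p = [10 200 3000;
     0.1 0.5 0.9];                       % two variables with very different ranges
[pn, ps] = mapminmax(p, 0, 1);           % normalized inputs and mapping settings
pnew = [1500; 0.4];                      % a new sample of the same two variables
pnew_n = mapminmax('apply', pnew, ps);   % normalize it with the same settings
% outputs trained on normalized targets are mapped back with
% mapminmax('reverse', Y, psT), where psT are the settings used for T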

Training and testing of neural networks

The core commands are newff (create the network), train (train it), and sim (simulate it, i.e., compute its outputs for given inputs). Their parameters are documented in the commented examples below.

Example

Let the training matrix P be an m × n matrix, where m is the number of parameters (attributes) used for the judgment and n is the number of data groups (samples).
Let the result matrix T be a c × n matrix, where c is the total number of output states; its n must equal the n of P, since there are as many output groups as there are sample groups.
Example: the grades of 60 children are evaluated from 3 subjects, and the result is output simply as good or bad. In this case P is a 3 × 60 matrix and T is a 1 × 60 matrix.
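A hypothetical sketch of this case (random data standing in for the real scores, since none are given):

% 3 attributes x 60 samples, one good/bad output per sample (made-up data)
P = rand(3, 60);                % 3 x 60 training matrix of normalized scores
T = double(mean(P) > 0.5);      % 1 x 60 result matrix: 1 = good, 0 = bad
net = newff(minmax(P), [5, 1], {'tansig','logsig'}, 'traingd');
net = train(net, P, T);
Y = sim(net, P);                % network outputs, also 1 x 60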
The following example uses the normalized sales volumes of three consecutive months as input to predict the sales volume of the fourth month:

% The sales volumes of every three consecutive months, after normalization,
% are used as the input; normalizing each group of data (here into the [0,1]
% interval) speeds up network training
P=[0.5152 0.8173 1.0000;
0.8173 1.0000 0.7308;
1.0000 0.7308 0.1390; 
0.7308 0.1390 0.1087;
0.1390 0.1087 0.3520;
0.1087 0.3520 0.0000;]';
% The normalized sales volume of the fourth month is used as the target vector
T=[0.7308 0.1390 0.1087 0.3520 0.0000 0.3761];
% Create a BP neural network: each input variable ranges over [0,1],
% the output layer has one neuron, the hidden-layer activation function is
% tansig, the output-layer activation function is logsig, and the training
% function is gradient descent (traingd)
net=newff([0 1;0 1;0 1],[5,1],{'tansig','logsig'},'traingd');
net.trainParam.epochs=15000; % maximum number of training epochs
net.trainParam.goal=0.01;    % error goal at which training stops
net=train(net,P,T);          % train the network with P and T
Y = sim(net,P)               % simulate the trained network on P
plot(P,T,P,Y,'o')            % plot targets and network outputs
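Note that newff comes from older releases of the Neural Network Toolbox; newer MATLAB releases recommend feedforwardnet instead. An equivalent sketch of the network above in the newer syntax (unverified against any specific release) would be:

net = feedforwardnet(5, 'traingd');    % 5 hidden neurons, gradient descent
net.layers{2}.transferFcn = 'logsig';  % output-layer activation function
net.trainParam.epochs = 15000;
net.trainParam.goal = 0.01;
net = train(net, P, T);
Y = net(P);                            % simulate the trained network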

Example: Training a BP network using the momentum gradient descent algorithm

% The training samples are defined as follows: 
% input vectors:     
%  p =[-1 -2 3  1  
%      -1  1 5 -3] 
% target vector: t = [-1 -1 1 1] 
close all  
clear  
clc 
% ---------------------------------------------------------------
% NEWFF — create a new feedforward neural network. Syntax:
% net = newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)
% PR -- R x 2 matrix of min and max values for R input elements
%  (for an R-dimensional input, PR is an R x 2 matrix whose rows are the bounds of the corresponding inputs)
% Si -- size of layer i
% TFi -- transfer function of layer i, default = 'tansig'
% BTF -- backpropagation training function, default = 'traingdx'
% BLF -- backpropagation weight/threshold learning function, default = 'learngdm'
% PF -- performance function, default = 'mse'
% ---------------------------------------------------------------
% TRAIN — train a BP neural network. Syntax:
%  train(NET,P,T,Pi,Ai,VV,TV), input arguments:
% net -- the network to be trained
% P -- network inputs
% T -- network targets, default = zeros
% Pi -- initial input delay conditions, default = zeros
% Ai -- initial layer delay conditions, default = zeros
% VV -- validation vectors structure, default = []
% TV -- test vectors structure, default = []
% Return values:
% net -- the trained network
% TR -- training record (number of epochs and the error of each epoch)
% Y -- network outputs
% E -- network errors
% Pf -- final input delay conditions
% Af -- final layer delay conditions
% ---------------------------------------------------------------
% SIM — simulate a BP neural network. Syntax:
% [Y,Pf,Af,E,perf] = sim(net,P,Pi,Ai,T)
% Arguments as above.
% ---------------------------------------------------------------
% 
%  Define the training samples  
% P is the input vector 
echo on
P=[-1,  -2,    3,    1;
    -1,    1,    5,  -3]; 
% T is the target vector  
T=[-1, -1, 1, 1];  
%  Create a new feedforward neural network  
net=newff(minmax(P),[3,1],{'tansig','purelin'},'traingdm') 
% ---------------------------------------------------------------
% Training function traingdm updates the weights and thresholds of the
% network using the momentum BP algorithm. Its parameters include:
% epochs: number of training epochs, default: 100
% goal: error performance goal, default: 0
% lr: learning rate, default: 0.01
% max_fail: maximum number of validation failures, default: 5
% mc: momentum constant, default: 0.9
% min_grad: minimum performance gradient, default: 1e-10
% show: number of epochs between displays, default: 25
% time: maximum training time in seconds, default: inf
% ---------------------------------------------------------------
%  Current input-layer weights and thresholds  
inputWeights=net.IW{1,1}  
inputbias=net.b{1}  
%  Current layer weights and thresholds  
layerWeights=net.LW{2,1}  
layerbias=net.b{2}  
%  Set the training parameters of the network  
net.trainParam.show = 50;  
net.trainParam.lr = 0.05;  
net.trainParam.mc = 0.9;  
net.trainParam.epochs = 1000;  
net.trainParam.goal = 1e-3;   
%  Train the BP network with the TRAINGDM algorithm  
[net,tr]=train(net,P,T);  
%  Simulate the BP network  
A = sim(net,P)  
%  Compute the simulation error  
E = T - A  
MSE=mse(E)  
echo off
figure;
plot((1:4),T,'-*',(1:4),A,'-o')

Running results

The prediction accuracy is relatively high. Clicking the plot buttons in the training window shows how each quantity (performance, gradient, and so on) changed during the training process.

Advantages and disadvantages of the BP neural network

Advantages

1. Nonlinear mapping capability: a BP neural network essentially implements a mapping from input to output, and mathematical theory proves that a three-layer neural network can approximate any nonlinear continuous function with arbitrary accuracy. This makes it particularly suitable for problems with complex internal mechanisms; that is, the BP neural network has a strong nonlinear mapping capability.
2. Self-learning and adaptive capability: during training, a BP neural network can automatically extract the "reasonable rules" between the input and output data through learning, and adaptively memorize what it has learned in the weights of the network. That is, the BP neural network has a high degree of self-learning and self-adaptive capability.
3. Generalization ability: when designing a pattern classifier, one must consider not only whether the network can correctly classify the objects it is required to classify, but also whether, after training, it can correctly classify unseen patterns or patterns contaminated by noise. That is, the BP neural network has the ability to apply its learning results to new knowledge.
4. Fault tolerance: damage to local or partial neurons of a BP neural network does not have a great impact on the global training result; the system can still work normally even when locally damaged. That is, the BP neural network has a certain fault tolerance.
Disadvantages
1. Local minimization problem: from a mathematical point of view, the traditional BP neural network is a local search optimization method applied to a complex nonlinear problem; the weights of the network are adjusted gradually along the direction of local improvement, which can cause the algorithm to fall into a local extremum, with the weights converging to a local minimum, so that network training fails. In addition, a BP neural network is very sensitive to its initial weights: initializing the network with different weights often makes it converge to different local minima. This is also the fundamental reason why different training runs produce different results.
2. Slow convergence of the BP algorithm: since the BP algorithm is essentially gradient descent, and the objective function to be optimized is very complex, a "zigzag phenomenon" inevitably occurs, making the BP algorithm inefficient. Moreover, because the objective function is complex, flat regions appear where the neuron outputs are close to 0 or 1; in these regions the weight error changes very little and the training process almost comes to a halt. Furthermore, the BP algorithm cannot use a traditional one-dimensional (line) search to find the step size at each iteration; the step-size update rule must be given to the network in advance, which also reduces efficiency. All of the above lead to the slow convergence of the BP algorithm.
3. Choice of BP network structure: there is still no unified and complete theoretical guidance for choosing the structure of a BP neural network; in general it can only be selected by experience. If the network structure is too large, training is inefficient and overfitting may occur, lowering performance and reducing fault tolerance; if it is too small, the network may not converge. The structure directly affects the approximation ability and generalization property of the network, so how to choose an appropriate structure is an important issue in applications.
4. Contradiction between application instances and network scale: it is difficult for a BP neural network to resolve the contradiction between the instance scale of the application problem and the network scale, which involves the relationship between the possibility and feasibility of the network capacity, i.e., the learning complexity problem.
5. Conflict between prediction ability and training ability: prediction ability is also called generalization ability, and training ability is also called approximation or learning ability. In general, when the training ability is poor, the prediction ability is also poor; and, up to a point, the prediction ability improves as the training ability improves. However, this trend has a limit: beyond it, the prediction ability decreases as the training ability keeps improving, which is the so-called "overfitting" phenomenon. It occurs because the network has learned too many sample details, so the learned model can no longer reflect the underlying patterns of the samples. How to grasp the right degree of learning and resolve the conflict between prediction ability and training ability is therefore also an important research topic of BP neural networks.
6. Sample dependence problem: the approximation and generalization capabilities of the network model are closely related to how typical the learning samples are, and selecting typical sample instances from the problem to form a training set is a very difficult problem.
Improvements of the traditional BP algorithm fall into two main categories:
Heuristic methods: such as the additional momentum method and adaptive learning-rate methods.
Numerical optimization methods: such as the conjugate gradient method, Newton's iterative method, and the Levenberg-Marquardt algorithm.
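In the toolbox used above, these improvements are selected simply by choosing a different training function when creating the network; a sketch reusing the P and T of the momentum example:

% traingdm - gradient descent with momentum (heuristic, used above)
% traingda - gradient descent with an adaptive learning rate (heuristic)
% traincgf - Fletcher-Reeves conjugate gradient (numerical optimization)
% trainlm  - Levenberg-Marquardt (numerical optimization, fast on small nets)
net = newff(minmax(P), [3, 1], {'tansig','purelin'}, 'trainlm');
net.trainParam.epochs = 1000;
net.trainParam.goal = 1e-3;
[net, tr] = train(net, P, T);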

Origin blog.csdn.net/Luohuasheng_/article/details/128675915