Neural Network Model Architecture Tutorial: How to Build a Neural Network Model

Steps to build a model using an artificial neural network

There are many kinds of artificial neural networks; here I will use only the most common one, the BP network. Different networks have different structures and different learning algorithms. Simply put, an artificial neural network is a function. It differs from an ordinary function in that it also has a learning process.

During learning, it continually corrects its network parameters against the known correct results until it reaches a satisfactory accuracy; only then does its real working phase begin. To study artificial neural networks, it is best to install MATLAB from MathWorks.

With this software, you can learn to build your own neural network problem-solving model within a week. If you want to program an artificial neural network yourself, you need to find a relevant book, specifically the part on neural network learning algorithms.

Because "learning algorithms" are at the heart of artificial neural networks. The most commonly used BP artificial neural network uses the BP learning algorithm.


How to Build a Neural Network Time-Series Model in Python

Biological neural network: generally refers to the network formed by the neurons, cells, and synapses of a biological brain, which gives rise to biological consciousness and supports thinking and action.

An Artificial Neural Network (ANN), also called a neural network (NN) or connection model, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing.

Such a network depends on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes. An artificial neural network is a mathematical model that processes information using a structure similar to the brain's synaptic connections.

In engineering and academia, it is often referred to simply as a "neural network".

Neural Network BP Model

1. Overview of the BP model. The error back-propagation neural network model is called the BP (Back-Propagation) network model for short.

Dr. Paul Werbos proposed the error backpropagation learning algorithm in his 1974 doctoral dissertation, but it was the group of scientists led by Rumelhart and McClelland who fully developed the algorithm and won it wide acceptance.

In the book "Parallel Distributed Processing, Explorations in the Microstructure of Cognition" ("Parallel Distributed Information Processing") published in 1986, they made a detailed analysis and introduction of the error backpropagation learning algorithm, and the potential of this algorithm. capabilities were discussed in depth.

A BP network is a hierarchical neural network with three or more layers. Neurons in adjacent layers are fully connected: each neuron in one layer connects to every neuron in the next layer, while neurons within the same layer are not connected to each other.

The network learns in a supervised fashion ("taught by a teacher"). When a learning pattern pair is presented to the network, the neuron activations propagate from the input layer through the hidden layers to the output layer, where each output-layer neuron produces the network's response to the input.

Then, in the direction that reduces the error between the expected and actual outputs, the connection weights are corrected layer by layer, working back from the output layer through each hidden layer toward the input layer; hence the name "error backpropagation learning algorithm".

As this error backpropagation correction continues, the network's responses to the input patterns become increasingly accurate.

The BP network is mainly used in the following areas: 1) function approximation: learn a network that approximates a function using input patterns and the corresponding expected output patterns; 2) pattern recognition: associate an input pattern with a specific expected output pattern; 3) classification: classify input patterns in an appropriate, defined way; 4) data compression: reduce the dimensionality of the output vector for easier transmission or storage.

In practical applications of artificial neural networks, 80% to 90% of models use the BP network or one of its variants. It is the core of the feedforward network family and embodies the essence of artificial neural networks.

2. The principle of the BP model. The following takes a three-layer BP network as an example to explain the principles of learning and application.

1. Data definitions. P learning pattern pairs (xp, dp), p = 1, 2, ..., P; input pattern matrix X[N][P] = (x1, x2, ..., xP); target pattern matrix d[M][P] = (d1, d2, ..., dP).

Three-layer BP network structure. Input layer: number of neuron nodes S0 = N, i = 1, 2, ..., S0. Hidden layer: number of neuron nodes S1, j = 1, 2, ..., S1; neuron activation function f1[S1]; weight matrix W1[S1][S0]; bias vector b1[S1].

Output layer: number of neuron nodes S2 = M, k = 1, 2, ..., S2; neuron activation function f2[S2]; weight matrix W2[S2][S1]; bias vector b2[S2].

Learning parameters: target error ε; initial weight-update value Δ0; maximum weight-update value Δmax; weight-update increase factor η+; weight-update decrease factor η−.
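To make these definitions concrete, here is a minimal sketch in Python/NumPy of the structures and learning parameters defined above. The dimension values are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the text)
N, M, P = 14, 1, 33        # input dimension, output dimension, number of pattern pairs
S0, S1, S2 = N, 9, M       # layer sizes: input, hidden, output

X = np.random.randn(S0, P)  # input pattern matrix X[N][P]
d = np.random.randn(S2, P)  # target pattern matrix d[M][P]

# Learning parameters as defined above
eps       = 1e-5    # target error ε
delta0    = 0.1     # initial weight-update value Δ0
delta_max = 50.0    # maximum weight-update value Δmax
eta_plus  = 1.2     # weight-update increase factor η+
eta_minus = 0.5     # weight-update decrease factor η−
```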

2. Definition of the error function. The error for the pth input pattern is Ep = (1/2)Σk(dkp − y2kp)², where y2kp is the computed output of the BP network.
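A one-function sketch of this error measure, assuming d_p and y2_p are the target and output vectors for the pth pattern:

```python
import numpy as np

def pattern_error(d_p, y2_p):
    """Squared error Ep = 1/2 * sum_k (d_kp - y2_kp)**2 for one pattern."""
    return 0.5 * np.sum((d_p - y2_p) ** 2)
```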

3. Derivation of the BP network learning formulas. The guiding idea of the derivation is to modify the network's weights W and biases b so that the error function decreases along the negative gradient direction until the network's output error reaches the target accuracy, at which point learning ends.

Output formulas for each layer. Input layer: y0i = xi, i = 1, 2, ..., S0. Hidden layer: z1j = Σi W1[j][i]·y0i + b1[j], y1j = f1(z1j), j = 1, 2, ..., S1. Output layer: z2k = Σj W2[k][j]·y1j + b2[k], y2k = f2(z2k), k = 1, 2, ..., S2.
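These formulas translate directly into a forward pass. The sketch below assumes f1 and f2 are vectorized Python functions (such as np.tanh) and that W1, b1, W2, b2 are NumPy arrays of the shapes defined earlier:

```python
import numpy as np

def forward(x, W1, b1, f1, W2, b2, f2):
    """Forward propagation of one input pattern x through the three-layer network."""
    y0 = x               # input layer: y0_i = x_i
    z1 = W1 @ y0 + b1    # hidden-layer pre-activations z1_j
    y1 = f1(z1)          # hidden layer: y1_j = f1(z1_j)
    z2 = W2 @ y1 + b2    # output-layer pre-activations z2_k
    y2 = f2(z2)          # output layer: y2_k = f2(z2_k)
    return y0, y1, y2, z1, z2
```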

Error gradient at the output nodes: because the outputs y2k, k = 1, 2, ..., S2, are independent of one another, ∂Ep/∂y2k = −(dk − y2k). Defining the output-layer node error as δ2k = (dk − y2k)·f2′(z2k), the gradients with respect to the output-layer parameters are ∂Ep/∂W2[k][j] = −δ2k·y1j and ∂Ep/∂b2[k] = −δ2k.

For the hidden layer, a given y1j is connected to all of the y2k, so the gradient sums over k: the hidden-layer node error is δ1j = f1′(z1j)·Σk δ2k·W2[k][j], and the corresponding gradients are ∂Ep/∂W1[j][i] = −δ1j·y0i and ∂Ep/∂b1[j] = −δ1j.

4. Computing the corrections ΔW and Δb of the weights W and biases b with the resilient BP algorithm (RPROP). In their 1993 paper "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm", Martin Riedmiller and Heinrich Braun of Germany proposed the resilient backpropagation algorithm (RPROP).
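A sketch of these node errors and gradients, assuming f1_prime and f2_prime are the derivatives of the activation functions (the helper names are mine):

```python
import numpy as np

def gradients(x, d_p, W2, y1, y2, z1, z2, f1_prime, f2_prime):
    """Backpropagated node errors and gradients of Ep for one pattern."""
    delta2 = (d_p - y2) * f2_prime(z2)        # output-layer node error δ2k
    delta1 = f1_prime(z1) * (W2.T @ delta2)   # hidden-layer node error δ1j (sums over k)
    gW2 = -np.outer(delta2, y1)               # ∂Ep/∂W2[k][j] = -δ2k · y1j
    gb2 = -delta2                             # ∂Ep/∂b2[k]    = -δ2k
    gW1 = -np.outer(delta1, x)                # ∂Ep/∂W1[j][i] = -δ1j · y0i
    gb1 = -delta1                             # ∂Ep/∂b1[j]    = -δ1j
    return gW1, gb1, gW2, gb2
```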

This approach removes the harmful influence of the gradient's magnitude on the weight step: only the sign of the gradient is used, to indicate the direction of the weight update.

The size of the weight change is determined solely by each weight's own "update value" at the t-th learning step.

The weight update follows the rule: if the derivative is positive (increasing error), the weight is decreased by its update value; if the derivative is negative, the weight is increased by its update value. The RPROP algorithm thus modifies the weight step directly from local gradient information.

For each weight, we introduce its own update value, which alone determines the size of that weight's change.

This is a sign-based adaptive process. It relies on the local gradient information of the error function E, and the update values adapt according to the following learning rule, where 0 < η− < 1 < η+.

At each step, if the gradient of the objective function changes sign, the last update was too large, and the update value is decreased by the factor η−; if the gradient keeps its sign, the update value is increased by the factor η+.

To reduce the number of freely adjustable parameters, the increase and decrease factors are fixed at η+ = 1.2 and η− = 0.5; these two values have given good results in extensive practice.

The RPROP algorithm uses two further parameters: the initial update value Δ0 and the maximum update value Δmax. When learning starts, all update values are set to Δ0. Because Δ0 directly determines the size of the first weight step, it should be chosen to suit the initial weights, for example Δ0 = 0.1 (the default setting).

To prevent the weights from becoming too large, an upper bound Δmax is imposed on the update values; the default is Δmax = 50.0. Many experiments have found that setting the maximum update value to a rather small value, such as Δmax = 1.0, can give smoother performance with lower error.

5. Computing the corrected weights W and biases b at the t-th learning step. The correction formulas are W(t) = W(t−1) + ΔW(t) and b(t) = b(t−1) + Δb(t), where t is the learning step.
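Putting the RPROP rules together, here is a minimal sketch of one RPROP update for a single parameter array. The minimum update value delta_min is an extra safeguard of my own, and this simple variant omits the weight-backtracking step of the original paper:

```python
import numpy as np

def rprop_step(W, grad, grad_prev, delta, eta_plus=1.2, eta_minus=0.5,
               delta_max=50.0, delta_min=1e-6):
    """One RPROP update: adapt each weight's update value by the sign of
    grad * grad_prev, then step each weight against the sign of its gradient."""
    sign_change = grad * grad_prev
    delta = np.where(sign_change > 0,
                     np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0,
                     np.maximum(delta * eta_minus, delta_min), delta)
    W_new = W - np.sign(grad) * delta   # move each weight against its gradient sign
    return W_new, delta
```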

6. Condition for successful completion of BP network learning. Each learning pass computes the cumulative sum of squared errors SSE = Σp Σk (dkp − y2kp)² and the average error MSE = SSE/(P·S2). When the average error satisfies MSE < ε, BP network learning ends successfully.
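A sketch of this stopping test, assuming d and y2 are M×P matrices of targets and computed outputs:

```python
import numpy as np

def mse(d, y2):
    """Average error: sum of squared errors divided by P * S2."""
    return np.sum((d - y2) ** 2) / d.size

# learning ends successfully once mse(d, y2) < eps
```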

7. Applying the BP network for prediction. When applying the BP network, present the network input to the input layer; using the given BP network and the learned weights W and biases b, the input is carried by the "forward propagation" process from the input layer through each hidden layer to the output layer, which computes the BP network's predicted output.

8. Neuron activation functions f. Linear function: f(x) = x, f′(x) = 1; the input range of f(x) is (−∞, +∞) and its output range is (−∞, +∞). It is generally used in the output layer, allowing the network to output any value.

Logistic sigmoid function: f(x) = 1/(1 + e^(−x)), with input range (−∞, +∞) and output range (0, 1). f′(x) = f(x)[1 − f(x)]; the input range of f′(x) is (−∞, +∞) and its output range is (0, 1/4].

It is generally used in the hidden layer, mapping inputs from (−∞, +∞) to outputs in (0, 1). For larger inputs the amplification factor is smaller, and for smaller inputs the amplification factor is larger, so it can be used to process and approximate nonlinear input/output relationships.

When used for pattern recognition, it can be placed in the output layer to produce binary outputs close to 0 or 1. Hyperbolic tangent sigmoid function: f(x) = (e^x − e^(−x))/(e^x + e^(−x)), with input range (−∞, +∞) and output range (−1, 1).

f′(x) = 1 − f(x)·f(x); the input range of f′(x) is (−∞, +∞) and its output range is (0, 1].

It is generally used in the hidden layer, mapping inputs from (−∞, +∞) to outputs in (−1, 1). For larger inputs the amplification factor is smaller, and for smaller inputs the amplification factor is larger, so it can be used to process and approximate nonlinear input/output relationships.

Step function, type 1: f(x) = 1 for x ≥ 0 and f(x) = 0 for x < 0, with input range (−∞, +∞) and output range {0, 1}; f′(x) = 0.

Type 2: f(x) = 1 for x ≥ 0 and f(x) = −1 for x < 0, with input range (−∞, +∞) and output range {−1, 1}; f′(x) = 0.

Ramp function, type 1: f(x) = 0 for x < 0, f(x) = x for 0 ≤ x ≤ 1, and f(x) = 1 for x > 1, with input range (−∞, +∞) and output range [0, 1]; f′(x) has input range (−∞, +∞) and output range {0, 1}.

Type 2: f(x) = −1 for x < −1, f(x) = x for −1 ≤ x ≤ 1, and f(x) = 1 for x > 1, with input range (−∞, +∞) and output range [−1, 1]; f′(x) has input range (−∞, +∞) and output range {0, 1}.
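The activation functions described above, with their derivatives, can be written compactly as follows; this sketch simply follows the definitions and ranges given in the text:

```python
import numpy as np

def linear(x):     return x                              # output range (-inf, +inf)
def linear_d(x):   return np.ones_like(x)                # f'(x) = 1

def logistic(x):   return 1.0 / (1.0 + np.exp(-x))       # output range (0, 1)
def logistic_d(x):
    f = logistic(x)
    return f * (1.0 - f)                                 # output range (0, 1/4]

def tanh_d(x):                                           # np.tanh outputs (-1, 1)
    f = np.tanh(x)
    return 1.0 - f * f                                   # output range (0, 1]

def step1(x):      return np.where(x >= 0, 1.0, 0.0)     # type 1: outputs {0, 1}
def step2(x):      return np.where(x >= 0, 1.0, -1.0)    # type 2: outputs {-1, 1}

def ramp1(x):      return np.clip(x, 0.0, 1.0)           # type 1: outputs [0, 1]
def ramp2(x):      return np.clip(x, -1.0, 1.0)          # type 2: outputs [-1, 1]
```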

3. Overall algorithms. 1. Overall algorithm for initializing the weights W and biases b of the three-layer BP network (input layer, hidden layer, output layer). (1) Input parameters: X[N][P], S0, S1, f1[S1], S2, f2[S2]. (2) Compute the maximum-value and minimum-value vectors Xmax[N], Xmin[N] of each variable of the input patterns X[N][P]. (3) Initialize the hidden-layer weights W1 and biases b1.

Case 1: the hidden-layer activation function f( ) is the hyperbolic tangent sigmoid. 1) Compute the range vector Xrng[N] of each variable of the input patterns X[N][P]; 2) compute the range-midpoint vector Xmid[N] of each variable of the input patterns X; 3) compute the magnitude factor Wmag of W and b; 4) generate an S1×1 random vector Rand[S1] uniformly distributed on [−1, 1]; 5) generate an S1×S0 random matrix Randnr[S1][S0] from a normal distribution with mean 0 and variance 1 (values roughly within [−1, 1]); 6) compute W[S1][S0] and b[S1]; 7) compute the hidden layer's initial weights W1[S1][S0]; 8) compute the hidden layer's initial biases b1[S1]; 9) output W1[S1][S0], b1[S1].

Case 2: the hidden-layer activation function f( ) is the logistic sigmoid. The steps are the same as in Case 1: compute Xrng[N], Xmid[N], and the magnitude factor Wmag; generate the uniform random vector Rand[S1] and the normal random matrix Randnr[S1][S0]; compute W[S1][S0] and b[S1]; then compute and output the hidden layer's initial W1[S1][S0] and b1[S1].

Case 3: the hidden-layer activation function f( ) is some other function. The steps are again the same as in Case 1.

(4) Initialize the output-layer weights W2 and biases b2: 1) generate an S2×S1 random matrix W2[S2][S1] uniformly distributed on [−1, 1]; 2) generate an S2×1 random vector b2[S2] uniformly distributed on [−1, 1]; 3) output W2[S2][S1], b2[S2].
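A sketch of this initialization. The output-layer part follows the text exactly (uniform on [−1, 1]); for the hidden layer, the text's exact formulas for Wmag, W, and b were lost, so the scaling below is an assumption in the spirit of Nguyen-Widrow initialization:

```python
import numpy as np

def init_three_layer(X, S0, S1, S2):
    """Initialize W1, b1 (hidden layer) and W2, b2 (output layer)."""
    Xmax, Xmin = X.max(axis=1), X.min(axis=1)
    Xrng = Xmax - Xmin                       # range vector Xrng[N]
    Xmid = (Xmax + Xmin) / 2.0               # range-midpoint vector Xmid[N]
    Wmag = 0.7 * S1 ** (1.0 / S0)            # magnitude factor (assumed formula)
    randnr = np.random.randn(S1, S0)         # normal(0, 1) matrix Randnr[S1][S0]
    randnr /= np.linalg.norm(randnr, axis=1, keepdims=True)
    rand = np.random.uniform(-1.0, 1.0, S1)  # uniform vector Rand[S1]
    W1 = Wmag * randnr * (2.0 / np.where(Xrng == 0, 1.0, Xrng))
    b1 = Wmag * rand - W1 @ Xmid             # center hidden units on the input range
    W2 = np.random.uniform(-1.0, 1.0, (S2, S1))
    b2 = np.random.uniform(-1.0, 1.0, S2)
    return W1, b1, W2, b2
```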

2. Overall algorithm for learning the weights W and biases b of the three-layer BP network (input layer, hidden layer, output layer) with the resilient BP algorithm (RPROP). Function: Train3lBP_RPROP(S0, X, P, S1, W1, b1, f1, S2, W2, b2, f2, d, TP). (1) Input parameters: P pattern pairs (xp, dp), p = 1, 2, ..., P; the three-layer BP network structure; the learning parameters.

(2) Learning initialization: 1) set all weight-update values to the initial value Δ0; 2) initialize the gradients of each layer's W and b to zero matrices.

(3) From the input patterns X, compute each layer's outputs y0, y1, y2 and the average error MSE of the first learning pass. (4) Enter the learning loop: epoch = 1. (5) Check whether the learning error meets the target error requirement: if MSE < ε, exit the epoch loop and go to (12).

(6) Save the gradients of each layer's W and b produced by learning pass epoch − 1. (7) Compute the gradients of each layer's W and b for learning pass epoch: 1) compute each layer's backpropagated error value δ; 2) compute the gradients of each layer's W and b for the pth pattern; 3) accumulate the gradients of W and b over the patterns p = 1, 2, ..., P.

(8) If epoch = 1, set the saved gradients of pass epoch − 1 equal to the gradients of each layer's W and b produced by pass epoch.

(9) Update each layer's W and b: 1) compute the weight-update values Δij; 2) compute the update steps ΔW and Δb of W and b; 3) compute each layer's corrected W and b after learning pass epoch.

(10) Using the corrected W and b of each layer, compute from X each layer's outputs y0, y1, y2 and the learning error MSE of pass epoch. (11) epoch = epoch + 1; if epoch ≤ MAX_EPOCH, go to (5); otherwise go to (12).

(12) Output processing: 1) if MSE < ε, learning has met the target error requirement; output W1, b1, W2, b2; 2) if MSE ≥ ε, learning has not met the target error requirement; learn again. (13) End.
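A skeleton of this learning loop, combining the forward, gradients, mse, and rprop_step sketches from earlier; the bookkeeping is simplified and the function name mirrors the text's Train3lBP_RPROP:

```python
import numpy as np

def train3l_bp_rprop(X, d, W1, b1, W2, b2, f1, f1_d, f2, f2_d,
                     eps=1e-5, max_epoch=5000, delta0=0.1):
    """RPROP learning loop for the three-layer BP network (sketch)."""
    params = [W1, b1, W2, b2]
    deltas = [np.full_like(p, delta0) for p in params]   # all update values := Δ0
    grads_prev = [np.zeros_like(p) for p in params]      # gradients := zero matrices
    P = X.shape[1]
    for epoch in range(1, max_epoch + 1):
        grads = [np.zeros_like(p) for p in params]
        sse = 0.0
        for p_idx in range(P):                           # accumulate over all patterns
            x, dp = X[:, p_idx], d[:, p_idx]
            y0, y1, y2, z1, z2 = forward(x, params[0], params[1], f1,
                                         params[2], params[3], f2)
            sse += np.sum((dp - y2) ** 2)
            for g, gp in zip(grads, gradients(x, dp, params[2], y1, y2,
                                              z1, z2, f1_d, f2_d)):
                g += gp
        if sse / (P * d.shape[0]) < eps:                 # step (5): MSE target met
            break
        for i in range(len(params)):                     # step (9): RPROP update
            params[i], deltas[i] = rprop_step(params[i], grads[i],
                                              grads_prev[i], deltas[i])
            grads_prev[i] = grads[i]                     # step (6) for the next pass
    return params
```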

3. Overall algorithm for prediction with the three-layer BP network (input layer, hidden layer, output layer). First use Train3lBP_RPROP( ) to learn the weights W and biases b of the three-layer BP network, then apply the trained three-layer BP network to make predictions.

Function: Simu3lBP( ). 1) Input parameters: the P input data vectors xp to be predicted, p = 1, 2, ..., P; the three-layer BP network structure; the learned weights W and biases b of each layer.

2) Compute the network outputs y2[S2][P] of the P input data vectors xp (p = 1, 2, ..., P) to be predicted, and output the prediction result y2[S2][P]. 4. Overall algorithm flowchart. The overall algorithm flowchart of the BP network is shown in Figure 2.
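A sketch of the corresponding prediction step; the name mirrors the text's Simu3lBP, and the body simply runs the forward pass on all P columns at once:

```python
import numpy as np

def simu3l_bp(X_new, W1, b1, f1, W2, b2, f2):
    """Predicted outputs y2[S2][P] for P input vectors (columns of X_new)."""
    Y1 = f1(W1 @ X_new + b1[:, None])   # hidden-layer outputs for all patterns
    Y2 = f2(W2 @ Y1 + b2[:, None])      # network outputs y2[S2][P]
    return Y2
```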

5. Data flow diagram. The data flow diagram of the BP network is shown in Figure 1.

6. Examples. Example 1: BP model classification of national copper-mine geochemical prospecting anomaly data. 1. Data preparation: a robust statistical method was applied to the national copper geochemical prospecting data to select a copper anomaly lower limit of 33.1, producing the national copper geochemical anomaly data.

2. Model data preparation: from the national copper geochemical anomaly data, the geochemical data of 33 ore points in 7 deposit categories were selected as model data.

The seven categories are magmatic copper deposits, porphyry copper deposits, skarn copper deposits, marine volcanic copper deposits, continental volcanic copper deposits, metamorphic copper deposits, and marine sedimentary copper deposits; a further model class without the copper anomaly was added (Table 8-1). 3. Test data: the national geochemical exploration data were prepared as the test data set.

4. BP network structure: two hidden layers; the vector dimensions from the input layer to the output layer are 14, 9, 5, and 1, respectively. The learning rate is 0.9 and the target system error is 1e-5, with no momentum term. The model data are listed in Table 8-1. 5. The calculation results are shown in Figures 8-2 and 8-3.

Figures 8-2 and 8-3 show the BP model classification of copper deposit types in China. Example 2: BP model classification of gold deposit types. 1. Model data preparation: four deposit types were selected as model data: greenstone-type gold deposits, hydrothermal gold deposits related to intermediate-acid intrusive rocks, micro-disseminated gold deposits, and volcanic hydrothermal gold deposits (Table 8-2).

2. Test data preparation: the model sample points and some gold ore points, with their metal content, ore tonnage, and grade data, form the test data set. 3. BP network structure: the input layer is three-dimensional; there is one hidden layer, which is three-dimensional; the output layer is four-dimensional; the learning rate is 0.8, the target system error is 1e-4, and the number of iterations is 5000.

The model data are listed in Table 8-2. 4. Calculation results: see Table 8-3 (training and learning results) and Table 8-4 (prediction results, partial).

There are several ways to classify neural network models; give a classification

Classification of neural network models. There are many models of artificial neural networks, and they can be classified in different ways. Two common classification methods are classification by network topology and classification by the direction of information flow within the network.

1. Classification by network topology. A network's topology is the pattern of connections between its neurons. By this criterion, neural network structures fall into two categories: hierarchical structures and interconnected structures.

A hierarchically structured neural network divides its neurons into an input layer, intermediate (hidden) layers, and an output layer according to their functions and order. Each neuron in the input layer receives input information from the outside world and passes it on to the neurons of the hidden layer; the hidden layer is the network's internal information-processing layer, responsible for transforming the information.

It can be designed with one or more layers as needed; the last hidden layer passes its information to the output-layer neurons for further processing, and the result is then output to the outside world.

In an interconnected network structure, a connection path may exist between any two nodes, so interconnected networks can be subdivided by the degree of node connectivity into three cases: fully interconnected, partially interconnected, and sparsely connected. 2. Classification by information flow. From the viewpoint of the direction in which information travels within the network, neural networks can be divided into two types: feedforward networks and feedback networks.

A purely feedforward network has the same structure as a layered network; it is called feedforward because information is processed from the input layer through each hidden layer to the output layer, one layer at a time.

In a feedforward network, the output of one layer is the input of the next; information transfer is directional, layer by layer, and there are generally no feedback loops. Such networks are therefore easy to connect in series to build multi-layer feedforward networks. A feedback network has the same structure as a single-layer fully interconnected network.

All nodes in a feedback network have information-processing capability, and each node can simultaneously receive input from the outside world and produce output to it.

 


Source: blog.csdn.net/Supermen333/article/details/127486765