Principles of two-layer and N-layer fully connected neural network models

Foreword

  Deep learning learns the internal regularities and levels of representation in sample data, and the information acquired during learning greatly helps in interpreting data such as text, images, and sound. The fully connected neural network (multilayer perceptron, MLP) is one of the basic network types, and it clearly embodies the traits that distinguish deep learning methods from traditional machine learning algorithms: being driven by big data, formula-based derivation, self-iterative updates, black-box training, and so on. This article analyzes and explains the MLP starting from two layers and going upward.

1. Two-layer MLP

  A two-layer (shallow) neural network differs from a single-layer network in that its hidden layer contains multiple neurons, which allows it to handle the more complex "multiple-input, multiple-output" problem.

1.1 Forward Propagation

$$y(x, W) = Wx + b$$

  where $x$ is the input image, of dimension $d$; $y$ is a score vector whose dimension equals the number of categories $c$; $W = [w_1 \cdots w_c]^T$ is the weight matrix, with $w_i = [w_{i1} \cdots w_{id}]^T$ the weight vector of the $i$-th category; and $b = [b_1 \cdots b_c]^T$ is the bias vector, with $b_i$ the bias of the $i$-th category. The two-layer MLP is then

$$y = W_2\,\sigma(W_1 x + b_1) + b_2$$

where $\sigma$ is the activation function, e.g. ReLU, $\sigma(z) = \max(0, z)$.
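
A minimal NumPy sketch of this forward pass, assuming ReLU as $\sigma$; the layer sizes below are illustrative, not taken from the text:

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden layer: sigma(W1 x + b1), ReLU assumed
    return W2 @ h + b2                # output scores y, one per category

d, hidden, c = 784, 100, 10           # e.g. a flattened 28x28 image, 10 classes (assumed sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=d)
W1, b1 = rng.normal(scale=0.01, size=(hidden, d)), np.zeros(hidden)
W2, b2 = rng.normal(scale=0.01, size=(c, hidden)), np.zeros(c)
print(two_layer_forward(x, W1, b1, W2, b2).shape)  # (10,)
```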

1.2 Backpropagation

  What backpropagation does is give each neuron the gradients of the loss with respect to its $W$ and $b$, so that those values can be updated; in this way, when new data is passed in, the network can predict it accurately. It also serves as feedback on the data propagated through each layer, and when that feedback is computed, the loss function is the evaluation criterion. The following takes the mean squared error (MSE) loss function as an example and performs gradient descent on it.

  Loss function: $L(\hat y, y) = \frac{1}{2}(\hat y_i - y_i)^2$

  Gradient descent:

$$w_1 = w_0 - \eta \frac{dL(w)}{dw}, \qquad b_1 = b_0 - \eta \frac{dL(b)}{db}$$

  Here $w_0$ and $b_0$ are the current values, $\eta$ is the step size (a fixed value), and $w_1$ and $b_1$ are the values obtained by the gradient-descent step; repeating the step drives $w$ toward the point where $L$ reaches its minimum.
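
As a toy illustration of this update rule (all numbers below are made up; the loss is the squared error from above applied to a single scalar sample):

```python
x, y = 2.0, 6.0        # one training sample (illustrative values)
w, b = 0.0, 0.0        # current values w_0, b_0
eta = 0.05             # step size (learning rate)

for _ in range(200):
    y_hat = w * x + b          # prediction
    dL_dw = (y_hat - y) * x    # dL/dw for L = 0.5*(y_hat - y)^2
    dL_db = (y_hat - y)        # dL/db
    w -= eta * dL_dw           # w_1 = w_0 - eta * dL/dw
    b -= eta * dL_db           # b_1 = b_0 - eta * dL/db

print(w, b)  # w*x + b is now close to y
```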

  Lowering the loss by gradient descent requires computing the gradient via the chain rule:

$$\frac{dL(a,y)}{dw} = \frac{dL(a,y)}{da} \cdot \frac{da}{dz} \cdot \frac{dz}{dw}$$
  Derivation: the original figures showed the gradient substituted into the loss function, the chain-rule expansion, and the final result.
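
As a concrete worked instance of this chain rule (assuming, for illustration, a single sigmoid unit $a = \sigma(z)$ with $z = wx + b$, paired with the MSE loss above):

$$\frac{dL}{da} = a - y, \qquad \frac{da}{dz} = \sigma(z)\bigl(1 - \sigma(z)\bigr) = a(1 - a), \qquad \frac{dz}{dw} = x$$

Multiplying the three factors according to the chain rule gives

$$\frac{dL}{dw} = (a - y)\,a(1 - a)\,x, \qquad \frac{dL}{db} = (a - y)\,a(1 - a),$$

exactly the gradients that the update rule above descends along.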

2. N-layer MLP

  An N-layer fully connected neural network is a network with N layers, not counting the input layer.
  As the number of layers in a neural network increases, each layer abstracts the previous one more deeply: the neurons of each layer learn a more abstract representation of the values of the neurons in the layer before. A three-layer neural network is also called a two-hidden-layer neural network; the three-layer MLP is

$$y = W_3\,\sigma(W_2\,\sigma(W_1 x + b_1) + b_2) + b_3$$

where $\sigma$ is the activation function.
(figure: a three-layer MLP)
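
A minimal NumPy sketch of the N-layer forward pass (here N = 3), with illustrative layer sizes and ReLU assumed as $\sigma$:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    # Hidden layers apply the activation; the output layer returns raw scores.
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)                 # a_k = sigma(W_k a_{k-1} + b_k)
    return weights[-1] @ a + biases[-1]     # y = W_N a_{N-1} + b_N

# Example: a three-layer MLP (two hidden layers), d=4 inputs, c=3 categories.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 3]                          # illustrative layer sizes
weights = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i])) for i in range(3)]
biases = [np.zeros(dims[i + 1]) for i in range(3)]
print(mlp_forward(rng.normal(size=4), weights, biases))
```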

2.1 Network parameters

  Parameters: the values that the algorithm iteratively corrects during training until they reach a final stable state (the weights $W$ and biases $b$).

  Hyperparameters: network structure - the number of neurons in the hidden layers, the number of network layers, the choice of nonlinear unit, etc.
     Optimization related - the learning rate, dropout ratio, regularization term strength, etc.
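
As a hypothetical illustration of this split, the hyperparameters might be gathered in a config like the following, while the parameters $W$ and $b$ are produced by training itself:

```python
# Hypothetical hyperparameter config; all names and values are illustrative.
hyperparams = {
    # network structure
    "num_layers": 2,
    "hidden_units": 128,
    "activation": "relu",
    # optimization related
    "learning_rate": 1e-3,
    "dropout": 0.5,
    "l2_strength": 1e-4,   # regularization term strength
}
```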

2.2 Hyperparameter optimization

  Grid search method:

    ① For each hyperparameter, take several candidate values and combine them to form multiple sets of hyperparameters;
    ② Evaluate the model performance of each set of hyperparameters on the validation set;
    ③ Select the set of values used by the best-performing model as the final hyperparameter values (a code sketch follows this list).
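
A minimal sketch of grid search; `train_and_validate` is a made-up placeholder standing in for training the MLP with the given hyperparameters and returning validation accuracy:

```python
from itertools import product

def train_and_validate(lr, units):
    # Placeholder score for illustration; replace with real training + validation.
    return -abs(lr - 1e-3) - abs(units - 128) / 1e4

learning_rates = [1e-2, 1e-3, 1e-4]   # step 1: several values per hyperparameter
hidden_units = [64, 128, 256]

best_score, best_params = float("-inf"), None
for lr, units in product(learning_rates, hidden_units):  # all combinations
    score = train_and_validate(lr, units)                # step 2: evaluate on validation set
    if score > best_score:                               # step 3: keep the best combination
        best_score, best_params = score, (lr, units)

print("best hyperparameters:", best_params)
```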

  Random search method:

    ① Randomly sample points in the hyperparameter space, each point corresponding to one set of hyperparameters;
    ② Evaluate the model performance of each set of hyperparameters on the validation set;
    ③ Select the set of values used by the best-performing model as the final hyperparameter values (sketched below).
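
The same selection loop as a random-search sketch; again `train_and_validate` is a made-up placeholder score:

```python
import random

random.seed(0)

def train_and_validate(lr, units):
    # Placeholder score for illustration; replace with real training + validation.
    return -abs(lr - 1e-3) - abs(units - 128) / 1e4

def sample_hyperparams():
    # step 1: one random point = one set of hyperparameters (log-uniform lr)
    return 10 ** random.uniform(-4, -2), random.choice([64, 128, 256])

candidates = [sample_hyperparams() for _ in range(9)]
best_score, best_params = max(
    (train_and_validate(lr, units), (lr, units)) for lr, units in candidates
)  # steps 2-3: score each sample on the validation set, keep the best
print("best hyperparameters:", best_params)
```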
  Hyperparameter search strategy:

    ① Coarse search: randomly sample hyperparameters over a wide range, train for one epoch per candidate, and narrow the hyperparameter range according to validation-set accuracy.
    ② Fine search: randomly sample hyperparameters within the narrowed range, run each model for five to ten epochs, and select the set of hyperparameters with the highest validation-set accuracy (a sketch follows this list).
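
A sketch of this coarse-to-fine strategy for a single hyperparameter (the learning rate), sampled log-uniformly; `val_accuracy` is a made-up stand-in for training for the given number of epochs and measuring validation accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lr(low, high, n):
    # Sample learning rates log-uniformly in [low, high].
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size=n)

def val_accuracy(lr, epochs):
    # Fake score with a peak near lr = 1e-3; replace with real training.
    return -(np.log10(lr) + 3) ** 2 + rng.normal(scale=0.05)

# 1) Coarse search: wide range, one epoch per candidate.
coarse_best = max(sample_lr(1e-6, 1e-1, 20), key=lambda lr: val_accuracy(lr, 1))

# 2) Fine search: narrow range around the coarse winner, more epochs each.
fine_best = max(sample_lr(coarse_best / 10, coarse_best * 10, 20),
                key=lambda lr: val_accuracy(lr, 5))
print(f"selected learning rate: {fine_best:.2e}")
```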

3. MLP optimization

  Nonlinear factors: these center on the activation function. To keep computation fast, the activation function should be cheap to evaluate, differentiable, and easy to take partial derivatives of, and it should be chosen so as to mitigate the problems of vanishing and exploding gradients;
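
For illustration, two common activation choices and their derivatives (assumed examples, not prescribed by the text); both are cheap to evaluate and differentiate, and the comments note their gradient behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, so deep stacks risk vanishing gradients

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # 0 or 1: trivial to compute, no shrinking for z > 0
```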

  Iterative updates: weights and biases are updated around backpropagation; the relevant choices include the loss function, the optimizer, and the learning-rate decay strategy;

  Backbone network: how many layers the network should have, and how many nodes each layer should contain.

  The above covers the principles of the two-layer and N-layer MLP model (taking three layers as the example). For MLP optimization, refer to the articles in this column on optimizing and improving the fully connected neural network.

Source: blog.csdn.net/m0_58807719/article/details/128156231