Neural network (NN) construction and model algorithm introduction

Overview

The main function of a neural network is to act as a tool for extracting features; the final classification step is not its core.

An artificial neural network is also called a multi-layer perceptron: the fully connected layers before the output perform a nonlinear transformation on the original input features, and the transformed features are passed to the classifier in the last layer for classification.

A neural network is a topological structure composed of many neurons arranged in layers, with multiple neurons stacked in each layer. It usually consists of an input layer, N hidden layers, and an output layer.

Output layer: in a binary classification task, the output layer needs only 1 neuron; in a K-class problem, the output layer must have K neurons. Applying the classification function to each output-layer neuron yields the probability of each class, and the class with the highest probability is taken as the classification result. For multi-class problems the classifier is often the Softmax regression model, i.e., the extension of the Logistic regression model to multiple classes.
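As a minimal sketch (NumPy assumed; the scores are illustrative), the Softmax function below turns the K output-layer scores into class probabilities, and the highest probability gives the classification result:

```python
import numpy as np

def softmax(z):
    """Convert K raw output-layer scores into K class probabilities."""
    z = z - np.max(z)            # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical outputs of a 3-class output layer
probs = softmax(scores)
print(probs, probs.argmax())          # highest probability -> predicted class
```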

Input layer: take a 128x128-pixel image as an example. An artificial neural network can only process one-dimensional data, so the image must first be flattened into a one-dimensional pixel vector, i.e., data of shape (1, 16384): 1 row and 16384 columns, which is 16384 features. Each input-layer neuron receives exactly one input feature, so the number of input-layer neurons equals the number of features of the image, i.e., 16384 pixels. Each neuron's single input then requires one weight w and one bias b, so the input layer requires 16384 x 2 = 32768 parameters.
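A one-line sketch of the flattening step described above (NumPy assumed; the image is random, for illustration only):

```python
import numpy as np

image = np.random.rand(128, 128)   # a 128x128 grayscale image
x = image.reshape(1, -1)           # flatten to 1 row x 16384 columns
print(x.shape)                     # (1, 16384) -> 16384 input features
```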

Hidden layers: each hidden layer in the middle can contain as many neurons as required. Each neuron takes as inputs the outputs of all neurons in the previous layer (the input layer is special: each of its neurons corresponds to only one input x, while each hidden-layer neuron receives one x from every neuron in the previous layer). Suppose a hidden layer has 328 neurons and the previous layer is the input layer with 128 x 128 = 16384 neurons. Then each hidden neuron receives 16384 inputs x, so a single neuron needs 16384 weights w and 1 bias b, and the whole layer needs (16384 + 1) x 328 = 5,374,280 parameters to be learned. This shows that the number of parameters in an artificial neural network is very large, which makes it hard to train and easy to overfit.
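The parameter count above can be verified directly:

```python
n_inputs = 128 * 128              # 16384 neurons in the input layer
n_hidden = 328                    # neurons in the hidden layer
# each hidden neuron needs n_inputs weights w plus 1 bias b
params = (n_inputs + 1) * n_hidden
print(params)                     # 5374280 parameters to be learned
```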

1. The feedforward neural network has a strong fitting ability: common continuous nonlinear functions can all be approximated by a feedforward neural network.

2. According to the universal approximation theorem, a neural network can to some extent be used as a "universal" function: it can perform complex feature transformations, or approximate a complex conditional distribution.

3. Parameter learning is generally carried out through empirical risk minimization plus regularization. Because of the powerful capacity of neural networks, it is easy to overfit the training set.

4. The optimization problem of a neural network is a non-convex optimization problem, and it may also face the vanishing gradient problem.

Single neuron model: 
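In its standard form, a single neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation function f:

$$ a = f\Big(\sum_{i=1}^{D} w_i x_i + b\Big) = f(\mathbf{w}^{\top}\mathbf{x} + b) $$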

Neural network structure: 


Main content

A model of a neuron:

In a feedforward neural network, each neuron belongs to a different layer. The neurons in each layer receive signals from the neurons in the previous layer and produce signals that are output to the next layer. The 0th layer is called the input layer, the last layer is called the output layer, and the intermediate layers are called hidden layers. There is no feedback anywhere in the network; signals propagate in one direction from the input layer to the output layer, so the network can be represented by a directed acyclic graph.
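In symbols, the layer-by-layer propagation just described can be written as follows, where a^(0) = x is the input, and W^(l) and b^(l) are the weights and biases of layer l with activation f_l:

$$ \mathbf{z}^{(l)} = W^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}, \qquad \mathbf{a}^{(l)} = f_l\big(\mathbf{z}^{(l)}\big), \qquad l = 1, \dots, L $$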

Feedforward neural network structure


The essence of a neural network is that it can apply arbitrarily complex nonlinear transformations to the input features, converting them into features that are easy to classify.

A feedforward neural network has a strong fitting ability, and common continuous nonlinear functions can be approximated by a feedforward neural network.

According to the universal approximation theorem, a feedforward neural network composed of a linear output layer and at least one hidden layer with a "squashing" activation function can approximate, to arbitrary precision, any function defined on a bounded closed set of the real space ℝ^D, as long as the hidden layer has enough neurons.
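As an illustrative toy check of the theorem (not from the original text; sizes and learning rate are arbitrary choices), a single hidden layer of tanh units can be fit to the continuous function sin(x) with plain NumPy gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)                                   # a continuous target function

H = 32                                          # hidden neurons; "enough" for this toy case
W1 = rng.normal(0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(20000):
    h = np.tanh(x @ W1 + b1)                    # hidden layer with a "squashing" activation
    pred = h @ W2 + b2                          # linear output layer
    dpred = 2 * (pred - y) / len(x)             # gradient of the mean squared error
    dz = (dpred @ W2.T) * (1 - h ** 2)          # backpropagate through tanh
    W2 -= lr * (h.T @ dpred); b2 -= lr * dpred.sum(0)
    W1 -= lr * (x.T @ dz);    b1 -= lr * dz.sum(0)

# approximation error shrinks as training proceeds
print(np.abs(np.tanh(x @ W1 + b1) @ W2 + b2 - y).max())
```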

The universal approximation theorem only shows that a neural network has enough computational power to approximate a given continuous function; it does not say how to find such a network, nor whether the network found is optimal. Moreover, in machine learning the true mapping function is unknown, so parameter learning is generally performed through empirical risk minimization and regularization. Because of the powerful capacity of neural networks, it is easy to overfit the training set.

According to the universal approximation theorem, a neural network can to some extent be used as a "universal" function, able to perform complex feature transformations or to approximate a complex conditional distribution.

In machine learning, the features of the input samples have a great influence on the classifier. Taking supervised learning as an example, good features can greatly improve the performance of a classifier. Therefore, to achieve good classification results, the original feature vector of a sample usually needs to be converted into a more effective feature vector. This process is called feature extraction.

A multi-layer feedforward neural network can be regarded as a nonlinear composite function φ: ℝ^D → ℝ^D′, which maps the input x to the output φ(x). Therefore, a multi-layer feedforward neural network can also be regarded as a feature transformation method, with its output φ(x) used as the input to the classifier.

(1) Network model

After the original input features have been nonlinearly transformed by multiple hidden layers, a commonly used traditional machine-learning classifier performs the classification after the final output layer.
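A minimal sketch of such a model in PyTorch (one platform choice among many; the layer sizes are illustrative and match the 128x128-image example above):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Hidden layers perform the nonlinear feature transformation phi(x);
    the final linear layer acts as the classifier."""
    def __init__(self, n_features=128 * 128, n_hidden=328, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(n_hidden, n_classes)  # Softmax is applied inside the loss

    def forward(self, x):
        x = x.view(x.size(0), -1)        # flatten images to 1-D feature vectors
        return self.classifier(self.features(x))

model = MLP()
y = model(torch.randn(4, 1, 128, 128))   # 4 random "images" -> (4, 10) class scores
```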


***The above content explains the structure of the neural network and the principle behind its model formulas: how to build a model from input data x to output y. It corresponds to the first step, building the model, in the hands-on course, together with the explanation of its principles.

(2) Construct the loss function; parameter learning

After completing the first step, with the model in hand, the next step is to construct a loss function for optimization, and to use it to optimize all the parameters w and b to be learned in the network.

A commonly used loss function is the cross-entropy loss, L = −Σ yt · log(y_), which is minimized during training.

Here yt corresponds to the real label, and y_ is the result of the model prediction.
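Using that notation (yt the one-hot true label, y_ the predicted probabilities), a minimal NumPy sketch of the cross-entropy loss:

```python
import numpy as np

def cross_entropy(yt, y_):
    """Cross-entropy between one-hot labels yt and predicted probabilities y_."""
    return -np.sum(yt * np.log(y_ + 1e-12), axis=1).mean()  # small epsilon avoids log(0)

yt = np.array([[0, 1, 0]])             # true label: class 1
y_ = np.array([[0.2, 0.7, 0.1]])       # model's predicted probabilities
print(cross_entropy(yt, y_))           # -log(0.7) ~ 0.357
```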

(3) With the loss function in place, an appropriate optimizer can be selected according to actual needs to iteratively minimize the loss function above

(4) Finally, in practical applications, model evaluation metrics such as test accuracy should be constructed to monitor the state of model training
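A minimal sketch of such a test-accuracy metric (NumPy assumed; labels and predictions are illustrative):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the true class."""
    return (y_pred.argmax(axis=1) == y_true).mean()

y_true = np.array([1, 0, 2])                 # integer class labels
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])         # predicted class probabilities
print(accuracy(y_true, y_pred))              # 1.0
```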

The optimization process uses backpropagation to iteratively compute gradients and update the parameters. In practical applications, all major platforms support automatic gradient calculation, so the derivation is not repeated here.
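A sketch of one training iteration with automatic gradient calculation, using PyTorch as one such platform (the small model and the random batch are stand-ins for illustration only):

```python
import torch
import torch.nn as nn

# a small stand-in model; any platform with autograd works the same way
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(128 * 128, 328), nn.ReLU(),
                      nn.Linear(328, 10))
criterion = nn.CrossEntropyLoss()                         # step (2): the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # step (3): the optimizer

x = torch.randn(4, 1, 128, 128)       # a random batch, for illustration only
labels = torch.randint(0, 10, (4,))

optimizer.zero_grad()                 # clear gradients from the previous iteration
loss = criterion(model(x), labels)    # forward pass and loss
loss.backward()                       # backpropagation: autograd computes all gradients
optimizer.step()                      # update every w and b
```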

Supplement:

Parameter learning in neural networks is more difficult than in linear models, for two main reasons: 1) the optimization problem is non-convex, and 2) the vanishing gradient problem.

The optimization problem of a neural network is a non-convex optimization problem.


When the error is backpropagated, it is multiplied at each layer by the derivative of that layer's activation function, so the error keeps attenuating as it is transmitted through each layer. When the network is very deep, the gradient decays continuously and may even vanish, making the whole network difficult to train. This is the so-called vanishing gradient problem (Vanishing Gradient Problem), also known as the gradient diffusion problem.

Because sigmoid-type functions saturate, their derivative in the saturation region is even closer to 0.
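Concretely, the logistic sigmoid's derivative is bounded well below 1, so it shrinks the error at every layer it passes through:

$$ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \le \frac{1}{4} $$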

There are many ways to alleviate the vanishing gradient problem in deep neural networks. One simple and effective way is to use an activation function with a relatively large derivative, such as ReLU.
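A quick sketch contrasting the two derivatives: the sigmoid gradient decays geometrically across layers, while ReLU passes the gradient through unchanged wherever its input is positive (the 10-layer depth is an arbitrary choice for illustration):

```python
import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)            # at most 0.25

def relu_grad(x):
    return float(x > 0)           # exactly 1 for positive inputs

x = 0.5
print(sigmoid_grad(x) ** 10)      # ~5e-7 after 10 layers: the gradient vanishes
print(relu_grad(x) ** 10)         # 1.0 after 10 layers: the gradient is preserved
```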


Origin blog.csdn.net/stephon_100/article/details/125452961