Introduction to Neural Networks

1. The analogy with biological neurons
2. The hierarchical structure
The input layer, the hidden layers (1, 2), and the output layer; the lines between layers can be understood as the weight parameters w. The size of each weight matrix W (its matrix dimensions) is determined by the sizes of the layers it connects, as sketched below.
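As an illustration of how the weight matrices follow the layer sizes, here is a minimal numpy sketch; the layer sizes are made up for this example and are not from the original post.

```python
import numpy as np

# Made-up layer sizes: 4 inputs, two hidden layers of 8 and 6 neurons, 2 outputs
sizes = [4, 8, 6, 2]
rng = np.random.default_rng(0)

# Each "line" between two adjacent layers is one entry of a weight matrix,
# so the matrix W between layer i and layer i+1 has shape (sizes[i], sizes[i+1]).
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
for W in weights:
    print(W.shape)   # (4, 8), (8, 6), (6, 2)
```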

Neural network training process: feed in the input data; compute the loss value by forward propagation; compute the gradients by backpropagation; update the parameters using the gradients.
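A minimal numpy sketch of these four steps on a toy linear model; the data, sizes, and learning rate are made up for illustration and are not from the original post.

```python
import numpy as np

# Toy data (made up): 100 samples, 3 features, 1 target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 1))

# Parameters of a single linear layer
W = 0.01 * rng.normal(size=(3, 1))
b = np.zeros((1, 1))
lr = 0.1

for step in range(100):
    # 1) input data -> 2) forward propagation to compute the loss value
    pred = X @ W + b                     # forward pass
    loss = np.mean((pred - y) ** 2)      # mean squared error

    # 3) backpropagation to compute the gradients
    grad_pred = 2 * (pred - y) / len(X)
    grad_W = X.T @ grad_pred
    grad_b = grad_pred.sum(axis=0, keepdims=True)

    # 4) update the parameters with the gradients
    W -= lr * grad_W
    b -= lr * grad_b
```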

3. Nonlinear structure (activation functions)

The activation function is applied after the weighted output of the previous layer, i.e. a = f(Wx + b).
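For concreteness, a small sketch of one layer's computation, with sigmoid standing in for f; the shapes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # output of the previous layer (3 values, made up)
W = rng.normal(size=(3, 5))     # weights connecting 3 inputs to 5 neurons
b = np.zeros(5)

z = x @ W + b                   # weighted sum of the previous layer
a = 1.0 / (1.0 + np.exp(-z))    # nonlinearity f applied afterwards (sigmoid here)
```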
4. Activation functions

4.1 Sigmoid activation function
Sigmoid: σ(x) = 1 / (1 + e^(−x))

The derivative required for backpropagation:

σ'(x) = σ(x) · (1 − σ(x))
When the absolute value of x is large, this derivative is close to zero, so the gradient tends to vanish under the chain rule; the weight parameters then stop being updated and the neural network cannot converge. For this reason, most later neural networks do not use sigmoid as the activation function.
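A quick numeric check of this effect (the input values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # sigma'(x) = sigma(x) * (1 - sigma(x))

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))     # 0.25, ~0.105, ~0.0066, ~4.5e-05
# As |x| grows the gradient shrinks toward zero, so almost no signal
# flows backwards through the chain rule and the weights stop changing.
```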

4.2 ReLU activation function

ReLU: f(x) = max(0, x)
The ReLU activation function solves the vanishing-gradient problem and, in addition, its derivative is easy to compute, so subsequent neural networks typically use it as the activation function.
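A minimal sketch of ReLU and its derivative (the input values are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)     # derivative: 1 where x > 0, 0 elsewhere

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))        # [0.  0.  0.5 3. ]
print(relu_grad(x))   # [0. 0. 1. 1.]  -- the gradient does not shrink for positive inputs
```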

5. The important role of the regularization term in neural networks

Because of outliers in the data, neural networks are prone to overfitting; a regularization penalty term can effectively suppress overfitting and improve the generalization ability of the network.

The more neurons (equivalently, weight parameters) a network has, the more complex the models it can express, but the greater the risk of overfitting.
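As one concrete form of such a penalty, here is a sketch of adding an L2 term to the loss; the weight shapes, the data loss, and the strength lam are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))        # made-up weight matrices of a small network
W2 = rng.normal(size=(4, 1))
data_loss = 0.73                    # pretend loss from the forward pass

lam = 1e-3                          # regularization strength (made-up value)
reg_loss = lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))   # L2 penalty on the weights
total_loss = data_loss + reg_loss

# The penalty adds 2 * lam * W to each weight gradient, pulling large weights
# toward zero and discouraging overly complex fits to outliers.
grad_W1_reg = 2 * lam * W1
grad_W2_reg = 2 * lam * W2
```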

6. Data preprocessing

The data is zero-centered (by subtracting the mean) and normalized (by dividing by the standard deviation), which removes the different scales of fluctuation along the x and y axes.
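A short numpy sketch of both steps on made-up 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 2))   # made-up raw data

X_centered = X - X.mean(axis=0)                     # zero-center: subtract the per-feature mean
X_normalized = X_centered / X_centered.std(axis=0)  # normalize: divide by the standard deviation

print(X_normalized.mean(axis=0))   # ~[0, 0]
print(X_normalized.std(axis=0))    # ~[1, 1] -> both axes now fluctuate on the same scale
```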

7. Initialization of the weights w and the bias term b

The weights cannot all be initialized to the same value; otherwise backpropagation updates them in the same direction, and the corresponding iterations of the neural network become far too slow. Initialization typically uses Gaussian random values.
The bias b can be initialized to a constant value (1 or 0).
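A minimal sketch of this initialization; the layer sizes and the 0.01 scale are assumptions, not values from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 784, 100                     # made-up layer sizes

# Small Gaussian random weights break the symmetry between neurons;
# identical initial weights would all receive identical updates.
W = 0.01 * rng.normal(size=(n_in, n_out))
b = np.zeros(n_out)                        # the bias can simply start at a constant (0 here)
```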

8. Drop-out

Fully connected: for layers n−1 and n, every node in layer n−1 is connected to every node in layer n. That is, when each node in layer n is computed, the input to its activation function is the weighted sum of all the nodes in layer n−1.

Full connectivity is a good model, but when the network is large, training becomes very slow and overfitting easily occurs.

To solve these problems, during each training pass a subset of the neurons is randomly left out of the computation (so some of the weight parameters are not updated); this is the Drop-out operation.
Although fewer parameters participate in each training pass, we can increase the number of iterations to compensate for this.
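A sketch of one common way to implement this (the "inverted dropout" variant); the keep probability and the activations are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.5                          # probability of keeping each neuron (made-up value)
h = rng.normal(size=(4, 10))          # activations of a hidden layer (made-up data)

# Training: randomly zero out neurons and rescale the survivors by 1/p_keep.
# Dropped neurons contribute nothing, so their weights get no gradient this pass.
mask = (rng.random(size=h.shape) < p_keep) / p_keep
h_train = h * mask

# Test time: all neurons are used; no mask is needed thanks to the 1/p_keep scaling.
h_test = h
```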


Source: blog.csdn.net/qq_43660987/article/details/91629465