Learning Process Notes (3) Neural Network

Notes taken after watching the Bilibili course video "7 - The overall process of forward propagation".

Questions noted in advance:

1. Non-linear transformation (activation function): how is the transformation done?

The input X is passed through a non-linear function, and the result is used as the input of the next layer; the non-linear function provides the mapping.

2. How do the weights of the fully connected layer operate?

The two-dimensional feature maps output by the convolution layers are flattened into a one-dimensional feature vector.

Multiplying the output feature vector by a weight matrix (n×1) gives a single value after the full connection.

3. Why does the FC layer output classification results?

4. The detailed process of back propagation

1. Loss function

1. Loss on score values (hinge loss)

f is the score function; s_j is the score of an incorrect category and s_yi is the score of the correct category. The per-sample loss is L_i = Σ_{j≠yi} max(0, s_j − s_yi + 1), where the 1 is an offset that acts as a tolerance margin. A regularization penalty term R(W), weighted by λ, is added to prevent overfitting: L = (1/N) Σ_i L_i + λ·R(W).

Loss function = data loss + regularization penalty term
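As a concrete illustration, here is a minimal NumPy sketch of the score-value (hinge) loss plus regularization described above; the variable names, array shapes and the regularization strength `lam` are my own illustrative assumptions, not from the video.

```python
import numpy as np

def svm_loss(scores, y, W, lam=1e-3):
    """Multiclass hinge loss with an L2 regularization penalty.

    scores: (N, C) class scores f(x, W) for N samples
    y:      (N,)  index of the correct class for each sample
    W:      weight matrix, used only for the regularization term
    lam:    regularization strength
    """
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]        # s_yi, shape (N, 1)
    margins = np.maximum(0, scores - correct + 1.0)   # max(0, s_j - s_yi + 1)
    margins[np.arange(N), y] = 0                      # do not count the correct class
    data_loss = margins.sum() / N
    reg_loss = lam * np.sum(W * W)                    # R(W) penalty term
    return data_loss + reg_loss                       # loss = data loss + regularization
```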

2. Normalized probability values: the Softmax classifier

Convert score values into probability values.

exp amplifies the differences between the scores; normalizing the exponentiated scores gives probabilities between 0 and 1. The loss is −log of the probability of the correct category, so the closer that probability is to 1, the smaller the loss.
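A small NumPy sketch of the softmax loss as just described: exponentiate, normalize to probabilities, then take −log of the correct class's probability (array shapes and names are assumed for illustration).

```python
import numpy as np

def softmax_loss(scores, y):
    """Softmax (cross-entropy) loss.

    scores: (N, C) class scores
    y:      (N,)  correct class indices
    """
    shifted = scores - scores.max(axis=1, keepdims=True)         # for numerical stability
    exp_scores = np.exp(shifted)                                  # exp amplifies differences
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)    # probabilities in (0, 1)
    N = scores.shape[0]
    correct_probs = probs[np.arange(N), y]
    return -np.log(correct_probs).mean()   # closer to 1 -> smaller loss
```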

Computing the loss value L step by step from the input x and the weights W is called forward propagation.

3. Backpropagation: partial derivatives and gradient descent

Backpropagation goes back through the weights W of each layer: the partial derivative of the loss with respect to each layer's W is computed via the chain rule, and gradient descent uses these derivatives to update the weights.
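To make "partial derivative, then update" concrete, here is a toy sketch of gradient descent on a single weight with a squared-error loss; the data point, initial weight and learning rate are made up.

```python
# Toy example: one weight w, squared-error loss L(w) = (w * x - y)**2.
x, y = 2.0, 6.0          # a single made-up data point
w = 1.0                  # initial weight
lr = 0.1                 # learning rate

for step in range(5):
    pred = w * x                     # forward propagation
    loss = (pred - y) ** 2
    grad = 2 * (pred - y) * x        # dL/dw by the chain rule (backpropagation)
    w -= lr * grad                   # gradient descent update
    print(f"step {step}: w={w:.4f}, loss={loss:.4f}")
```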

2. Overall architecture of a neural network

1. Overall structure

After each layer's weighted transformation, a non-linear transformation is applied (the red vertical bars in the figure); this non-linear transformation is performed by the activation function.
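A minimal sketch of this overall structure, assuming a small fully connected network with ReLU activations; the layer sizes and the 0.01 initialization scale are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)   # non-linear transformation (activation function)

# Made-up layer sizes: 4 input features, two hidden layers, 3 output classes.
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = 0.01 * rng.standard_normal((8, 8)), np.zeros(8)
W3, b3 = 0.01 * rng.standard_normal((8, 3)), np.zeros(3)

x = rng.standard_normal((1, 4))          # one input sample
h1 = relu(x @ W1 + b1)                   # weighted transform + activation
h2 = relu(h1 @ W2 + b2)                  # weighted transform + activation
scores = h2 @ W3 + b3                    # output layer: raw class scores
print(scores.shape)                      # (1, 3)
```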

2. Input layer: the input data features x

2.1 Data preprocessing  

Standardization operations: subtract the mean, divide by the standard deviation
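A one-step NumPy sketch of this standardization, applied per feature column; the small epsilon that guards against division by zero is my addition.

```python
import numpy as np

X = np.random.default_rng(0).normal(5.0, 2.0, size=(100, 3))  # made-up raw features
X_std = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)         # zero mean, unit std per feature
```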

2.2 Parameter initialization

Weight parameters are initialized with a random strategy (see the sketch at the end of this subsection).

The input layer and the hidden layer are connected by weight parameters

The data passes through many layers, and each layer performs feature extraction.

After multi-layer weight processing, the output value is finally obtained
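A small sketch of the random initialization strategy mentioned in 2.2, assuming small Gaussian weights and zero biases; the 0.01 scale and the layer sizes are assumptions, not values from the video.

```python
import numpy as np

def init_layer(n_in, n_out, scale=0.01, seed=None):
    """Randomly initialize one layer's weight matrix and bias vector."""
    rng = np.random.default_rng(seed)
    W = scale * rng.standard_normal((n_in, n_out))  # small random weights
    b = np.zeros(n_out)                             # biases start at zero
    return W, b

# Example: input layer (4 features) connected to a hidden layer (8 units).
W1, b1 = init_layer(4, 8, seed=0)
```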

3. Activation function

3.1 Sigmoid function:

When the input is very large or very small, the sigmoid's derivative is close to zero, so gradients propagate poorly and the vanishing-gradient phenomenon occurs.

3.2 ReLU function
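A brief NumPy sketch comparing the two activation functions and their derivatives, to illustrate why sigmoid gradients vanish for large |x| while ReLU gradients do not; the example inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)            # close to 0 when |x| is large -> vanishing gradient

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_grad(x))   # ~[0.00005, 0.197, 0.25, 0.197, 0.00005]
print(relu_grad(x))      # [0, 0, 0, 1, 1]
```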

4. Loss

Compute the loss, then backpropagate: calculate the partial derivative of the loss with respect to each layer's parameters W and update their values.

The purpose of the entire neural network is to find weights W for each layer that fit the current task well.
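Putting the pieces together, a toy end-to-end sketch of forward propagation, softmax loss, backpropagation and gradient-descent updates for a one-hidden-layer network; the random data and hyperparameters are made up and only demonstrate the flow described above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H, C = 64, 4, 8, 3                 # samples, input dim, hidden units, classes
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)

W1, b1 = 0.01 * rng.standard_normal((D, H)), np.zeros(H)
W2, b2 = 0.01 * rng.standard_normal((H, C)), np.zeros(C)
lr = 1e-1

for step in range(200):
    # forward propagation
    h = np.maximum(0, X @ W1 + b1)                    # hidden layer + ReLU
    scores = h @ W2 + b2
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    # backpropagation: chain rule, layer by layer
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    dW2, db2 = h.T @ dscores, dscores.sum(axis=0)
    dh = dscores @ W2.T
    dh[h <= 0] = 0                                    # ReLU gradient
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # gradient descent: update each layer's W
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```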

5. Solutions to overfitting:

5.1 Regularization penalty value:

5.2 Dropout
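A minimal sketch of (inverted) dropout applied to hidden-layer activations during training; the keep probability of 0.5 is an arbitrary example value.

```python
import numpy as np

def dropout(h, keep_prob=0.5, seed=0):
    """Inverted dropout: randomly zero activations during training only."""
    rng = np.random.default_rng(seed)
    mask = (rng.random(h.shape) < keep_prob) / keep_prob  # scale so the expected value is unchanged
    return h * mask

h = np.ones((2, 4))     # pretend hidden-layer activations
print(dropout(h))       # roughly half the units zeroed, the rest scaled up by 1/keep_prob
```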

3. Convolutional Neural Network

 1. Convolution layer

The input is the image's feature values, and the Filter in the middle holds the weights: the convolution kernel. A patch of the input with the same size as the convolution kernel is taken each time and an inner product is computed (corresponding positions are multiplied and all results are summed), giving one feature value (a single number) for that patch. In the figure, the upper-left patch of each RGB channel is inner-producted with w0 and the results are added, giving the value 3 (the green box on the right).

One filter produces one feature map, and n filters produce n feature maps; stacked together, the depth of the output equals the number of filters.

A convolution layer can have multiple filters, each with its own convolution kernels; after one convolution layer, a feature map with n channels is obtained.

The size of the convolution output feature map: output size = (input size − kernel size + 2 × padding) / stride + 1. For example, a 32×32 input with a 5×5 kernel, padding 2 and stride 1 gives (32 − 5 + 2×2)/1 + 1 = 32.
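A naive NumPy sketch of the convolution operation (sliding-window inner product with several filters) that also checks the output-size formula above; the input size, filter count, stride and padding are illustrative assumptions.

```python
import numpy as np

def conv2d(x, filters, stride=1, pad=0):
    """Naive convolution: x is (H, W, C), filters is (n, k, k, C)."""
    n, k = filters.shape[0], filters.shape[1]
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W = x.shape[:2]
    out_h = (H - k) // stride + 1          # (input - kernel + 2*pad)/stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w, n))
    for f in range(n):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
                out[i, j, f] = np.sum(patch * filters[f])   # inner product -> one number
    return out

x = np.random.default_rng(0).standard_normal((32, 32, 3))          # RGB-like input
filters = np.random.default_rng(1).standard_normal((6, 5, 5, 3))   # 6 filters of size 5x5x3
print(conv2d(x, filters, stride=1, pad=2).shape)                    # (32, 32, 6)
```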

 2. Pooling layer

Pooling compresses (downsamples) the feature maps.

MAX POOLING: max pooling selects the maximum value within each window.

Each convolutional layer is usually followed by a ReLU activation layer; pooling layers are inserted periodically (not necessarily after every convolution).

The pooling layer only reduces the height and width; the number of feature channels stays unchanged.
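A small NumPy sketch of 2×2 max pooling with stride 2 (a common default, assumed here), showing that only the height and width shrink while the channel count stays the same.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over (H, W, C): take the max in each size x size window."""
    H, W, C = x.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size, :]
            out[i, j] = window.max(axis=(0, 1))   # maximum value in each window, per channel
    return out

x = np.random.default_rng(0).standard_normal((32, 32, 6))
print(max_pool(x).shape)   # (16, 16, 6): height/width halved, channels unchanged
```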

3. Fully connected layer FC

The multi-channel feature maps produced by the pooling layer are stretched (flattened) into a one-dimensional feature vector, which is then used for the classification task to obtain the classification result.
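A short sketch of the flatten-then-fully-connected step, reusing the shapes from the pooling example above; the number of classes is an assumption.

```python
import numpy as np

pooled = np.random.default_rng(0).standard_normal((16, 16, 6))  # output of the pooling layer
v = pooled.reshape(-1)                          # flatten to a 1-D vector of 16*16*6 = 1536 values

num_classes = 10
W_fc = 0.01 * np.random.default_rng(1).standard_normal((v.size, num_classes))
b_fc = np.zeros(num_classes)
scores = v @ W_fc + b_fc                        # fully connected layer -> classification scores
print(scores.shape)                             # (10,)
```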

Only layers with weight parameters are counted in the layer count (CONV and FC layers count; activation layers and pooling layers do not).

4. Residual network (ResNet)

ResNet addresses the problem that, when convolutional layers keep being stacked, the performance gets worse instead of better.
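A minimal sketch of the residual idea: the block outputs its input plus a learned transformation, so stacking blocks cannot easily make things worse than the identity mapping. The fully connected form used here is a simplification; real ResNets use convolutional layers.

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x): the skip connection adds the input back to the block's output."""
    h = np.maximum(0, x @ W1)     # first transformation + ReLU
    fx = h @ W2                   # second transformation
    return np.maximum(0, x + fx)  # skip connection, then activation

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
W1, W2 = 0.01 * rng.standard_normal((8, 8)), 0.01 * rng.standard_normal((8, 8))
print(residual_block(x, W1, W2).shape)   # (1, 8): same shape as the input
```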

  


Origin blog.csdn.net/qq_51141671/article/details/132091651