Notes taken after watching the Bilibili course video
7 - The overall process of forward propagation
Questions noted in advance:
1. Non-linear transformation (activation function): how is it applied?
The output of each layer is passed through a non-linear activation function, and the result is used as the input of the next layer; the activation function provides the non-linear mapping
2. How does the fully connected layer apply its weights?
The two-dimensional feature maps output by the convolution layers are flattened into a one-dimensional vector
Multiplying that feature vector by a weight matrix (n*1) yields a single value after the full connection
3. Why does the FC layer output classification results?
4. The detailed process of back propagation
1. Loss function
1. Loss of score value
f is the score function; s_j is the score of an incorrect category and s_{y_i} the score of the correct category; 1 is added to their difference as an offset (the margin of tolerance): L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1). A regularization penalty term R(W) is added to prevent overfitting
Loss function = data loss + regularization penalty term
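The loss above can be sketched in NumPy (a minimal illustration, not the course's code; the margin of 1 and the L2 penalty follow the notes, the regularization strength 0.1 is an assumed default):

```python
import numpy as np

def svm_loss(scores, y, W, reg=0.1):
    """Multiclass SVM (hinge) loss: data loss + regularization penalty term.

    scores: (N, C) score matrix f(x, W); y: (N,) indices of the correct classes.
    """
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]        # s_{y_i}
    margins = np.maximum(0, scores - correct + 1.0)   # max(0, s_j - s_{y_i} + 1)
    margins[np.arange(N), y] = 0                      # the correct class contributes no loss
    data_loss = margins.sum() / N
    reg_loss = reg * np.sum(W * W)                    # R(W), L2 penalty
    return data_loss + reg_loss
```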
2. Normalized probability value Softmax classifier
Converts score values into probability values.
Use exp to amplify the differences between scores, then normalize to get probabilities (between 0 and 1). Take -log of the correct category's probability as the loss: the closer that probability is to 1, the smaller the loss
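As a sketch (assuming NumPy; subtracting the row maximum before exp is a standard numerical-stability trick, not something from the notes):

```python
import numpy as np

def softmax_loss(scores, y):
    """Softmax classifier loss: exp amplifies score gaps, -log of the correct
    class's probability gives the loss (small when that probability is near 1)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp_s = np.exp(shifted)
    probs = exp_s / exp_s.sum(axis=1, keepdims=True)      # normalized probabilities
    N = scores.shape[0]
    return -np.log(probs[np.arange(N), y]).mean()
```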
Computing the loss value L step by step from the input x and the weights W is called forward propagation
3. Back propagation: gradient descent on partial derivatives
Backpropagation computes the partial derivative of the loss with respect to each layer's weights W, and gradient descent uses these derivatives to update the weights
2. Overall infrastructure of neural network
1. Overall structure
After each layer's weighted combination, a non-linear transformation is applied (the red vertical block in the figure); this transformation is performed by the activation function
2. Input layer: data features x
2.1 Data preprocessing
Standardization: subtract the mean, divide by the standard deviation
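A minimal NumPy sketch of this preprocessing step (per-feature standardization; not the course's code):

```python
import numpy as np

def standardize(X):
    """Zero-center each feature (subtract the mean), then scale each feature
    by its standard deviation so every feature has unit spread."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std
```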
2.2 Parameter initialization
A random strategy is used for the weight parameters
The input layer and the hidden layers are connected by these weight parameters
After many layers, each layer performs feature extraction
After multi-layer weight processing, the output value is finally obtained
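The random-strategy initialization can look like this (a sketch; the layer sizes 784 and 128 and the 0.01 scale are illustrative assumptions, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
# Small random weights connecting a 784-feature input layer to a 128-unit hidden layer;
# biases start at zero.
W1 = rng.standard_normal((784, 128)) * 0.01
b1 = np.zeros(128)
```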
3. Activation function
3.1 Sigmoid function:
When the input is very large or very small, the derivative is close to zero, so the gradient vanishes
3.2 ReLU function
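Both activations, with the sigmoid's saturation made explicit (a NumPy sketch):

```python
import numpy as np

def sigmoid(x):
    """Squashes input into (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative s * (1 - s): near zero when |x| is large -> vanishing gradient."""
    s = sigmoid(x)
    return s * (1 - s)

def relu(x):
    """max(0, x): gradient is 1 for positive inputs, so it does not saturate there."""
    return np.maximum(0, x)
```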
4. Loss
Compute the loss, then backpropagate: calculate the partial derivative with respect to each layer's parameters W and update them
The purpose of the entire neural network is to find, for each layer, weights W that best fit the current task
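The update each backpropagation step performs can be written as a plain gradient-descent step (the learning rate 0.01 is an assumed default):

```python
import numpy as np

def sgd_step(W, dW, lr=0.01):
    """One gradient-descent update: move W against its partial derivative dW."""
    return W - lr * dW
```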
5. Solution to overfitting:
5.1 Regularization penalty term
5.2 Dropout
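A minimal sketch of (inverted) dropout at training time, assuming NumPy; the 1/(1-p) rescaling keeps the expected activation unchanged, and test-time behavior (no dropout) is omitted:

```python
import numpy as np

def dropout_forward(x, p=0.5, rng=None):
    """Randomly zero a fraction p of activations and scale survivors by 1/(1-p)."""
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= p) / (1.0 - p)  # 0 for dropped units, 1/(1-p) for kept
    return x * mask
```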
3. Convolutional Neural Network
1. Convolution layer
The input is the image's feature values; the Filter (convolution kernel) holds the weights. A patch of the input the same size as the kernel is taken each time, and the inner product is computed (corresponding positions are multiplied and all results are added together) to obtain that patch's feature value (a single number). In the figure, the top-left RGB block is inner-producted with w0 channel by channel and the results are summed to give 3 (green box on the right).
Each Filter produces one feature map; n Filters produce n feature maps, which are stacked so that the depth equals the number of Filters.
A convolution layer can have multiple convolution kernels (Filters); after one convolution layer, an n-channel feature map is obtained.
The size of the output feature map: (N - F + 2P) / S + 1, where N is the input size, F the kernel size, P the padding, and S the stride.
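The standard output-size formula as a one-liner (integer division assumes the stride divides evenly, the usual convention):

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Spatial size of a convolution's output: (N - F + 2P) / S + 1."""
    return (n - f + 2 * padding) // stride + 1
```

For example, a 32x32 input with a 5x5 kernel, padding 2, and stride 1 keeps its size.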
2. Pooling layer
Performs compression / downsampling of the feature maps
MAX POOLING: max pooling, select the maximum value in each window
In the architecture shown, each convolutional layer is followed by a ReLU activation layer, and pooling layers are inserted periodically.
The pooling layer only shrinks the height and width; the number of feature channels stays unchanged.
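A 2x2 max-pooling sketch in NumPy illustrating both points (halved height and width, unchanged channel count):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array:
    halves H and W, keeps the channel count C unchanged."""
    h, w, c = x.shape
    trimmed = x[:h // 2 * 2, :w // 2 * 2]                       # drop odd edge rows/cols
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2, c)           # group into 2x2 windows
    return blocks.max(axis=(1, 3))                              # max within each window
```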
3. Fully connected layer FC
Stretch the multi-channel feature maps from the pooling layer into a single feature vector, then perform the classification task on it to obtain the classification result.
Only layers with weight parameters count toward the layer count (CONV and FC do; activation layers and pooling layers do not).
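The flatten-then-multiply step of the FC layer as a sketch (shapes are illustrative assumptions):

```python
import numpy as np

def fc_forward(feature_maps, W, b):
    """Flatten pooled (H, W, C) feature maps into one row vector,
    then apply the fully connected weights: scores = x @ W + b."""
    x = feature_maps.reshape(1, -1)   # (H, W, C) -> (1, H*W*C)
    return x @ W + b
```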
4. Residual network ResNet
Addresses the problem that performance degrades when convolutional layers keep being stacked deeper.