Deep Learning Algorithms and Convolutional Neural Networks

Traditional neural networks

Where deep learning does not apply: cross-domain problems (for example, stock forecasting), where the patterns learned from old historical data do not carry over to new data.

Matrix calculation:

Input data x [32×32×3] = 3072 pixels, flattened into a column vector.
Purpose: for a 10-class problem, we need 10 groups of weight parameters, which produce 10 values: the probability of belonging to each category.
Bias term b: 10 values.
How the weight parameter W is obtained: initialized randomly, or taken from a pre-trained model.
Where the innovation lies: modifying the loss function.
Loss function: measures the difference between the prediction and the ground truth. The larger the difference, the worse W is and the more it must be adjusted; the smaller the difference, the less W needs to change (only fine-tuning).
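
As a rough sketch (illustrative shapes only, matching the 32×32×3 example above and assuming NumPy), the score computation is a single matrix multiply plus a bias:

import numpy as np

# Illustrative sketch of the linear score function described above.
x = np.random.rand(3072)               # a 32x32x3 image flattened into a column of 3072 pixels
W = np.random.randn(10, 3072) * 0.01   # 10 groups of weights, initialized randomly
b = np.zeros(10)                       # bias term, 10 values

scores = W @ x + b                     # one score per category
print(scores.shape)                    # (10,)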

Regularization:

W1·x and W2·x can give the same result, but that does not mean W1 and W2 are equally good. W1 focuses only on local features (partial, changes drastically with x, and is easily affected by outliers), while W2 looks at the whole input (balanced, changes smoothly).
How do we reflect the difference between W1 and W2? With a regularization penalty (which also helps prevent overfitting).
Outlier handling: it is best to clean outliers manually before they enter the model (for example, in competition data).

L2 penalty term: the sum of the squared weights, R(W) = ΣW².

Loss function = data loss + regularization penalty, i.e. L = (1/N)·Σᵢ Lᵢ + λ·R(W), where λ controls the strength of the L2 penalty.
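
A minimal NumPy sketch of this combined objective (the helper name and the λ value are illustrative, not from the original post):

import numpy as np

def total_loss(data_losses, W, lam=0.1):
    """Loss = data loss + regularization penalty (L2)."""
    data_loss = np.mean(data_losses)        # average data loss over the samples
    reg_loss = lam * np.sum(W ** 2)         # L2 penalty: lambda times the sum of squared weights
    return data_loss + reg_loss

# Why the penalty separates W1 from W2: both give the same score on x,
# but the "spiky" W1 pays a larger L2 penalty than the balanced W2.
x  = np.array([1.0, 1.0, 1.0, 1.0])
W1 = np.array([1.0, 0.0, 0.0, 0.0])         # focuses on one local feature
W2 = np.array([0.25, 0.25, 0.25, 0.25])     # spreads attention over the whole input
print(W1 @ x, W2 @ x)                       # 1.0 and 1.0: identical scores
print(np.sum(W1**2), np.sum(W2**2))         # 1.0 vs 0.25: W2 is penalized less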

Activation function: sigmoid

For classification problems, what we want is a probability value, but the raw output of the model can be any number. How do we convert it? By mapping.
Sigmoid function: σ(x) = 1 / (1 + e^(-x)), which squashes any value into the range (0, 1).
Vanishing gradient problem: the larger the magnitude of the input, the closer the gradient is to 0, so the parameters cannot be updated. A neural network passes gradients layer by layer, so this problem stalled the development of neural networks from roughly 1997 to 2012; during that period a network was essentially viewed as many groups of linear regression combined with logistic regression.
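
A quick numeric check of this vanishing-gradient behaviour (a plain NumPy sketch, not code from the original post):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)               # derivative of the sigmoid

for v in [0.0, 2.0, 5.0, 10.0]:
    print(v, sigmoid(v), sigmoid_grad(v))
# As the magnitude of the input grows, the gradient approaches 0,
# so layers further back receive almost no update signal.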

Loss function

After getting the predicted scores:
1. First enlarge the differences (apply the exponential function e^x).
2. Then normalize the results to convert them into probability values.
3. Compute the loss value: only the probability of the correct category is considered; the closer it is to 1, the closer the loss is to 0.
Log function: the loss is taken as the negative log of the correct-class probability, so a probability near 1 gives a loss near 0.
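
These three steps are the familiar softmax-plus-cross-entropy computation; a minimal NumPy sketch (the function name and example scores are illustrative):

import numpy as np

def softmax_cross_entropy(scores, correct_class):
    exp_scores = np.exp(scores - np.max(scores))  # step 1: enlarge differences with e^x (shifted for numerical stability)
    probs = exp_scores / np.sum(exp_scores)       # step 2: normalize into probability values
    return -np.log(probs[correct_class])          # step 3: loss = -log(probability of the correct class)

print(softmax_cross_entropy(np.array([3.2, 5.1, -1.7]), correct_class=0))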

Forward propagation

Backpropagation (this is how gradient descent updates the weights)
A neural network converts the features that humans understand into features that the computer can work with.
Hidden layer 1: Feature 1 = 0.7H - 0.1W + 0.6A, ...
The numbers in between are the weights.
Feature transformation: Wx + b produces a new set of features.
Hidden layer 2 does the same thing again, looking for even more suitable features.

The size and number of W and b: each layer's W maps the previous layer's features to the next layer's features (so its shape is determined by those two sizes), and b has one value per output feature. A concrete sketch appears after the ReLU section below.

Activation function: ReLU

If only linear transformations are introduced, the range of problems that can be solved is limited, so the nonlinear function ReLU is introduced: features less than 0 are simply set to zero (dropped), and the more important features are learned more strongly.
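
A small NumPy sketch combining the hidden-layer computation and ReLU (the layer sizes and input values are made up for illustration):

import numpy as np

def relu(x):
    return np.maximum(0, x)                        # features below 0 are set to zero

# Illustrative sizes: 3 input features (e.g. H, W, A), two hidden layers, 1 output.
x = np.array([0.5, 1.2, -0.3])
W1, b1 = np.random.randn(4, 3), np.zeros(4)        # hidden layer 1: 3 -> 4 features
W2, b2 = np.random.randn(4, 4), np.zeros(4)        # hidden layer 2: 4 -> 4 features
W3, b3 = np.random.randn(1, 4), np.zeros(1)        # output layer

h1 = relu(W1 @ x + b1)                             # Wx + b becomes a new set of features
h2 = relu(W2 @ h1 + b2)                            # hidden layer 2 does it again
out = W3 @ h2 + b3
print(h1.shape, h2.shape, out.shape)               # (4,) (4,) (1,)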

Data preprocessing

Mean subtraction (demeaning): subtract the mean of each feature so the data is centered at zero.
Normalization: scale each feature so that the values fall in a comparable range.
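
A minimal NumPy sketch of the two steps (the data shape is made up):

import numpy as np

X = np.random.rand(100, 3072)                        # 100 flattened images (illustrative data)
X_demeaned = X - X.mean(axis=0)                      # mean subtraction: center every feature at 0
X_normalized = X_demeaned / (X.std(axis=0) + 1e-8)   # normalization: scale to roughly unit variance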

Dropout

To prevent overfitting, do not use all neurons at every training step; instead, randomly "kill" (drop) neurons:
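
A sketch of (inverted) dropout in NumPy; the keep-probability rescaling is a standard detail, not something spelled out in the original notes:

import numpy as np

def dropout(h, p_drop=0.5, training=True):
    """Randomly 'kill' neurons during training."""
    if not training:
        return h                                     # at test time, use all neurons
    mask = np.random.rand(*h.shape) > p_drop         # randomly decide which neurons survive
    return h * mask / (1.0 - p_drop)                 # rescale so the expected activation is unchanged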

Convolutional Neural Networks (CNNs)

There are two problems in processing images with fully connected neural networks:

  • The amount of data to process is large, so training is inefficient
  • It is hard to retain the image's original spatial features when the image is flattened into a vector, which lowers the accuracy of image processing

1. The composition of the CNN network

The CNN network is inspired by the human visual nervous system.
The CNN network mainly consists of three parts: the convolutional layer, the pooling layer, and the fully connected layer.

  • The convolutional layer is responsible for extracting local features from the image;
  • The pooling layer greatly reduces the number of parameters (dimensionality reduction);
  • The fully connected layer is similar to a conventional artificial neural network and is used to output the desired result.


2. Convolutional layer

2.1 Calculation method of convolution

The purpose of the convolutional layer is to extract features from the input feature map.
How to convolve: to extract features from the original image, slide the convolution kernel over the image (traverse every position) and output one feature value at each position.
The convolution operation is essentially a dot product between the filter and a local area of the input data.
Each output point is computed this way; the remaining points are computed in the same manner, and the final result is output as a feature map.
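
A naive single-channel sketch of this sliding dot product (no padding, stride 1; the function name and sizes are illustrative):

import numpy as np

def conv2d_single_channel(image, kernel):
    """Each output point is a dot product between the kernel and a local image patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)   # filter · local area
    return out

feature_map = conv2d_single_channel(np.random.rand(5, 5), np.random.rand(3, 3))
print(feature_map.shape)   # (3, 3)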

2.2 Padding


In the convolution above, the feature map comes out smaller than the original image. To keep the result the same size as the original image, we can pad around the original image so that the size of the feature map does not change during convolution.
Pad with zeros:
For a 5×5 input map, zero padding lets us output a 5×5 feature map.
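
In tf.keras the two padding modes can be compared directly (a small sketch; the shapes are illustrative):

import tensorflow as tf

x = tf.random.normal((1, 5, 5, 1))                                    # a 5x5 single-channel input
same = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same')(x)    # zero-padded
valid = tf.keras.layers.Conv2D(1, kernel_size=3, padding='valid')(x)  # no padding
print(same.shape, valid.shape)   # (1, 5, 5, 1) vs (1, 3, 3, 1)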

2.3 Stride

Choice of step size: the stride used above is 1, but it can also be increased; for example, with a stride of 2 a feature map can still be extracted, only smaller.
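
A quick tf.keras check of the stride-2 case (the input size is illustrative):

import tensorflow as tf

x = tf.random.normal((1, 6, 6, 1))
out = tf.keras.layers.Conv2D(1, kernel_size=3, strides=2, padding='valid')(x)
print(out.shape)   # (1, 2, 2, 1): a larger stride produces a smaller feature map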

2.4 Multi-channel convolution

The examples above are single-channel; below is the multi-channel case.

The calculation works as follows: when the input has multiple channels (for example, an image has three RGB channels), the convolution kernel must have the same number of channels. Each channel of the kernel is convolved with the corresponding channel of the input, and the per-channel convolution results are added element-wise to obtain the final Feature Map.
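
A self-contained NumPy sketch of this per-channel sum (the RGB input and kernel values are random, purely for illustration):

import numpy as np

image = np.random.rand(3, 5, 5)    # e.g. an RGB input: 3 channels of 5x5
kernel = np.random.rand(3, 3, 3)   # the kernel has the same number of channels

out = np.zeros((3, 3))             # spatial output size: 5 - 3 + 1 = 3
for i in range(3):
    for j in range(3):
        # dot product over all channels of the local 3x3 region,
        # i.e. the per-channel results are added together element-wise
        out[i, j] = np.sum(image[:, i:i+3, j:j+3] * kernel)
print(out.shape)                   # (3, 3): a single-channel Feature Map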

2.5 Multi-convolution kernel convolution

When there are multiple convolution kernels, each kernel learns a different feature and produces its own Feature Map: n convolution kernels generate n Feature Maps, giving an output with n channels.
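
In tf.keras the number of kernels is the filters argument (the sizes below are illustrative):

import tensorflow as tf

x = tf.random.normal((1, 32, 32, 3))                                  # RGB input: 3 channels
conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same')
print(conv(x).shape)   # (1, 32, 32, 16): 16 kernels produce 16 Feature Maps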

2.6 Feature map size

The size of the output feature map is closely related to the following parameters:

  • size: the convolution kernel/filter size, usually an odd number such as 1×1, 3×3, or 5×5
  • padding: the zero-padding scheme
  • stride: the step size

Implemented in tf.keras:
tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=None, padding='valid')
# pool_size: size of the pooling window
# strides: step size of the window; when None it defaults to pool_size
# padding: whether to pad; the default 'valid' means no padding
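
The standard relationship between these parameters and the output size is output = (N - F + 2P) / S + 1, where N is the input size, F the kernel size, P the padding and S the stride (the usual formula, restated here since the figure is not reproduced). A quick check:

def feature_map_size(n, f, p, s):
    """Output size = (N - F + 2P) / S + 1."""
    return (n - f + 2 * p) // s + 1

print(feature_map_size(5, 3, 0, 1))   # 3: 5x5 input, 3x3 kernel, no padding, stride 1
print(feature_map_size(5, 3, 1, 1))   # 5: 'same'-style padding keeps the size
print(feature_map_size(6, 3, 0, 2))   # 2: stride 2 shrinks the map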

3. Pooling layer (Pooling)

The pooling layer reduces the input dimension of the subsequent network layers, shrinks the model size, speeds up computation, and improves the robustness of the Feature Map, which helps prevent over-fitting. It mainly subsamples (downsamples) the feature map produced by the convolutional layer, and there are two main types:

3.1 Maximum Pooling

Max Pooling: take the maximum value in the window as the output; this is the most widely used method.

3.2 Average Pooling

Average Pooling (Avg Pooling): take the mean of all values in the window as the output.
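
A small tf.keras comparison of the two pooling types on a concrete 4×4 input (the numbers are made up for illustration):

import tensorflow as tf

x = tf.constant([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 8, 6]], dtype=tf.float32)
x = tf.reshape(x, (1, 4, 4, 1))                          # add batch and channel dimensions

max_pool = tf.keras.layers.MaxPool2D(pool_size=(2, 2))(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(x)
print(tf.squeeze(max_pool).numpy())    # [[6. 4.] [7. 9.]]
print(tf.squeeze(avg_pool).numpy())    # [[3.75 2.25] [4.   6.  ]]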

4. Fully connected layer

The fully connected layer is located at the end of the CNN network. After feature extraction by the convolutional layers and dimensionality reduction by the pooling layers, the feature map is converted into a one-dimensional vector and sent to the fully connected layer for classification or regression.
Flatten: expand the feature map into a one-dimensional vector.
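
Putting the pieces together, a minimal tf.keras sketch of the whole pipeline (the layer sizes are illustrative, not from the original post):

import tensorflow as tf

# convolution -> pooling -> flatten -> fully connected
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, kernel_size=3, activation='relu',
                           padding='same', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),                        # expand the feature map into a 1-D vector
    tf.keras.layers.Dense(10, activation='softmax')   # fully connected layer: 10-class output
])
model.summary()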


Source: blog.csdn.net/Sun123234/article/details/129972281