Overview of Convolutional Neural Networks


What is a convolutional neural network?

A convolutional neural network (CNN) is a feedforward neural network with a deep structure whose computation is based on convolution. It is essentially a variant of the multilayer perceptron (MLP), but it uses local connections and weight sharing. On the one hand, this reduces the number of weights and makes the network easier to optimize; on the other hand, it reduces the complexity of the model and lowers the risk of overfitting. CNNs have a particularly clear advantage with image input: an image can be fed directly into the network, avoiding the complex feature-extraction and data-reconstruction steps of traditional recognition algorithms. When processing two-dimensional images, the network can extract features such as color, texture, shape, and image topology by itself, and it shows good robustness and computational efficiency in recognition tasks that must be invariant to displacement, scaling, and other forms of distortion.

The structure of the convolutional neural network:

A convolutional neural network generally consists of an input layer, hidden layers, and an output layer. The hidden layers are mainly composed of convolutional layers, pooling layers, fully connected layers, and so on, typically arranged as in the figure above.
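To make this layering concrete, here is a minimal sketch in PyTorch. This is not code from the original post; the 28×28 grayscale input, the channel counts, and the 10-class output are illustrative assumptions.

    import torch
    import torch.nn as nn

    # input -> convolution -> activation -> pooling -> fully connected -> output
    model = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3),  # 1x28x28 -> 8x26x26
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),                              # 8x26x26 -> 8x13x13
        nn.Flatten(),                                             # -> 8*13*13 = 1352 features
        nn.Linear(1352, 10),                                      # 10-class output layer
    )

    x = torch.randn(1, 1, 28, 28)    # a dummy single-channel image
    print(model(x).shape)            # torch.Size([1, 10])

Each of these layer types is discussed in turn below.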

Convolutional layer:

The convolution operation extracts features from the image: it can enhance certain characteristics of the original signal while reducing noise. The operation proceeds roughly as shown in the figure below.

In the figure, the left side is the input image (5×5), the middle is the convolution kernel (3×3), and the right is the result of the convolution (3×3). During the operation, the convolution kernel starts at the upper-left corner of the image and slides from left to right and from top to bottom with a stride of 1. The convolution itself multiplies each coefficient in the kernel by the image value at the corresponding position and then sums all the products, giving the value of one pixel of the feature map. For example, 155 = (-1)×0 + (-2)×0 + (-1)×75 + 0×0 + 0×75 + 0×80 + 1×0 + 2×75 + 1×80. As the figure shows, plain convolution makes the image smaller. Therefore, to reduce the loss of image features, we usually pad the input image, either with a constant value (usually 0) or by repeating the boundary values. Suppose the input image size is l, the number of padding layers on each side is p, the convolution kernel size is f, the stride is w, and the output image size is L. Then:

        L = (l + 2×p - f) ÷ w + 1

Generally speaking, to preserve the extracted features, we pad so that convolution does not change the size of the feature map: the width and height remain unchanged (so-called 'same' padding).
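To make the sliding-window computation and the size formula concrete, here is a minimal NumPy sketch. It is not code from the original post, and it applies the kernel without flipping, which is the cross-correlation convention used in deep learning:

    import numpy as np

    def conv2d(image, kernel, stride=1, padding=0):
        # Zero-pad the input on all sides, then slide the kernel with the
        # given stride, multiplying and accumulating at each position.
        image = np.pad(image, padding)
        f = kernel.shape[0]
        L = (image.shape[0] - f) // stride + 1    # L = (l + 2*p - f) / w + 1
        out = np.zeros((L, L))
        for i in range(L):
            for j in range(L):
                window = image[i*stride:i*stride+f, j*stride:j*stride+f]
                out[i, j] = np.sum(window * kernel)
        return out

    kernel = np.array([[-1, -2, -1],
                       [ 0,  0,  0],
                       [ 1,  2,  1]])             # the 3x3 kernel implied by the
                                                  # 155 example above
    image = np.zeros((5, 5))                      # stand-in 5x5 input (the figure's
                                                  # exact pixel values are not shown)
    print(conv2d(image, kernel).shape)            # (3, 3)
    print(conv2d(image, kernel, padding=1).shape) # (5, 5), i.e. 'same' padding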

Pooling layer:

The pooling layer reduces the size of the feature map while preserving its features as much as possible. Pooling is usually divided into max pooling and average pooling. Generally speaking, average pooling reduces the error caused by the increased variance of the estimates due to the limited neighborhood size, and therefore retains more of the image's background information; max pooling reduces the shift of the estimated mean caused by errors in the convolutional-layer parameters, and therefore retains more texture information. The figure above shows the result of pooling a 4×4 image with a 2×2 window and a stride of 2.
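For the 4×4 input, 2×2 window, stride-2 case described above, here is a small NumPy sketch of both pooling variants (my own illustration; the 0..15 input values are arbitrary):

    import numpy as np

    def pool2d(x, size=2, stride=2, mode="max"):
        # Reduce each window to a single value (its max or its mean).
        L = (x.shape[0] - size) // stride + 1     # same size formula as convolution
        out = np.zeros((L, L))
        reduce_fn = np.max if mode == "max" else np.mean
        for i in range(L):
            for j in range(L):
                window = x[i*stride:i*stride+size, j*stride:j*stride+size]
                out[i, j] = reduce_fn(window)
        return out

    x = np.arange(16, dtype=float).reshape(4, 4)  # a 4x4 feature map: 0..15
    print(pool2d(x, mode="max"))                  # [[ 5.  7.] [13. 15.]]
    print(pool2d(x, mode="avg"))                  # [[ 2.5  4.5] [10.5 12.5]]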

Fully connected layer:

The fully connected layer in a convolutional neural network corresponds to the hidden layer in a traditional feedforward neural network. Fully connected layers are usually placed at the end of the hidden part of the network and pass signals only to other fully connected layers. In a fully connected layer the feature map loses its 3-dimensional structure: it is flattened into a vector and passed to the next layer through the activation function. In some convolutional neural networks, the role of the fully connected layer can be partially replaced by global average pooling, which averages all the values of each channel of the feature map. For example, a 7×7×256 feature map becomes a 256-dimensional vector; this is equivalent to average pooling with a 7×7 window, a stride of 7, and no padding.
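A quick NumPy illustration of the flattening-versus-global-average-pooling contrast, using the 7×7×256 example above (the random values are placeholders):

    import numpy as np

    fmap = np.random.rand(7, 7, 256)    # a 7x7 feature map with 256 channels

    flat = fmap.reshape(-1)             # flattening for a fully connected layer
    print(flat.shape)                   # (12544,) = 7*7*256

    gap = fmap.mean(axis=(0, 1))        # global average pooling: one mean per channel
    print(gap.shape)                    # (256,)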

Activation function:

In the past, the commonly used activation functions were the sigmoid function and the tanh function, but in recent years the ReLU (rectified linear unit) function has become more common in convolutional neural networks. ReLU achieves the same effect as the sigmoid function, but it converges much faster than the other two and is easier to learn and optimize. Because it is piecewise linear, the forward pass, the backward pass, and the derivative are all piecewise linear, which alleviates the problems of vanishing gradients and overfitting to some extent. In deep-network training, both tanh and sigmoid tend to saturate at extreme values, making training too slow, so ReLU is often used by default. However, ReLU can suffer from the 'dying ReLU' problem: if the learning rate is too large, a neuron's input may be pushed permanently below 0, after which its output is always 0 and it stops learning, so take care not to set the learning rate too high.
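For reference, here are the three activation functions mentioned above, sketched in NumPy with their standard definitions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))  # saturates near 0 and 1 for large |x|

    def tanh(x):
        return np.tanh(x)                # saturates near -1 and 1 for large |x|

    def relu(x):
        return np.maximum(0.0, x)        # piecewise linear; exactly 0 for x < 0

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x))   # [0.119 0.378 0.5   0.622 0.881] (rounded)
    print(relu(x))      # [0.  0.  0.  0.5 2. ]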

The training process of a convolutional neural network:

A CNN is essentially an input-to-output mapping. It can learn a large number of mapping relationships between inputs and outputs without requiring any precise mathematical expression relating them: as long as the convolutional network is trained with known patterns, it acquires the ability to map between input-output pairs. The convolutional network is trained with supervision, so its sample set consists of vector pairs of the form (input vector, ideal output vector). All these vector pairs should be derived from the actual "running" behavior of the system that the network is to simulate, and they can be collected from the actual running system.

1) Parameter initialization: before training starts, all weights should be initialized with small, distinct random numbers. "Small" ensures that the network does not enter a saturated state because of oversized weights, which would cause training to fail; "distinct" ensures that the network can learn normally. In fact, if the weight matrix is initialized with identical values, the network has no ability to learn.
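As a small sketch of this initialization in NumPy (the 0.01 scale and the layer shape are illustrative choices, not values from the post):

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Small, distinct random values: identical initial weights would make all
    # neurons in a layer compute the same thing, so the network could not learn.
    W = rng.normal(loc=0.0, scale=0.01, size=(256, 128))   # weights
    b = rng.normal(loc=0.0, scale=0.01, size=128)          # thresholds (biases)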

2) The training process consists of two stages and four steps; a runnable sketch of the whole loop follows the step list below.

① The first stage, forward propagation:

i. take a sample from the sample set and feed it into the network;

ii. calculate the corresponding actual output. At this stage, information is transformed step by step from the input layer to the output layer; this is also the process the network performs during normal operation after training is complete.

② The second stage, backward propagation:

iii. calculate the difference between the actual output and the corresponding ideal output;

iv. adjust the weight matrix of the entire network according to the method of minimizing the error.

The specific training steps are:

1. Select the training group: randomly draw N samples from the sample set to form the training group;

2. Set each weight and threshold to a small random value close to 0, and initialize the accuracy-control parameter and the learning rate;

3. Take an input pattern from the training group, feed it to the network, and specify its target output vector;

4. Calculate the output vector of the intermediate layer, and from it the actual output vector of the network;

5. Compare the elements of the output vector with the elements of the target vector and calculate the output error; errors must also be calculated for the hidden units of the intermediate layer;

6. Calculate in turn the adjustment for each weight and for each threshold;

7. Apply the adjustments to the weights and thresholds;

8. After M iterations, judge whether the index meets the accuracy requirement; if not, return to (3) and continue iterating; if it does, go to the next step;

9. When training ends, save the weights and thresholds to a file. At this point the weights can be considered stable and the classifier is formed. The next time training is run, the weights and thresholds can be loaded directly from the file, with no need to initialize them again.
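Here is the compact NumPy sketch of steps 1-9 promised above, for a network with one hidden layer, sigmoid activations, and a squared-error loss. It is my own illustration of the procedure; the network size, learning rate, and accuracy threshold are arbitrary:

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Steps 1-2: a training group of N samples, small random weights and
    # thresholds, an accuracy-control parameter (tol) and a learning rate (lr).
    X = rng.random((20, 4))                       # N=20 input patterns
    T = rng.random((20, 2))                       # their target output vectors
    W1, b1 = rng.normal(0, 0.1, (4, 8)), rng.normal(0, 0.1, 8)
    W2, b2 = rng.normal(0, 0.1, (8, 2)), rng.normal(0, 0.1, 2)
    lr, tol = 0.5, 1e-3

    for epoch in range(10000):
        err = 0.0
        for x, t in zip(X, T):                    # step 3: one pattern + its target
            h = sigmoid(x @ W1 + b1)              # step 4: intermediate-layer output
            y = sigmoid(h @ W2 + b2)              #         actual output vector
            e = y - t                             # step 5: output error ...
            d2 = e * y * (1 - y)
            d1 = (d2 @ W2.T) * h * (1 - h)        #         ... and hidden-unit error
            W2 -= lr * np.outer(h, d2)            # steps 6-7: compute and apply the
            b2 -= lr * d2                         # weight and threshold adjustments
            W1 -= lr * np.outer(x, d1)
            b1 -= lr * d1
            err += 0.5 * np.sum(e ** 2)
        if err / len(X) < tol:                    # step 8: accuracy check; else repeat
            break

    # Step 9: save the stabilized weights and thresholds, so that a later run
    # can load them with np.load("weights.npz") instead of re-initializing.
    np.savez("weights.npz", W1=W1, b1=b1, W2=W2, b2=b2)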

 
