What is a Convolutional Neural Network?

Introduction

  The biggest problem with using fully connected neural networks to process images is that fully connected layers have too many parameters. Besides slowing down computation, the extra parameters easily lead to overfitting. A more suitable network structure is therefore needed to effectively reduce the number of parameters in the network: the convolutional neural network.

1. Overview of Convolutional Neural Networks

  A convolutional neural network (CNN) is a type of feedforward neural network that uses convolution operations and has a deep structure. It is one of the representative algorithms of deep learning. It is trained with supervision as a specialized multi-layer perceptron for recognizing two-dimensional shapes in a way that is invariant to translation.
  Its input is assumed to be an image, which lets us encode specific properties into the network structure, making the forward pass more efficient and greatly reducing the number of parameters. Thanks to parameter sharing within convolution kernels and sparse connections between layers, a CNN can learn features of grid-like data, such as pixels and audio, with relatively little computation, gives stable results, and requires no additional feature engineering on the data.
  Three key operations: local connection, parameter sharing, and pooling. Together they effectively reduce the number of parameters in the network and alleviate overfitting.
  Dimensional applications: one-dimensional CNNs are mainly used for sequence data, two-dimensional CNNs are commonly used for image and text recognition, and three-dimensional CNNs are mainly used for medical imaging and video data.
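To make the parameter savings concrete, here is a small sketch that counts weights and biases for one layer of each kind. The sizes (a 100×100 RGB input, a fully connected layer with 500 nodes, and 16 convolution kernels of size 5×5) are hypothetical and chosen only for illustration, as are the helper names:

```python
# Hypothetical layer sizes, chosen only to illustrate the difference.

def fc_params(in_h, in_w, in_c, n_out):
    """Weights + biases of a fully connected layer on a flattened input:
    every output node connects to every input value."""
    return in_h * in_w * in_c * n_out + n_out

def conv_params(k, in_c, n_kernels):
    """Weights + biases of a convolutional layer: each kernel is k x k x in_c
    and is shared across all positions, so the image size does not appear."""
    return k * k * in_c * n_kernels + n_kernels

fc = fc_params(100, 100, 3, 500)   # 100x100 RGB image -> 500 hidden nodes
conv = conv_params(5, 3, 16)       # 16 kernels of size 5x5x3

print(fc, conv)                    # 15000500 1216
```

The key point is that the convolutional count depends only on the kernel size, the input depth, and the number of kernels, while the fully connected count grows with the image area.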
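For the one-dimensional case, a convolution slides a short kernel along a sequence. The minimal pure-Python sketch below is illustrative only (the function name `conv1d` and the sample values are hypothetical). It uses the "valid" convention with no padding and, like most deep-learning libraries, computes cross-correlation without flipping the kernel:

```python
def conv1d(seq, kernel, stride=1):
    """Valid 1D convolution (cross-correlation, no padding)."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(seq) - k + 1, stride)
    ]

signal = [1, 2, 3, 4, 5, 6]
diff = conv1d(signal, [-1, 1])       # responds to changes between neighbors
window = conv1d(signal, [1, 1, 1])   # sum over a sliding window of 3

print(diff)     # [1, 1, 1, 1, 1]
print(window)   # [6, 9, 12, 15]
```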

2. Overview of the Convolutional Neural Network Structure

  Overall architecture: a CNN is a multi-layer supervised learning neural network, usually divided into an input layer, convolutional layers, pooling layers, and fully connected layers. The convolutional and pooling layers are the core modules that implement the network's feature extraction.

[Figure: overall architecture of a convolutional neural network]

2.1 Input layer

  The input layer generally represents the pixel matrix of an image. At the far left of the figure is the input image as a 3D matrix: its length and width give the size of the image, and its depth gives the number of color channels. Black-and-white images have a depth of 1, and RGB color images have a depth of 3.
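As a concrete picture of this 3D matrix, the sketch below stores an image as nested Python lists indexed `[row][column][channel]`. This channels-last layout is an assumption made to match the length × width × depth description; some libraries, such as PyTorch, put the channel dimension first instead. The helper `shape` and the pixel values are hypothetical:

```python
def shape(image):
    """Return (height, width, depth) of an image stored as nested lists
    indexed image[row][col][channel]."""
    return len(image), len(image[0]), len(image[0][0])

# A 2x3 black-and-white image: one value per pixel, so depth 1.
gray = [[[0], [128], [255]],
        [[64], [32], [16]]]

# A 2x2 RGB image: three values (R, G, B) per pixel, so depth 3.
rgb = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]

print(shape(gray))  # (2, 3, 1)
print(shape(rgb))   # (2, 2, 3)
```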

2.2 Convolution layer

  Unlike a fully connected layer, each node in a convolutional layer takes as input only a small patch of the previous layer; commonly used patch sizes are 3×3 and 5×5. In general, the node matrix produced by a convolutional layer is deeper than its input (it has more channels).
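A minimal pure-Python sketch of what a convolutional layer computes, assuming a single input channel and a single kernel. The helpers `pad` and `conv2d` are hypothetical names; stride and padding follow their usual meaning, which gives an output side length of ⌊(W − F + 2P) / S⌋ + 1:

```python
def pad(image, p):
    """Surround a 2D list with p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    bottom = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    return top + middle + bottom

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D convolution (cross-correlation, no kernel flip)."""
    x = pad(image, padding) if padding else image
    f = len(kernel)
    # Output side length: floor((W - F + 2P) / S) + 1.
    out_h = (len(x) - f) // stride + 1
    out_w = (len(x[0]) - f) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Each output node only sees an f x f patch of its input.
            row.append(sum(x[r + a][c + b] * kernel[a][b]
                           for a in range(f) for b in range(f)))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # a 3x3 kernel

print(conv2d(image, ones))                  # [[54, 63], [90, 99]]
print(len(conv2d(image, ones, padding=1)))  # 4: padding keeps the 4x4 size
```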

2.3 Pooling layer

  The pooling layer does not change the depth of the three-dimensional matrix, but it reduces its length and width. Pooling can be thought of as converting a high-resolution image into a lower-resolution one. Pooling layers further reduce the number of nodes that reach the final fully connected layers, and thus reduce the number of parameters in the whole network. The pooling layer itself has no trainable parameters.
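Max pooling, a common choice of pooling operation, can be sketched as follows. The layout `[channel][row][col]`, the helper name `max_pool`, and the 2×2 window with stride 2 are assumptions for illustration:

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling applied to each channel independently.
    fmap is indexed fmap[channel][row][col]; depth is unchanged."""
    pooled = []
    for ch in fmap:
        out_h = (len(ch) - size) // stride + 1
        out_w = (len(ch[0]) - size) // stride + 1
        pooled.append([
            [max(ch[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)
        ])
    return pooled

# A hypothetical 2-channel, 4x4 feature map.
fmap = [
    [[1, 3, 2, 4], [5, 6, 7, 8], [3, 2, 1, 0], [1, 2, 3, 4]],
    [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]],
]
out = max_pool(fmap)
print(out)   # [[[6, 8], [3, 4]], [[0, 1], [2, 3]]]
```

The depth stays at 2 while each 4×4 channel shrinks to 2×2. The only inputs besides the data are `size` and `stride`, which are fixed hyperparameters rather than learned weights.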

2.4 Fully connected layer

  After several rounds of convolutional and pooling layers, a CNN usually ends with 1 to 2 fully connected layers that produce the final classification result. By that point, the information in the image can be considered to have been abstracted into features with higher information content; that is, convolution and pooling can be regarded as automatic image feature extraction. Once feature extraction is complete, the fully connected layers perform the classification.
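The hand-off from feature extraction to classification can be sketched as flattening the final feature map and applying a fully connected layer that produces one score per class. The feature map, weights, and biases below are made up for illustration (in a real network the weights are learned), and the helper names are hypothetical:

```python
def flatten(fmap):
    """Flatten a [channel][row][col] feature map into one vector."""
    return [v for ch in fmap for row in ch for v in row]

def fully_connected(x, weights, biases):
    """One output per weight row: y_k = sum_i w_ki * x_i + b_k."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]

# Hypothetical 2-channel, 2x2 feature map from earlier conv/pool layers.
features = [[[1, 0], [0, 1]],
            [[0, 2], [2, 0]]]
x = flatten(features)                  # [1, 0, 0, 1, 0, 2, 2, 0]

weights = [[1, 0, 0, 1, 0, 0, 0, 0],   # class 0: channel 0 diagonal
           [0, 0, 0, 0, 0, 1, 1, 0]]   # class 1: channel 1 anti-diagonal
biases = [0, 0]

scores = fully_connected(x, weights, biases)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)               # [2, 4] 1
```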

2.5 Conversion between fully connected and convolutional layers

  ※ To convert any convolutional layer into a fully connected layer, its output only needs to be flattened;
  ※ Any fully connected layer can be converted into a convolutional layer. For example, a fully connected layer with K = 4096 outputs whose input is 7×7×512 is equivalent to a convolutional layer with kernel size F = 7, padding P = 0, stride S = 1, and K = 4096 kernels; the feature map after convolution is 1×1×4096.
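This equivalence can be checked numerically on a scaled-down case. The sketch below assumes a 2×2×2 input (stored channel-first as `[channel][row][col]`) and a fully connected layer with K = 3 outputs in place of 7×7×512 and 4096, with made-up weights and no bias. Reshaping each weight row into a 2×2×2 kernel and convolving with F = 2, P = 0, S = 1 gives the same three numbers, arranged as a 3×1×1 output. All helper names are hypothetical:

```python
def conv_out_size(w, f, p, s):
    """Output side length of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

def flatten(x):
    """Flatten [C][H][W] in channel-major order."""
    return [v for ch in x for row in ch for v in row]

def fully_connected(x_flat, weights):
    """Bias omitted to keep the comparison minimal."""
    return [sum(w * v for w, v in zip(row, x_flat)) for row in weights]

def row_to_kernel(row, c, f):
    """Reshape one flat weight row into a C x F x F kernel, using the
    same channel-major order as flatten()."""
    return [[row[ch * f * f + a * f: ch * f * f + a * f + f] for a in range(f)]
            for ch in range(c)]

def conv(x, kernels, stride=1):
    """Multi-channel convolution, no padding. Returns [K][H_out][W_out]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    f = len(kernels[0][0])
    oh, ow = conv_out_size(h, f, 0, stride), conv_out_size(w, f, 0, stride)
    return [[[sum(x[ch][i * stride + a][j * stride + b] * k[ch][a][b]
                  for ch in range(c) for a in range(f) for b in range(f))
              for j in range(ow)]
             for i in range(oh)]
            for k in kernels]

# Scaled-down stand-in for the 7x7x512 -> 4096 example: 2x2x2 -> 3.
x = [[[1, 2], [3, 4]],      # channel 0
     [[5, 6], [7, 8]]]      # channel 1
weights = [[1, 0, 0, 0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1, 1, 1, 1],
           [0, 1, 0, -1, 2, 0, 0, 1]]

fc_out = fully_connected(flatten(x), weights)
kernels = [row_to_kernel(r, c=2, f=2) for r in weights]
conv_result = conv(x, kernels)            # 3 x 1 x 1

print(fc_out)                     # [1, 36, 16]
print(flatten(conv_result))       # [1, 36, 16]
print(conv_out_size(7, 7, 0, 1))  # 1, as in the 7x7x512 example
```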

3. Features of Convolutional Neural Networks

  Local connection: each position in the output matrix of a convolutional layer depends only on part of the input matrix, not on the entire input matrix. A feature output by a convolutional layer may relate only to one region of the input image and have no relationship with information elsewhere. Local connections let each feature focus only on the region it should attend to, and they also reduce the number of parameters in the network.
  Parameter sharing: the parameters of a filter in a convolutional layer are shared. Wherever the filter is applied, the values in the filter matrix are the same. (Different filters in the same layer have different parameters, and filters in different layers also have different parameters.) Sharing filter parameters means that the response to image content is not affected by where that content appears.
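Local connection can be observed directly: changing one input pixel only changes the outputs whose window covers that pixel. The sketch below is illustrative; the helper name, the all-zero 5×5 image, and the 2×2 kernel are assumptions:

```python
def conv2d_valid(image, kernel):
    """Single-channel convolution without padding, stride 1."""
    f = len(kernel)
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]

image = [[0] * 5 for _ in range(5)]
kernel = [[1, 2], [3, 4]]
before = conv2d_valid(image, kernel)      # 4x4 output, all zeros

image[4][4] = 10                          # change one bottom-right pixel
after = conv2d_valid(image, kernel)

# Output (i, j) only sees input rows i..i+1 and cols j..j+1, so only the
# window at (3, 3) covers pixel (4, 4).
changed = [(i, j) for i in range(4) for j in range(4)
           if before[i][j] != after[i][j]]
print(changed)       # [(3, 3)]
print(after[3][3])   # 40
```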
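Because one set of filter weights is reused at every position, the same pattern produces the same response wherever it appears, only at a shifted location. The sketch uses 1D data for brevity (the same holds in 2D); the kernel and sequences are hypothetical:

```python
def conv1d(seq, kernel):
    """Valid 1D convolution; the same kernel is applied at every position."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

edge = [-1, 1]                  # one shared kernel: detects rising/falling edges
a = [0, 0, 5, 5, 0, 0, 0, 0]    # a pattern near the start
b = [0, 0, 0, 0, 0, 5, 5, 0]    # the same pattern shifted right by 3

ra = conv1d(a, edge)
rb = conv1d(b, edge)
print(ra)   # [0, 5, 0, -5, 0, 0, 0]
print(rb)   # [0, 0, 0, 0, 5, 0, -5]
```

The response to `b` is the response to `a` shifted by the same 3 positions, and the peak value is identical.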

4. Convolutional Neural Network Summary

  Convolutional neural networks are mainly used to recognize two-dimensional patterns in a way that is invariant to translation, scaling, and other forms of distortion.

  In essence, a CNN is a mapping from input to output. It can learn a large number of input-output mappings without requiring any precise mathematical expression relating them; it only needs to be trained on known patterns. After training, the network can map inputs to their corresponding outputs.

  The number of channels of each convolution kernel equals the number of channels of the input feature matrix: if the input is three-dimensional, each kernel is also three-dimensional with the same depth. After convolution, the number of channels of the output feature matrix equals the number of convolution kernels: with 2 kernels, the depth of the output matrix is 2.
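This channel bookkeeping can be sketched in pure Python, assuming a channel-first layout `[channel][row][col]`, a hypothetical 3-channel 3×3 input, and two made-up 2×2×3 kernels. The helper name `conv_multi` is illustrative only:

```python
def conv_multi(x, kernels):
    """x is [C][H][W]; each kernel is [C][F][F] with the same C.
    Returns [K][H-F+1][W-F+1]: one output channel per kernel."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    for k in kernels:
        if len(k) != c:
            raise ValueError("kernel depth must match input depth")
    f = len(kernels[0][0])
    return [[[sum(x[ch][i + a][j + b] * k[ch][a][b]
                  for ch in range(c) for a in range(f) for b in range(f))
              for j in range(w - f + 1)]
             for i in range(h - f + 1)]
            for k in kernels]

# Hypothetical 3-channel (RGB-like) 3x3 input.
x = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # channel 0
     [[2, 2, 2], [2, 2, 2], [2, 2, 2]],   # channel 1
     [[3, 3, 3], [3, 3, 3], [3, 3, 3]]]   # channel 2
ones = [[1, 1], [1, 1]]
zeros = [[0, 0], [0, 0]]
kernels = [
    [ones, ones, ones],     # 2x2x3 kernel summing all channels
    [zeros, zeros, ones],   # 2x2x3 kernel that only looks at channel 2
]
out = conv_multi(x, kernels)
print(len(out), len(out[0]), len(out[0][0]))   # 2 2 2
```

Each kernel sums over all three input channels to produce a single output channel, so two kernels give an output of depth 2.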

  Neurons on the same feature map share the same weights, so the network can learn in parallel. This layout is closer to that of actual biological neural networks, and parameter sharing reduces the complexity of the network. In particular, images, as multi-dimensional inputs, can be fed directly into the network, which avoids the complexity of reconstructing data during feature extraction and classification.

  The above is a preliminary introduction to the theory of convolutional neural networks. The convolutional and pooling layers, the core of a CNN, are covered in detail in the in-depth article on convolutional neural networks in this column.


Origin blog.csdn.net/m0_58807719/article/details/128210501