A summary of some definitions used in convolutional neural networks (CNNs): convolution, pooling, filter, kernel, feature map, input layer, hidden layer... (may not be comprehensive; more will be added later)

1. Basic concepts

1. Convolution

Convolution is a concept from physics and mathematics. It can be understood as follows: the output of a system at a given moment is the result of multiple inputs acting together (superposition).
The convolution formula is as follows:

$$(f * g)(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau$$

For a detailed explanation, see [From "convolution" to "image convolution operation" to "convolutional neural network": three shifts in the meaning of "convolution"](https://www.bilibili.com/video/BV1VV411478E?vd_source=6f69eb2b361d7f319fa5f5250e9a5d4a)
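As a small sketch of the superposition idea, NumPy's `np.convolve` computes a 1-D discrete convolution directly (the signal and kernel values here are made up):

```python
import numpy as np

# A system's response: each input sample contributes a shifted, scaled
# copy of the kernel, and the output is their superposition.
signal = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.0, 1.0, 0.5])

# "full" mode returns every point where the two sequences overlap
result = np.convolve(signal, kernel, mode="full")
print(result)  # [0.  1.  2.5 4.  1.5]
```

Each output value is a sum of input samples weighted by the (flipped, shifted) kernel, which is exactly the discrete form of the integral above.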

In essence, convolution "filters" information (a signal): it can filter out the information that interests us and is useful to us.

The convolution used here differs from the physical/mathematical concept above. In the convolution operation on an image, f(x) can be understood as the source pixels; all the source pixels together make up the original image. g(x) can be called the action points; all the action points together are called the convolution kernel. In effect, convolution replaces the pixel value at a point with a weighted average of the pixel values of the points around it (still a linear operation). The final result is called the destination pixel.

What is the role of the convolution kernel (also called a filter)?
(1) Through mathematical operations with a convolution kernel, specified features can be extracted from the original image.

(2) Different convolution kernels extract different features.

(3) Even when extracting the same feature, different convolution kernels produce different effects.
**PS:** At each position, the convolution operation takes the Hadamard (element-wise) product of the kernel and the image patch it covers, then sums the entries; it is not the matrix product commonly used in linear algebra.
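A minimal sketch of one step of this operation (the patch and kernel values are made up):

```python
import numpy as np

# One step of image convolution: take the patch under the kernel,
# form the Hadamard (element-wise) product, then sum the entries.
patch = np.array([[1., 2.],
                  [3., 4.]])
kernel = np.array([[0., 1.],
                   [1., 0.]])

hadamard = patch * kernel        # element-wise product, NOT matrix product
destination_pixel = hadamard.sum()
print(destination_pixel)  # 5.0 -> 2*1 + 3*1
```

Sliding the kernel over every position of the image and repeating this multiply-and-sum produces the full output.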

2. Pooling

Pooling is also called downsampling and generally follows the convolution step. Its essence is sampling: it compresses the input feature map (the result of the convolution operation). On one hand, it reduces the number of features, which reduces the number of parameters and in turn simplifies the network's computation; on the other hand, it preserves some invariance of the features (to rotation, translation, scaling, etc.). Another property of pooling is that it does not mix information across channels, whereas a convolutional layer can combine channels and generate new channels in the next layer.

Its purpose is feature dimensionality reduction: further abstracting information and extracting features, reducing the consumption of computing resources, lowering image resolution, and helping prevent model overfitting. The most common methods are max pooling, min pooling, and average pooling.
In short, pooling removes redundant information and retains the key information.
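A minimal sketch of non-overlapping max and average pooling (the feature-map values are made up):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: split the map into size x size blocks
    and keep one summary value (max or mean) per block."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % size, :w - w % size]
    blocks = blocks.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1., 3., 2., 0.],
               [4., 2., 1., 5.],
               [0., 1., 3., 2.],
               [6., 2., 1., 1.]])
print(pool2d(fm, 2, "max"))   # keeps the strongest activation per block
print(pool2d(fm, 2, "mean"))  # averages each block
```

A 4x4 map shrinks to 2x2: the spatial resolution drops, but the strongest (or average) response in each region is retained.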

3. Filter (filter) & kernel (kernel)

**Kernel**: a two-dimensional matrix, height × width.
**Filter**: also called a convolution kernel; a three-dimensional cube, height × width × depth, where the depth is the number of kernels it consists of and matches the depth (channel count) of the input layer. The spatial size of this matrix is also called the receptive field.
The relationship between the two: a kernel is the basic element of a filter; multiple kernels make up one filter.
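The shape relationship can be sketched with plain arrays (all sizes here are illustrative):

```python
import numpy as np

# A kernel is a 2-D matrix (height x width); a filter stacks one kernel
# per input channel, so its depth matches the input's channel count.
kernel = np.random.randn(3, 3)          # one kernel: 3 x 3
filter_rgb = np.random.randn(3, 3, 3)   # one filter for an RGB input:
                                        # depth x height x width

print(kernel.ndim)          # 2 -> a kernel is two-dimensional
print(filter_rgb.ndim)      # 3 -> a filter is three-dimensional
print(filter_rgb.shape[0])  # 3 kernels, one per input channel
```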

4. Feature map (channel)

**Feature map (channel)**: the output produced by processing the input; the three channels of an RGB image can themselves be understood as feature maps. A convolution operation generates a new matrix, which is a feature map; convolving that feature map in turn generates the next feature map.
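A rough sketch of how feature maps are produced, assuming each filter sums the per-channel products over its patch (all sizes and values are illustrative):

```python
import numpy as np

def conv_layer(image, filters):
    """Each filter produces one feature map, so the number of output
    channels equals the number of filters."""
    n_filters, depth, kh, kw = filters.shape
    h = image.shape[1] - kh + 1
    w = image.shape[2] - kw + 1
    maps = np.zeros((n_filters, h, w))
    for f in range(n_filters):
        for i in range(h):
            for j in range(w):
                patch = image[:, i:i + kh, j:j + kw]
                maps[f, i, j] = np.sum(patch * filters[f])
    return maps

rgb = np.random.randn(3, 8, 8)          # 3-channel input (e.g. RGB)
filters = np.random.randn(5, 3, 3, 3)   # 5 filters, depth 3 each
feature_maps = conv_layer(rgb, filters)
print(feature_maps.shape)  # (5, 6, 6): five new feature maps
```

Feeding these five maps into the next layer would require filters of depth 5, and so on through the network.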

2. Hierarchical structure of a convolutional neural network

  • Input Layer
  • Convolutional Layer
  • Activation layer (ReLU Layer)
  • Pooling Layer
  • Fully connected layer (FC Layer)

1. Input layer

The input layer is the input image of the entire convolutional neural network. Every image consists of a number of pixels, and each pixel corresponds to a pixel value, so an image can also be regarded as a matrix of pixel values. The input image generally takes one of two forms, a color image or a grayscale image, which is where the concept of input channels (the depth of the first convolutional layer) comes in: a color image has RGB channels, so its input channel count is 3, while a grayscale image has an input channel count of 1. A color image is an image in RGB color mode; each of its pixels corresponds to three values, so a color image corresponds to three pixel-value matrices. Each pixel in a grayscale image corresponds to a single grayscale value, so a grayscale image corresponds to one pixel-value matrix.
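The two input forms can be sketched as arrays (a 28x28 size is assumed for illustration):

```python
import numpy as np

# The input layer is just the image as pixel-value matrices.
gray = np.zeros((28, 28))      # grayscale: one matrix, 1 input channel
color = np.zeros((28, 28, 3))  # RGB: three matrices stacked, 3 channels

print(gray.shape)           # (28, 28)
print(color.shape)          # (28, 28, 3)
print(color[..., 0].shape)  # the R channel alone is one 28x28 matrix
```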

2. Convolution layer

The convolutional layer performs the convolution operation and can preserve the spatial shape of the data. When the input is an image, the convolutional layer receives the data in 3D form and passes it to the next layer in 3D form. Because of this, a CNN can correctly understand data that has a spatial shape, such as images.
The defining features of the convolutional layer are local perception and weight sharing.

3. Pooling layer

The pooling layer compresses the input data and extracts its main features. It operates on each channel independently, so it does not mix channels or generate new ones.

4. Hidden layer

The hidden layer is actually a simple concept: in a neural network, every layer other than the input layer and the output layer is a hidden layer. The name comes from the fact that these layers behave like a black box; they cannot be observed directly, so they are "hidden" inside the network.

5. Fully connected layer

In a fully connected layer, every node is connected to all the nodes in the previous layer; because of this fully connected nature, it is used to integrate the features extracted earlier.
The fully connected layer flattens the feature map (matrix) produced by the last convolutional layer into a one-dimensional vector, which serves as the input to the classifier. The fully connected layers generally also contain the most parameters. Simply put, the role of the fully connected layer is to classify based on the feature maps.
If a fully connected layer is the last layer, adding softmax or wx+b lets it act as a "classifier" or "regressor", respectively; if it is an intermediate fully connected layer (say the second- or third-to-last layer), its role is to fuse information and strengthen the representation.

6. Output layer

The output of the fully connected layer is passed through the activation function and then emitted as the network's output.

7. Other functional layers

  • normalization layer
  • pooling layer
  • cut layer
  • Fusion layer
    These layers have not come up yet and will be added when needed.

3. Some other concepts

1. The location of the activation function

Generally speaking, the activation function sits on the connections between the layers of a neural network, or, put another way, on the connection between two neurons.
The activation function usually follows the convolutional layer, but this is not mandatory. When the data are not linearly separable, an activation function is required; when they are linearly separable, the activation function can be omitted.

2. Local connection (local connection)

Local connection is the opposite of full connection: a neuron in the next layer is not connected to all the neurons in the previous layer, only to some of them, and is used to learn local features.
What are the benefits of doing this?
Obviously, it reduces the number of learned parameters and the redundant information in the network, saves computing power, speeds up learning, and helps prevent overfitting to some extent. Does this cause information loss? Yes: the image becomes blurrier and loses detail, but this also prevents overfitting, removes noise, and improves performance.
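The parameter savings can be made concrete with some back-of-the-envelope arithmetic (the layer sizes here are made up for illustration):

```python
# Rough parameter counts for a 32x32 RGB input.
h, w, c = 32, 32, 3
hidden_units = 100

# Full connection: every hidden unit connects to every input value.
fc_weights = h * w * c * hidden_units      # 307,200 weights

# Local connection with weight sharing: 100 filters of size 3x3x3.
n_filters, k = 100, 3
conv_weights = n_filters * k * k * c       # 2,700 weights

print(fc_weights, conv_weights)  # over 100x fewer parameters
```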

3. Weight sharing (parameter sharing)

Weight sharing means that, given an input image, one convolution kernel is used to scan the whole image. The numbers in the kernel are called weights, and every position in the image is scanned by the same kernel, so the weights are the same at every position, i.e. shared. (from Popular Science China)

4. Hyperparameters

Hyperparameters are also parameters, but they are set manually before machine learning begins; the learning process itself optimizes the model's ordinary parameters, not the hyperparameters.

5. Edge detection

Simply put, edge detection is a special case of the convolution operation. Its main function is to convolve feature maps with particular convolution kernels so as to extract the edge features of those feature maps.
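A minimal sketch using a Sobel kernel, one common choice for horizontal-gradient edge detection (the image here is a synthetic vertical edge):

```python
import numpy as np

# Sobel kernel for horizontal gradients: responds strongly where pixel
# values change from left to right, and gives 0 in flat regions.
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])

# An image with a vertical edge: dark left half, bright right half.
img = np.zeros((5, 5))
img[:, 3:] = 10.0

def conv2d_valid(image, kernel):
    """Slide the kernel over the image, multiplying element-wise and
    summing at each position (no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edges = conv2d_valid(img, sobel_x)
print(edges)  # nonzero responses only where the intensity changes
```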

Disclaimer:
This blog contains excerpts of notes and reflections from personal study; originality is not guaranteed. The content draws on related materials collected from the Internet. If there is any infringement, please contact the blogger.

Origin blog.csdn.net/w2190623446/article/details/128730745