1. Basic structure
The general model of a CNN can be summarized as: convolutional layer + pooling layer + fully connected layer + activation function.
Larger networks such as VGG generally stack this convolution + pooling combination as a building block, and several convolutional layers can be stacked inside one block before its pooling layer. The functions of each part are as follows:
Convolution: feature extraction
Pooling: dimensionality reduction and overfitting prevention
Flatten: flattens the two-dimensional feature maps (expands them into a one-dimensional vector)
Fully connected layer: aggregates information
Activation function (output layer): softmax (multi-class classification); sigmoid (binary classification)
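To make the difference between the two output activations concrete, here is a minimal sketch in plain Python. The function names and the example scores are illustrative assumptions, not something taken from a particular library.

```python
import math

def sigmoid(z):
    """Squash a single score into (0, 1); typical for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
p_positive = sigmoid(0.0)          # hypothetical score for the positive class: 0.5
```

Softmax gives one probability per class, and the largest score gets the largest probability. Sigmoid gives a single probability for the positive class.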
2. Specific composition
①Convolution operation
Slide the convolution kernel over the input; at each position, multiply the kernel values by the input values at the corresponding positions and sum the products. The input region currently covered by the kernel is the receptive field.
Receptive field: the projection of the convolution kernel onto the input. Because the kernel only sees a part of the input at a time, each output value is connected to only part of the input (a partial, or local, connection).
P.S. Partial connection is defined in contrast to full connection, in which every output is connected to every input.
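The sliding multiply-and-sum described above can be sketched in plain Python as follows. This assumes no padding and a stride of 1. Like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation. The name `conv2d` and the sample values are illustrative only.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # The kh x kw window starting at (r, c) is the receptive field
            # of this single output value.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d(image, kernel)
# feature_map == [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
```

Each output value depends on only 4 of the 16 input values, which is the partial connection mentioned above.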
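Adding zeros around the border of the input is called padding (zero padding). It is used to prevent edge features from being ignored, since border pixels would otherwise be covered by fewer kernel positions.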
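A minimal sketch of zero padding in plain Python; the helper name `zero_pad` and the 3x3 sample image are assumptions made for illustration.

```python
def zero_pad(image, p):
    """Surround a 2-D list with a border of `p` zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
padded = zero_pad(image, 1)
# padded is 5x5; its second row is [0, 1, 2, 3, 0]
```

With a 3x3 kernel and stride 1, the corner pixel of the unpadded image is covered by only one kernel position. After padding by 1 it is covered by four, so it contributes to more output values.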
Without padding, convolution converts a larger original image into a smaller output. Each convolution kernel produces its own feature map, which holds the features that kernel extracted from the original image.
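How much smaller the output becomes follows the standard relation output = floor((n + 2p - k) / s) + 1 along each axis. The symbols n (input size), k (kernel size), p (padding) and s (stride) are introduced here for illustration and do not appear in the original notes. A small sketch:

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

no_padding = conv_output_size(32, 5)           # 28: the output shrinks
same_size = conv_output_size(32, 5, p=2)       # 32: padding keeps the size
strided = conv_output_size(32, 5, p=2, s=2)    # 16: a stride of 2 halves it
```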
Multi-channel convolution: for a color image, the convolution over the three-dimensional input (height × width × 3 channels) is equivalent to a two-dimensional convolution on each of its three color channels, with the three results summed into one feature map.
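The per-channel view can be sketched as follows: each channel is convolved with its own 2-D slice of the kernel, and the per-channel results are added element by element. The function names and the tiny RGB image are hypothetical; no padding and stride 1 are assumed.

```python
def conv2d(image, kernel):
    """2-D sliding-window sum of products (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

def conv_multichannel(channels, kernels):
    """Convolve each channel with its own kernel slice, then add the results."""
    maps = [conv2d(ch, k) for ch, k in zip(channels, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)] for r in range(rows)]

# Hypothetical 3x3 RGB image, one 2-D list per channel.
red   = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
green = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
blue  = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# One 2x2 slice per channel; together they form a single 3-channel kernel.
kernels = [
    [[1, 0], [0, 1]],   # applied to red
    [[1, 1], [1, 1]],   # applied to green
    [[0, 1], [1, 0]],   # applied to blue
]
feature_map = conv_multichannel([red, green, blue], kernels)
# feature_map == [[12, 8], [8, 12]]
```

Note that one 3-channel kernel still yields a single feature map. More feature maps come from using more kernels.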
For more details, please refer to: Convolution Operation and Convolution Kernel DLC https://blog.csdn.net/weixin_37878740/article/details/127916612
②Pooling
Pooling replaces the values in a region with a single value computed from them. Depending on the calculation method, it is divided into average pooling and max pooling; pooling is also called "downsampling".
The functions of pooling are: ① Reduce the size of the feature maps, and therefore the amount of computation and the number of parameters in the following layers
② Help prevent overfitting (while preserving the main characteristics of the data)
③ Bring translation invariance to the network (that is, shifting the image within a small range does not affect the result; this property can be eliminated by removing pooling)
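As a rough illustration of max and average pooling, the sketch below applies non-overlapping 2x2 windows to a small feature map. The function name `pool2d`, its parameters and the sample values are assumptions made for this example.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: each size x size block is replaced by one value."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:  # any other mode is treated as average pooling
                row.append(sum(window) / len(window))
        out.append(row)
    return out

feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 8],
]
max_pooled = pool2d(feature_map, mode="max")   # [[6, 2], [2, 9]]
avg_pooled = pool2d(feature_map, mode="avg")   # [[3.75, 1.25], [1.0, 6.0]]
```

The 4x4 input becomes 2x2, which is the size reduction. If the value 6 moved to another cell inside its 2x2 block, the max-pooled output would stay the same, which is the small-shift invariance described above.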
③Flatten
Expand the two-dimensional feature maps into a one-dimensional vector (to be fed into the fully connected layer).
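A minimal sketch of flattening, assuming the feature maps are stored as nested Python lists (one 2-D list per map) and read row by row; the name `flatten` and the sample values are illustrative.

```python
def flatten(feature_maps):
    """Concatenate every value of every feature map, row by row, into one list."""
    return [v for fmap in feature_maps for row in fmap for v in row]

feature_maps = [
    [[6, 2], [2, 9]],   # hypothetical feature map 1
    [[1, 0], [4, 3]],   # hypothetical feature map 2
]
vector = flatten(feature_maps)
# vector == [6, 2, 2, 9, 1, 0, 4, 3]
```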
④ Fully connected layer
It establishes a mapping between the flattened features and the outputs (for example, the class scores): every output is connected to every input value.
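One way to picture this mapping is as a weighted sum per output. The sketch below uses hand-picked weights and biases as stand-ins; in a real network they are learned during training.

```python
def fully_connected(x, weights, biases):
    """Each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]

x = [6, 2, 2, 9]                  # hypothetical flattened feature vector
weights = [
    [0.5, 0.0, 0.0, 0.5],         # weights feeding output 0
    [0.0, 1.0, -1.0, 0.0],        # weights feeding output 1
]
biases = [0.0, 1.0]
scores = fully_connected(x, weights, biases)
# scores == [7.5, 1.0]
```

Unlike a convolution kernel, each row of `weights` has one entry for every input value, which is what "fully connected" refers to.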