Neural Network: Pooling Layer Knowledge Points

1. The role of pooling in CNN

The role of the pooling layer is to select features within the receptive field and extract the most representative feature of each region, which effectively reduces the number of output features and thereby the computation and parameter count of subsequent layers. By operation type, pooling is usually divided into Max Pooling, Average Pooling, and Sum Pooling, which take the maximum, the average, and the sum of the feature values in the receptive field as the output, respectively. The most commonly used are max pooling and average pooling.
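To make the three operations concrete, here is a minimal NumPy sketch (the 4x4 feature map is made up for illustration) applying max, average, and sum pooling with a 2x2 window and stride 2:

```python
import numpy as np

# Hypothetical 4x4 feature map, pooled with a 2x2 window and stride 2.
fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 2],
                 [2, 2, 1, 3]], dtype=np.float32)

# Rearrange into 2x2 non-overlapping receptive fields, flattened to 4 values each.
windows = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

print(windows.max(axis=-1))   # max pooling:     [[4. 2.] [2. 5.]]
print(windows.mean(axis=-1))  # average pooling: [[2.5 1.] [1.25 2.75]]
print(windows.sum(axis=-1))   # sum pooling:     [[10. 4.] [5. 11.]]
```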

2. The role of global pooling

Global pooling mainly includes global average pooling and global maximum pooling.

Next, Rocky takes global average pooling as an example to describe how it works in deep learning networks.

As mentioned above, global average pooling computes, for each channel of the last convolutional feature map, the average over the entire spatial plane, as shown below:

(figure: global average pooling)

Generally, several fully connected layers are attached at the end of the network, but after global pooling each channel of the feature map is reduced to a single value, so the final fully connected layer effectively becomes a weighted sum over channels. This structure is more intuitive than flattening the feature map into a fully connected layer directly, greatly reduces the number of parameters, and generalizes better.
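As a minimal PyTorch sketch (the channel count and the 10-class head are hypothetical), global average pooling collapses each channel to one value, after which a single linear layer performs exactly the weighted sum described above:

```python
import torch
import torch.nn as nn

# Hypothetical last-layer feature map: batch of 8, 512 channels, 7x7 spatial.
feature_map = torch.randn(8, 512, 7, 7)

gap = nn.AdaptiveAvgPool2d(1)            # average over the entire 7x7 plane per channel
pooled = gap(feature_map).flatten(1)     # (8, 512): one value per channel

classifier = nn.Linear(512, 10)          # hypothetical 10-class head
logits = classifier(pooled)              # (8, 10): a weighted sum over channels per class
print(pooled.shape, logits.shape)
```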

The role of global pooling:

1. Reduce information redundancy:

  • The pooling layer helps extract the primary information in the input feature map while suppressing secondary information. This lets the model focus on important features and discard redundant or irrelevant ones, which benefits training and the generalization ability of the model.

2. Feature dimensionality reduction and downsampling:

  • Pooling reduces the size of the output feature map, achieving dimensionality reduction and downsampling. This lowers the amount of computation and enlarges the receptive field of subsequent layers, so that one pooled pixel corresponds to a region of the original image.

3. Feature compression and network simplification:

  • The pooling layer compresses the feature map, reducing the consumption of computing resources, simplifying the network structure, and lowering model complexity; this helps prevent overfitting and improves the generalization ability of the model.

4. Improve the invariance of the model:

  • Pooling helps make the model more robust to small translations and local distortions of the input: after max pooling, shifting a feature by a pixel or two within the same pooling window often leaves the output unchanged (see the sketch after this list). Note that this invariance is local and approximate; pooling by itself does not make the network fully invariant to scale or rotation. This robustness still helps the generalization ability of the model.

5. Introduce non-linearity:

  • Max pooling is itself a non-linear operation, so pooling can add non-linearity to the network.
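As a toy illustration of point 4, the following PyTorch sketch (the input is contrived for the demo) shows that max pooling leaves the output unchanged when a feature shifts by one pixel within the same pooling window:

```python
import torch
import torch.nn.functional as F

x = torch.zeros(1, 1, 4, 4)
x[0, 0, 1, 0] = 1.0                        # a single "feature" activation
x_shift = torch.roll(x, shifts=1, dims=3)  # shift it one pixel to the right

# Both peaks fall inside the same 2x2 pooling window, so the outputs match;
# the invariance is local, not global.
print(F.max_pool2d(x, kernel_size=2))
print(F.max_pool2d(x_shift, kernel_size=2))
```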

3. Classification of pooling

A. General Pooling:

In a CNN, the pooling layer is used to reduce the spatial size of the feature map, which lowers the amount of computation and the risk of overfitting. The most common pooling operations are the following (see the sketch after this list):

Average Pooling:
  • Computes the average of an image region as the pooled value for that region.
  • It suppresses the increase in estimation variance caused by the limited neighborhood size.
  • Its characteristic is better preservation of background information.
Max Pooling:
  • Selects the maximum value of an image region as the pooled value for that region.
  • It suppresses the shift in the estimated mean caused by convolution-layer parameter errors.
  • Its characteristic is better extraction of texture information.
Stochastic Pooling:
  • Samples a local value with probability proportional to its activation; the sampled value is the pooling result.
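The sketch below (PyTorch; the input is random, and stochastic pooling is hand-rolled since it is not a built-in layer) shows the three operations side by side:

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 4, 4)  # random non-negative activations, e.g. post-ReLU

print(F.max_pool2d(x, kernel_size=2, stride=2))  # keeps the strongest response
print(F.avg_pool2d(x, kernel_size=2, stride=2))  # keeps the regional mean

# Stochastic pooling: sample one value per 2x2 window with probability
# proportional to its activation (assumes non-negative inputs).
windows = x.unfold(2, 2, 2).unfold(3, 2, 2).reshape(1, 1, 2, 2, 4)
probs = windows / windows.sum(dim=-1, keepdim=True)
idx = torch.distributions.Categorical(probs=probs).sample()
print(torch.gather(windows, -1, idx.unsqueeze(-1)).squeeze(-1))
```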

B. Overlapping Pooling:

In some cases, adjacent pooling windows may overlap; this happens when the pooling window is set larger than the stride.

The characteristic of overlapping pooling is that it can capture image features more fully than conventional pooling operations, but it may also lead to an increase in computational complexity.
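A minimal PyTorch sketch of the difference (the 3x3 window with stride 2 follows the AlexNet-style overlapping scheme; the input shape is illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)  # hypothetical feature map

non_overlap = nn.MaxPool2d(kernel_size=2, stride=2)  # window == stride
overlap = nn.MaxPool2d(kernel_size=3, stride=2)      # window > stride: neighbors share a row/column

print(non_overlap(x).shape)  # torch.Size([1, 96, 27, 27])
print(overlap(x).shape)      # torch.Size([1, 96, 27, 27]): same size here, but each
                             # output value saw an overlapping region of the input
```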

These pooling methods are commonly used techniques in CNNs to reduce data size and parameter count while retaining important information, thereby improving the performance and generalization ability of the model.

4. Advanced use of pooling: introduction to the SPP structure

Paper name: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Download address: https://arxiv.org/abs/1406.4729

The introduction of the Spatial Pyramid Pooling (SPP) layer removes the fixed-input-size limitation of traditional convolutional neural networks (CNNs). Traditional fully connected layers require fixed-size feature vectors as input, which means all input images must be the same size and usually need to be cropped or stretched, causing image distortion. The SPP layer lets the network accept inputs of different sizes: it aggregates features from feature maps of varying size through pyramid-shaped pooling regions, converting them into a fixed-size feature vector. All inputs thus have the same length before the fully connected layer, with no need to preprocess the image in advance. This flexibility improves the applicability and generalization ability of the network, allowing the model to handle inputs of various sizes.

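Below is a minimal PyTorch sketch of an SPP layer following the pyramid idea of the paper (the 4x4/2x2/1x1 pyramid levels here are one possible configuration, not the paper's only one):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    """Spatial pyramid pooling: fixed-length output for any input spatial size."""
    def __init__(self, levels=(4, 2, 1)):
        super().__init__()
        self.levels = levels  # output length = channels * (16 + 4 + 1)

    def forward(self, x):
        # Adaptive max pooling divides the map into n x n bins at each level,
        # then the per-level vectors are concatenated.
        feats = [F.adaptive_max_pool2d(x, n).flatten(1) for n in self.levels]
        return torch.cat(feats, dim=1)

spp = SPP()
for size in (32, 48, 64):   # different input resolutions
    out = spp(torch.randn(1, 256, size, size))
    print(size, out.shape)  # always torch.Size([1, 5376])
```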

The salient features of SPP (Spatial Pyramid Pooling) are:

Fixed-size output: SPP is able to produce fixed-size output regardless of the input size, overcoming the limitation of fully connected layers requiring fixed-length inputs.

Multiple pooling windows: SPP pools over windows at several scales, allowing it to extract features at different granularities.

Scale invariance and feature consistency: It can handle input images of different aspect ratios and sizes, enhancing the scale invariance of the model and reducing the risk of overfitting.

Other features include:

Diversity of training images makes it easier for the network to converge: SPP allows training with images of different sizes; compared with training on a single size, this diversity is more conducive to network convergence.

Independent of specific network design and structure: SPP is inserted after the last convolutional layer of a CNN, replacing only the final pooling layer, without otherwise changing the network structure.

Suitable for image classification and object detection: SPP is not only suitable for image classification but can also be used for tasks such as object detection, expanding its range of applications.

These characteristics make SPP a powerful tool that produces fixed-length feature vectors from images of different sizes and aspect ratios, improving the flexibility and generalization ability of the model.
