Neural network CNN pooling layer

Brief introduction

  The pooling layer is a common operation in CNNs, usually also called subsampling or downsampling. When building a CNN, it typically follows a convolutional layer. The pooling layer reduces the dimensionality of the convolutional layer's output features, which effectively reduces the number of network parameters while helping to prevent overfitting.
  Speaking of pooling operations, the first that come to mind are the two we use most often, Max Pooling and Average Pooling, but there are actually many kinds of pooling operations (for details, see below). Suppose the input size is i, the output size is o, the kernel size is k, and the stride is s; then the following formula holds: o = (i − k) / s + 1 (with the division rounded down). If there is a padding p, the formula becomes: o = (i + 2 × p − k) / s + 1.
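The formula above is easy to check in code. The helper name `pool_output_size` below is my own; it is just the padded output-size formula written out, compared against what PyTorch actually produces:

```python
import torch
import torch.nn.functional as F

def pool_output_size(i, k, s, p=0):
    # o = floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

print(pool_output_size(32, 2, 2))       # 16
print(pool_output_size(32, 2, 2, p=1))  # 17

# cross-check against PyTorch
x = torch.randn(1, 3, 32, 32)
assert F.max_pool2d(x, 2, 2).shape[-1] == pool_output_size(32, 2, 2)
```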
  The main functions of the pooling layer are as follows:
    1. Suppress noise and reduce information redundancy
    2. Improve the scale invariance and rotation invariance of the model
    3. Reduce the amount of model computation
    4. Prevent overfitting

Classification of pooling layer

Max/mean pooling

  Maximum pooling selects the maximum value in an image region as the pooled value of that region. During backpropagation, the gradient flows only through the position that held the maximum in the forward pass; the gradients at all other positions are 0.
  In use, maximum pooling is divided into overlapping pooling and non-overlapping pooling: the common setting stride = kernel_size is non-overlapping pooling. Compared with non-overlapping pooling, overlapping pooling can not only improve prediction accuracy, but can also alleviate overfitting to some extent.
  An application example of overlapping pooling is that the last layer of the yolov3-tiny backbone uses max pooling with stride = 1 and kernel size = 2 for feature extraction.

import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 32, 32)  # batch 1, 3 channels, 32x32
output = F.max_pool2d(input, kernel_size=2, stride=2)
print(output.shape)

output = F.max_pool2d(input, kernel_size=2, stride=2, padding=1)
print(output.shape)

'''
output:
   torch.Size([1, 3, 16, 16])
   torch.Size([1, 3, 17, 17])
'''
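The gradient routing described above can be verified with a tiny example: after backpropagating through a max pool, only the position that held the maximum receives a nonzero gradient.

```python
import torch
import torch.nn.functional as F

# 2x2 input; a single 2x2 max pool selects the maximum (4.0)
x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]], requires_grad=True)
y = F.max_pool2d(x, kernel_size=2)
y.sum().backward()
print(x.grad)  # 1 at the argmax position, 0 everywhere else
```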

  Mean pooling uses the average of all values in the selected image region as the pooled value of that region.

import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 32, 32)  # batch 1, 3 channels, 32x32
output = F.avg_pool2d(input, kernel_size=2, stride=2)
print(output.shape)

'''
output:
   torch.Size([1, 3, 16, 16])
'''

Median pooling

  It is similar to the median filter, but is rarely used. Median pooling preserves edge and texture structure well and has strong noise resistance.

Combination pooling

  Combination pooling is a pooling strategy that exploits the advantages of maximum pooling and average pooling at the same time. There are two common combination strategies: Add and Concat. It is often used as a trick in classification tasks, and its purpose is to enrich the features: Max Pooling pays more attention to local features, while Average Pooling pays more attention to global features.

import torch
import torch.nn.functional as F

def add_avg_max_pool2d(input):
    max_output = F.max_pool2d(input, kernel_size=2, stride=2)
    avg_output = F.avg_pool2d(input, kernel_size=2, stride=2)
    return 0.5 * (max_output + avg_output)

def concat_avg_max_pool2d(input):
    max_output = F.max_pool2d(input, kernel_size=2, stride=2)
    avg_output = F.avg_pool2d(input, kernel_size=2, stride=2)
    return torch.cat([max_output, avg_output], 1)

if __name__ == '__main__':
    input = torch.randn(1, 3, 32, 32)  # batch 1, 3 channels, 32x32
    output = add_avg_max_pool2d(input)
    print("add: " + str(output.shape))
    output = concat_avg_max_pool2d(input)
    print("concat: " + str(output.shape))
'''
output:
   add: torch.Size([1, 3, 16, 16])
   concat: torch.Size([1, 6, 16, 16])
'''

Spatial Pyramid Pooling

Paper address: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (arXiv:1406.4729)
  Spatial Pyramid Pooling is abbreviated as SPP. SPP was proposed relatively early, in SPPNet (the paper is linked above). It was originally proposed to remove the constraint that a CNN requires a fixed input image size, avoiding repeated convolution computation and producing a fixed-length output. The network structure of SPP is as follows:

SPP network structure
  In short, the feature map is pooled separately by pyramid pooling at three scales, and the pooled results are concatenated into a fixed-length feature vector, which is then fed into the subsequent fully connected layers for classification.

  There is a network configuration in yolov3 called yolov3-spp.cfg, which achieves higher accuracy than yolov3.cfg. The SPP section of yolov3-spp.cfg is as follows:

### SPP ###
[maxpool]
stride=1
size=5

[route]
layers=-2

[maxpool]
stride=1
size=9

[route]
layers=-4

[maxpool]
stride=1
size=13

[route]
layers=-1,-3,-5,-6

### End SPP ###

  The SPP here is a variant of the original SPPNet design: Max Pooling with several kernel sizes is applied in parallel, and all the resulting feature maps are concatenated to obtain a new feature combination.
Advantages of SPP:

  • Handles inputs of inconsistent image sizes
  • Extracts features at multiple scales and aggregates them
  • Makes the features more robust, improving accuracy
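The SPP block from the cfg above can be sketched in PyTorch. The class name `SPPBlock` is my own; with stride 1 and padding of k // 2 (for the odd kernel sizes 5, 9, 13), each pool preserves the spatial size, so the branches can be concatenated along the channel dimension together with the input:

```python
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """YOLOv3-style SPP: parallel stride-1 max pools, then concat."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # spatial size is preserved; channel count is multiplied
        # by len(kernel_sizes) + 1 (the extra 1 is the identity branch)
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 512, 13, 13)
print(SPPBlock()(x).shape)  # torch.Size([1, 2048, 13, 13])
```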

Global Average/Max Pooling

  The difference between global average pooling and average pooling lies in the word "global". Both "global" and "local" describe the pooling window: local pooling averages over a sub-region of the feature map and slides that window across it, while global pooling averages over the entire feature map at once.
  The same is true for global max pooling.
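In PyTorch, global pooling is most conveniently expressed with the adaptive pooling functions by requesting an output size of 1, which collapses each feature map to a single value regardless of the input's spatial size:

```python
import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 32, 32)

# global average pooling: each channel reduced to its mean
gap = F.adaptive_avg_pool2d(input, output_size=1)
# global max pooling: each channel reduced to its maximum
gmp = F.adaptive_max_pool2d(input, output_size=1)

print(gap.shape)  # torch.Size([1, 3, 1, 1])
print(gmp.shape)  # torch.Size([1, 3, 1, 1])
```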

references

https://www.jianshu.com/p/884c2828cd8e
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, arXiv:1406.4729
https://www.plob.org/article/22160.html

Origin blog.csdn.net/CFH1021/article/details/105989297