Pooling operations: average pooling, max pooling, SoftPool, Spatial Pyramid Pooling (SPP)

Pooling

(1) Increasing the receptive field
The receptive field is the size of the region of the original input that one pixel of a feature map corresponds to. If the feature map keeps the same size from layer to layer (for a given convolution setting), many convolutional layers are needed before a single unit can see the whole of a 224×224 input image. Pooling is another way to enlarge the receptive field.
(2) Achieving invariance
The invariance in question includes translation invariance, rotation invariance and scale invariance. Because pooling summarizes the feature map, it captures the features of a region without caring about their exact location. When the original feature map changes slightly, the final result is therefore not affected.
(3) Easy to optimize
Pooling reduces dimensionality, removes redundant information and compresses features. It has no parameters of its own, and by shrinking the feature maps it simplifies the network, reduces the number of parameters in the layers that follow, lowers the consumption of computing resources, and helps control over-fitting.
(4) Introducing nonlinearity
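
The receptive-field argument in point (1) can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `receptive_field` and the layer stacks are assumptions chosen for the example, not part of any library. It uses the standard recurrence in which each layer adds (kernel_size - 1) times the current spacing between units, and each stride multiplies that spacing.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


conv = (3, 1)  # 3x3 convolution, stride 1
pool = (2, 2)  # 2x2 pooling, stride 2

convs_only = receptive_field([conv, conv, conv])
with_pooling = receptive_field([conv, pool, conv, pool, conv, pool])
print(convs_only, with_pooling)
```

With the same three convolutions, inserting a 2x2 pooling layer after each one roughly triples the receptive field in this toy configuration.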

Pooling classification

(1) General pooling

  1. Average pooling: Compute the average value of the image region as the pooled value of that region.
    It retains the characteristics of the overall data and can highlight background information. Every activation contributes equally to the average, which can significantly reduce the strength of the region's overall features. GAP refers to global average pooling, where the region is the entire feature map.

  2. Max pooling: Choose the maximum value of the image region as the pooled value of that region.
    Backpropagation through max pooling can be understood as passing the gradient back only to the position of the largest value. For this reason, the index of the largest element in each window is usually recorded during the forward pass (these indices are sometimes called switches), which makes gradient routing efficient during backpropagation.
    Texture features are preserved, but the distortion may be excessive.
    Discarding most of the activations carries the risk of losing important information.

  3. Stochastic (random) pooling: One element of each region is selected at random according to its probability value, so elements with larger values are more likely to be selected.
    Although the selection is biased towards large values, it is still random, so a given draw is not guaranteed to pick a good value and can produce a worse result. The benefit of the randomness is that it also makes better results possible, so overall it can still improve performance.

  4. * SoftPool
    Replacing the original pooling operation with SoftPool can bring a consistent accuracy improvement of 1-2%.

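    It is based on softmax weighting, which retains the basic properties of the input while amplifying feature activations of greater intensity.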
    

    The weight together with the corresponding activation value is used as a nonlinear transformation. The higher activation is more dominant than the lower activation. Because most pooling operations are performed in a high-dimensional feature space, highlighting activations with greater effects is a more balanced approach than simply choosing the maximum value.
    Steps:

    1. Compute the weight $w_i$ of each activation $a_i$ in the pooling region $R$:
      $w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}$
    2. Take the weighted sum of all activation values in the region:
      $\tilde{a} = \sum_{i \in R} w_i \cdot a_i$

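import torch
import torch.nn.functional as F
from torch.nn.modules.utils import _pair

# CUDA_SOFTPOOL2d is assumed to be the custom autograd Function provided by the
# SoftPool project's CUDA extension; it is not defined in this snippet.
def soft_pool2d(x, kernel_size=2, stride=None, force_inplace=False):
    if x.is_cuda and not force_inplace:
        return CUDA_SOFTPOOL2d.apply(x, kernel_size, stride)
    kernel_size = _pair(kernel_size)
    if stride is None:
        stride = kernel_size
    else:
        stride = _pair(stride)
    # Get input sizes
    _, c, h, w = x.size()
    # Create per-element exponential value sum : Tensor [b x 1 x h x w]
    # (note: the exponentials are summed over the channel dimension)
    e_x = torch.sum(torch.exp(x), dim=1, keepdim=True)
    # Apply mask to input and pool and calculate the exponential sum
    # Tensor: [b x c x h x w] -> [b x c x h' x w']
    # The factor sum(kernel_size) appears in both terms and cancels in the ratio.
    num = F.avg_pool2d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size))
    den = F.avg_pool2d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size))
    return num.div_(den)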
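
To connect the two-step definition with code, here is a minimal CPU-only sketch that applies the weights per element within each window, exactly as in the formulas for $w_i$ and $\tilde{a}$. The name `soft_pool2d_sketch` is hypothetical and this is not the project's implementation: it differs from the function above, which sums the exponentials over the channel dimension, and it makes no attempt to guard against overflow of `exp` for large activations.

```python
import torch
import torch.nn.functional as F


def soft_pool2d_sketch(x, kernel_size=2, stride=None):
    """SoftPool as the ratio of two average poolings.

    For each window R this returns sum_i(e^{a_i} * a_i) / sum_j(e^{a_j});
    the 1/|R| factors of the two average poolings cancel.
    """
    if stride is None:
        stride = kernel_size
    e_x = torch.exp(x)  # unnormalized weights, kept separately per channel
    num = F.avg_pool2d(e_x * x, kernel_size, stride=stride)
    den = F.avg_pool2d(e_x, kernel_size, stride=stride)
    return num / den


# One channel, one 2x2 window: shape [1, 1, 2, 2]
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

soft = soft_pool2d_sketch(x)
avg = F.avg_pool2d(x, 2)
mx = F.max_pool2d(x, 2)

# The same value computed directly from the two-step definition
a = x.flatten()
w = torch.softmax(a, dim=0)  # step 1: weights w_i
direct = (w * a).sum()       # step 2: weighted sum
print(avg.item(), soft.item(), mx.item())
```

For this window the SoftPool value lies between the average and the maximum, which matches the description of SoftPool as emphasizing strong activations without discarding the others.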

Paper https://arxiv.org/abs/2101.00440
Project address https://github.com/alexandrosstergiou/SoftPool
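
The gradient behaviour described for average and max pooling earlier in this list can be observed directly in PyTorch. This is a small sketch on a hand-made 2x2 input (the tensor values are arbitrary); `return_indices=True` is the `F.max_pool2d` option that exposes the recorded argmax positions, i.e. the "switches".

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]], requires_grad=True)

# Max pooling: the forward pass records where the maximum came from
max_out, switches = F.max_pool2d(x, kernel_size=2, return_indices=True)
max_out.sum().backward()
max_grad = x.grad.clone()  # gradient reaches only the largest element

# Average pooling: every element of the window gets an equal share
x.grad.zero_()
avg_out = F.avg_pool2d(x, kernel_size=2)
avg_out.sum().backward()
avg_grad = x.grad.clone()

print(switches, max_grad, avg_grad, sep="\n")
```

The index stored in `switches` is the position of the maximum in the flattened H x W plane of the input.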

(2) Overlapping pooling
Compared with traditional non-overlapping pooling, overlapping pooling can improve prediction accuracy and also slows over-fitting to some extent.
  Compared with normal pooling (stride s=2, window z=2), overlapping pooling (stride s=2, window z=3) reduces the top-1 and top-5 error rates by 0.4% and 0.3% respectively.
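
As a rough illustration of the two settings, the sketch below pools the same feature map with z=2, s=2 and with z=3, s=2. The 55x55 input size and the helper `pooled_size` are assumptions made for the example. For an odd side length like this one, both settings give the same output size; only the amount of input each output sees changes.

```python
import torch
import torch.nn.functional as F


def pooled_size(n, window, stride):
    """Output length along one axis for unpadded pooling (floor mode)."""
    return (n - window) // stride + 1


x = torch.randn(1, 1, 55, 55)  # hypothetical 55x55 feature map

# z == s: the windows tile the map without sharing any pixels
non_overlap = F.max_pool2d(x, kernel_size=2, stride=2)
# z > s: neighbouring windows share one row or column
overlap = F.max_pool2d(x, kernel_size=3, stride=2)

print(non_overlap.shape, overlap.shape)
print(pooled_size(55, 2, 2), pooled_size(55, 3, 2))
```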
  
(3) * Spatial Pyramid Pooling

In a typical CNN, the convolutional layers are followed by fully connected layers. The number of input features of a fully connected layer is fixed, so the network requires a fixed input size. In practice, input images do not always have the required size. The usual remedies are cropping and warping, but these change the aspect ratio and the size of the image and distort the original content.
SPP instead applies pooling at multiple scales, then reshapes and concatenates the results to obtain a fixed-size feature vector.

Proof
Axiom: Any positive integer can be written as the sum of squares of several numbers.
$a = a_1^2 + a_2^2 + \cdots$

First assume that the fixed input image size is $s = 224$, that the last convolutional layer of the network outputs 256 feature maps, and that each feature map has size $13 \times 13$ ($a = 13$). The fully connected layer then has $256 \times (9 + 4 + 1)$ input neurons, that is, the input size of the fully connected layer is $256 \times (9 + 4 + 1)$. (Axiom)
That is, we need to extract $f = 9 + 4 + 1$ features from each feature map.
Three pooling windows of size $w \times w$ are used, each with its own stride $t$. After these three poolings we obtain three $n \times n$ results, $n = 3, 2, 1$.
It then only remains to compute the window size $w$ and stride $t$ of each pooling from the feature-map size $a$ and the three values of $n$ obtained by decomposing the input dimension of the fully connected layer:
$w = \lceil a/n \rceil, \quad t = \lfloor a/n \rfloor$
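
The window and stride rule can be checked numerically for the $a = 13$ case. This is a minimal sketch with a random tensor standing in for the 256 feature maps; it assumes plain unpadded max pooling and uses only standard PyTorch calls.

```python
from math import ceil, floor

import torch
import torch.nn.functional as F

a = 13  # side length of each feature map in the example above
x = torch.randn(1, 256, a, a)

features = []
for n in (3, 2, 1):
    w = ceil(a / n)   # pooling window size
    t = floor(a / n)  # pooling stride
    pooled = F.max_pool2d(x, kernel_size=w, stride=t)
    assert pooled.shape[-2:] == (n, n)  # each level yields an n x n grid
    features.append(pooled.flatten(start_dim=1))

# Concatenate the three levels: 256 * (9 + 4 + 1) values per image
vector = torch.cat(features, dim=1)
print(vector.shape)
```

For $a = 13$ the three (window, stride) pairs come out as (5, 4), (7, 6) and (13, 13).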

SPP notable features

  1. Regardless of the input size, SPP can produce a fixed size output

  2. It uses multiple pooling windows

  3. SPP can use the same image with different scales as input to obtain pooled features of the same length.

  4. SPP can handle input images of different aspect ratios and sizes, which improves scale invariance and reduces over-fitting.

  5. Experiments show that training with images of varied sizes makes it easier for the network to converge than training with images of a single size

  6. SPP is independent of the specific CNN design and structure. (In other words, SPP is simply placed after the last convolutional layer, where it replaces the original pooling layer without affecting the rest of the network structure.)

  7. It can be used not only for image classification but also for object detection

  8. Multi-window pooling will improve the accuracy of the experiment

  9. Feeding the same image at different sizes improves the accuracy of the experiment (from the perspective of scale space, scale invariance is improved)

  10. Multi-view testing also improves the test results

  11. The size of the image input has an impact on the results of the experiment (because the target feature area is large or small)

  12. Because only the pooling layer of the network is replaced, the rest of the network structure is unaffected, so the entire network can be trained normally.
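
Features 1 and 3 (a fixed-length output for any input size) can be illustrated compactly. Note that this sketch swaps in PyTorch's `F.adaptive_max_pool2d`, which chooses the windows internally, instead of computing window sizes and strides explicitly; the function name `spp_adaptive` and the tensor sizes are assumptions for the example.

```python
import torch
import torch.nn.functional as F


def spp_adaptive(x, levels=(1, 2, 3)):
    """Concatenate n x n adaptive max-pooled grids into one vector per sample."""
    parts = [
        F.adaptive_max_pool2d(x, output_size=n).flatten(start_dim=1)
        for n in levels
    ]
    return torch.cat(parts, dim=1)


small = torch.randn(2, 256, 13, 13)
large = torch.randn(2, 256, 20, 17)  # different size and aspect ratio

out_small = spp_adaptive(small)
out_large = spp_adaptive(large)
print(out_small.shape, out_large.shape)
```

Both inputs produce vectors of length 256 * (1 + 4 + 9), so the same fully connected layer can follow either one.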

from math import floor, ceil
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling2d(nn.Module):
    r"""apply spatial pyramid pooling over a 4d input(a mini-batch of 2d inputs
    with additional channel dimension) as described in the paper
    'Spatial Pyramid Pooling in deep convolutional Networks for visual recognition'
    Args:
        num_level: number of pyramid levels; level i pools to an i x i grid
        pool_type: 'max_pool' or 'avg_pool'. Default: 'max_pool'
    The output has num_grid features per sample:
        num_grid = 0
        for i in range(num_level):
            num_grid += (i + 1) * (i + 1)
        num_grid = num_grid * channels  # channels is the channel dimension of input data
    examples:
        >>> input = torch.randn((1,3,32,32), dtype=torch.float32)
        >>> net = torch.nn.Sequential(nn.Conv2d(in_channels=3,out_channels=32,kernel_size=3,stride=1),\
                                      nn.ReLU(),\
                                      SpatialPyramidPooling2d(num_level=2,pool_type='avg_pool'),\
                                      nn.Linear(32 * (1*1 + 2*2), 10))
        >>> output = net(input)
    """

    def __init__(self, num_level, pool_type='max_pool'):
        super(SpatialPyramidPooling2d, self).__init__()
        self.num_level = num_level
        self.pool_type = pool_type

    def forward(self, x):
        N, C, H, W = x.size()
        for i in range(self.num_level):
            level = i + 1
            kernel_size = (ceil(H / level), ceil(W / level))
            stride = (ceil(H / level), ceil(W / level))
            padding = (floor((kernel_size[0] * level - H + 1) / 2), floor((kernel_size[1] * level - W + 1) / 2))

            if self.pool_type == 'max_pool':
                tensor = (F.max_pool2d(x, kernel_size=kernel_size, stride=stride, padding=padding)).view(N, -1)
            else:
                tensor = (F.avg_pool2d(x, kernel_size=kernel_size, stride=stride, padding=padding)).view(N, -1)

            if i == 0:
                res = tensor
            else:
                res = torch.cat((res, tensor), 1)
        return res
    def __repr__(self):
        return self.__class__.__name__ + '(' \
            + 'num_level = ' + str(self.num_level) \
            + ', pool_type = ' + str(self.pool_type) + ')'





Origin blog.csdn.net/weixin_42764932/article/details/112515715