Getting started with deep learning - pooling and a numpy implementation

Foreword

      Convolutional neural networks (ConvNets or CNNs) are a type of neural network that underpins much of computer vision (CV). This article introduces another core operation of convolutional neural networks, pooling: its principle, and a from-scratch numpy implementation written from a beginner's perspective.

1

      Continuing this beginner series on operators, today we start with pooling. If you missed the other operators, follow the official account "the invincible Zhang Dadao" to catch up.

      The term pooling originates from visual mechanisms and refers to merging and aggregating resources; the English word "pooling" is translated literally into Chinese. The pooling operation (also known as subsampling or downsampling) mainly reduces the dimensionality of each feature map, which shrinks the parameter matrices and the size of the final output while retaining the most important information. There are several types of pooling: MaxPooling, AveragePooling, SumPooling, and so on.
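      As a first taste, the three types differ only in how they summarize a window. A minimal numpy sketch (the 2×2 window values are made up for illustration):

import numpy as np

# one 2x2 window cut out of a feature map (values made up for illustration)
window = np.array([[1., 3.],
                   [2., 8.]])

print(np.max(window))   # MaxPooling     -> 8.0
print(np.mean(window))  # AveragePooling -> 3.5
print(np.sum(window))   # SumPooling     -> 14.0 (just 4x the average for a 2x2 window)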

2

      Taking max pooling as an example, we first define a sliding window (for example, a 2×2 window) and take the largest element within the window of the feature map as the value at the corresponding position of the output feature; the window then moves to the next position with a stride of 2, from left to right and from top to bottom, as shown in the figure below:
[Figure: max pooling with a 2×2 window and stride 2]
      The overall operation is as follows: a 5×5 feature map is traversed with a 3×3 sliding window and a stride of 1; the results of max pooling and average pooling are shown in the figures below. AveragePooling and SumPooling work in essentially the same way: AveragePooling computes the mean of each sliding window, while SumPooling computes the sum of each sliding window. The operation is simple but the effect is clear. SumPooling is generally not used on images, mainly because AveragePooling is just a scaled form of SumPooling.
[Figure: result of max pooling a 5×5 feature map with a 3×3 window and stride 1]
[Figure: result of average pooling a 5×5 feature map with a 3×3 window and stride 1]
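      To make the figures above concrete, here is a minimal numpy sketch (the 5×5 input is just np.arange, not taken from the figures) that slides a 3×3 window with stride 1; the output size is (5 - 3)/1 + 1 = 3 in each dimension:

import numpy as np

def pool2d(x, ksize=3, stride=1, mode="max"):
    # naive single-channel pooling, for illustration only
    h, w = x.shape
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+ksize, j*stride:j*stride+ksize]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(25, dtype=float).reshape(5, 5)   # a made-up 5x5 feature map
print(pool2d(x, mode="max"))    # 3x3 max pooling result
print(pool2d(x, mode="mean"))   # 3x3 average pooling result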
      Each feature map can have n channels, and the pooling operation acts on each channel independently; that is, if the input has 3 channels of feature maps, the output also has 3 channels, as shown in the figure below. (Here is my own speculation: pooling can be viewed as a kind of depthwise separable convolution, except that pooling is static, its values are fixed, and it does not participate in training.)
[Figure: pooling applied independently to each channel of a multi-channel feature map]
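      A quick shape check (with a hypothetical random input) confirms that pooling leaves the channel count unchanged and only shrinks the spatial size:

import numpy as np

x = np.random.rand(1, 3, 4, 4)   # (batch, channels, height, width)

# 2x2 max pooling with stride 2, applied to each channel independently
b, c, h, w = x.shape
pooled = x.reshape(b, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

print(x.shape)        # (1, 3, 4, 4)
print(pooled.shape)   # (1, 3, 2, 2) -- still 3 channels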

      The figure below shows how a feature map changes after max pooling and average pooling. It is generally believed that taking the regional mean (average pooling) tends to preserve the overall characteristics of the data and better highlight background information, while taking the regional maximum (max pooling) better preserves texture features:
[Figure: feature maps after max pooling and average pooling]
      Of course, there are various other pooling operations:
      In addition, there are variants such as weighted max pooling, Lp pooling, generalized max pooling, and global pooling.

  • stochastic pooling: an element of the window is randomly selected, with the probability of being selected positively correlated with its value. This acts as a form of regularization (a minimal sketch is given after this list).
  • mixed pooling: randomly chooses between max pooling and average pooling.
  • Data-Driven / Detail-Preserving Pooling: the methods above are all hand-designed, but deep learning as a whole is moving toward automation. As mentioned before for activation functions and normalization, data-driven approaches are being studied, and the same goes for pooling: each image can learn the pooling method that suits it best.
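      As an illustration of the first variant above, here is a minimal sketch of stochastic pooling on a single window (the values are made up, and non-negative activations such as post-ReLU features are assumed so the values can be normalized into probabilities):

import numpy as np

rng = np.random.default_rng(0)

# one 2x2 window of non-negative activations (values made up)
window = np.array([[0.0, 1.0],
                   [2.0, 5.0]])

probs = window.flatten() / window.sum()   # selection probability proportional to the value
idx = rng.choice(window.size, p=probs)    # randomly pick one element index
print(window.flatten()[idx])              # the stochastic pooling output for this window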

      Pooling is thus simple and convenient to apply, and it brings several benefits:

  • It gradually reduces the size of the feature map, and a smaller feature map means each element corresponds to a larger receptive field in the original image. As in the max pooling diagram above, for a 5×5 image the feature map after the convolution operation is 5×5 and after pooling it is 3×3, so an element that previously corresponded to a single pixel now corresponds to a 3×3 field of view on the original image: the receptive field grows.
  • At the same time, after multiple convolution layers the amount of data flowing through the model can easily lead to overfitting. Pooling reduces the network's parameters and the amount of computation, which helps prevent overfitting.
  • In addition, many articles state that pooling brings translation invariance: because pooling keeps abstracting the features of a region without caring about their exact position, it increases translation invariance to a certain extent, similar to the convolution operation (a small numeric check is sketched after this list). On the other hand, unless the receptive field is large enough, conv and pooling can only learn local information and lack a global view, which is one reason transformers and the like have risen in CV. But that is a story for later.
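      As a small numeric check of that last point (with a made-up single-channel input), a one-pixel shift that keeps the maximum inside the same window leaves the max-pooled output unchanged:

import numpy as np

def maxpool2x2(x):
    # non-overlapping 2x2 max pooling on a single-channel map
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.zeros((4, 4))
x[1, 0] = 5.0                      # one strong activation
x_shift = np.roll(x, 1, axis=1)    # shift it one pixel right (still inside the same 2x2 window)

print(maxpool2x2(x))        # the 5 appears in the top-left output cell
print(maxpool2x2(x_shift))  # identical pooled output despite the shift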

3
      The pooling operator is already packaged in frameworks such as torch and tensorflow and is very convenient to use out of the box. To aid understanding, here we implement pooling from scratch with numpy. As with the conv operator, we inherit from the Layers class (see the conv article for the Layers implementation). The Pooling operator subclasses Layers, and its forward and backward passes are implemented as follows:

import numpy as np
from module import Layers 

class Pooling(Layers):
    """Max / average pooling layer.

    Assumes ksize == stride (non-overlapping windows) and input shape (batch, channel, height, width).
    """
    def __init__(self, name, ksize, stride, type):
        super(Pooling, self).__init__(name)
        self.type = type
        self.ksize = ksize
        self.stride = stride

    def forward(self, x):
        b, c, h, w = x.shape
        out = np.zeros([b, c, h//self.stride, w//self.stride])
        # mask recording where the max of each window was, used in backward
        self.index = np.zeros_like(x)
        for n in range(b):
            for d in range(c):
                for i in range(h//self.stride):
                    for j in range(w//self.stride):
                        _x = i * self.stride
                        _y = j * self.stride
                        window = x[n, d, _x:_x+self.ksize, _y:_y+self.ksize]
                        if self.type == "max":
                            out[n, d, i, j] = np.max(window)
                            index = np.argmax(window)
                            self.index[n, d, _x + index//self.ksize, _y + index%self.ksize] = 1
                        elif self.type == "aveg":
                            out[n, d, i, j] = np.mean(window)
        return out

    def backward(self, grad_out):
        # upsample grad_out back to the input size (valid because ksize == stride)
        if self.type == "max":
            # the gradient flows only to the element that was the max of each window
            return np.repeat(np.repeat(grad_out, self.stride, axis=2), self.stride, axis=3) * self.index
        elif self.type == "aveg":
            # the gradient is shared equally among all elements of each window
            return np.repeat(np.repeat(grad_out, self.stride, axis=2), self.stride, axis=3) / (self.ksize * self.ksize)

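      A minimal usage sketch (assuming the Layers base class from the conv article accepts a name argument, and ksize == stride as in the code above):

pool = Pooling("pool1", ksize=2, stride=2, type="max")

x = np.random.rand(1, 3, 4, 4)       # (batch, channels, height, width)
y = pool.forward(x)                  # shape (1, 3, 2, 2)

grad_out = np.ones_like(y)           # a pretend upstream gradient
grad_in = pool.backward(grad_out)    # shape (1, 3, 4, 4), non-zero only at the max positions
print(y.shape, grad_in.shape)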
      Over the development of deep learning, every operator has gone through cycles of being embraced, abandoned, and embraced again. Eric Kauderer-Abrams of Stanford University once used translation-sensitivity maps to show that pooling seems to have little effect on performance, and that data augmentation is a better way to improve it. In short, both strided convolution and pooling are used for downsampling, and each has its pros and cons. Strided convolution removes the need for pooling but lacks a flexible activation mechanism; average pooling after convolution and activation works stably but loses details; max pooling overcomes that shortcoming of average pooling, but because only the maximum is kept each time, the gradient flow to the other elements is cut off. The pooling size also needs to be chosen with care.


For more computer vision content, please follow the official account: the invincible Zhang Dadao
