Paper notes: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

paper: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

code: https://github.com/Andrew-Qibin/SPNet

Abstract

  1. This CVPR 2020 paper improves scene parsing by rethinking the spatial pooling layer. The starting point is that standard pooling kernels are mostly square, while real scenes contain long, narrow objects, and the model needs to capture long-range dependencies as far as possible. The authors therefore introduce a long but narrow pooling kernel.
  2. The main contributions of this article are:
    • Introduces strip pooling, the long but narrow kernel mentioned above
    • Builds a strip pooling module (SPM) around it, making the structure plug-and-play in existing network architectures
    • Further combines strip pooling with standard spatial pooling into a mixed pooling module, so that objects of various shapes, elongated or not, are all handled
    • Based on all the improvements above, proposes SPNet to verify the effectiveness of the previous points

Details

  1. How to implement strip pooling?

    • Many real-world objects call for long-range dependencies, so the most direct fix from the pooling perspective is to make the pooling window long but narrow, i.e. the strip pooling of this article
    • The implementation is very simple: set either the height or the width of a standard average-pooling kernel to 1, then average all elements along each row or each column. The number of elements averaged at a time is the input tensor's w or h respectively, so only the sampling region of spatial pooling changes; this corresponds to Equations 2 and 3 in the paper
    • In the PyTorch source code published by the authors, strip pooling is implemented with AdaptiveAvgPool2d
      • PyTorch's AdaptiveAvgPool2d lets the user specify only the output size; the corresponding stride and kernel size are computed internally (see the PyTorch documentation)
    • The figure below compares the effect of strip pooling with spatial pooling
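The AdaptiveAvgPool2d trick described above can be checked with a minimal standalone snippet (my own demo, not from the paper's repo): an output size of `(1, None)` keeps the width and collapses the height, `(None, 1)` does the opposite, so each output element is the mean of one row or one column.

```python
import torch
import torch.nn as nn

# A 1x1x3x4 input makes the row/column averages easy to verify by hand.
x = torch.arange(12, dtype=torch.float32).reshape(1, 1, 3, 4)

hpool = nn.AdaptiveAvgPool2d((1, None))  # collapse height: output is 1 x w
vpool = nn.AdaptiveAvgPool2d((None, 1))  # collapse width:  output is h x 1

print(hpool(x).shape)  # torch.Size([1, 1, 1, 4])
print(vpool(x).shape)  # torch.Size([1, 1, 3, 1])

# Each output element is exactly the mean along the collapsed axis,
# matching Equations 2 and 3 in the paper.
assert torch.allclose(hpool(x), x.mean(dim=2, keepdim=True))
assert torch.allclose(vpool(x), x.mean(dim=3, keepdim=True))
```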
  2. strip pooling module

    • To make strip pooling plug-and-play in different existing network structures, the authors designed the strip pooling module, which encapsulates strip pooling inside the module and guarantees that any feature map passing through the SPM has had the strip pooling operation applied
    • As shown below:
      • For an input tensor, two pathways apply horizontal and vertical strip pooling respectively, and the results are then expanded back to the original size of the input tensor (judging from the code below, this expansion is done by upsampling via F.interpolate)
      • The results of the two pathways are then added for fusion, followed by a 1x1 conv (to change the number of channels) and a sigmoid activation function
      • SPM also contains an identity-map-like operation: the input is multiplied element-wise by the sigmoid output of the two fused pathways. The sigmoid output is effectively a weight matrix scoring the importance of the features at each position of the input tensor, so the two-pathway branch can be regarded as a kind of attention mechanism
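The gating structure described above can be sketched as a small module. This is my own simplification, not the authors' code: the paper expands the strips by repetition and uses separate 1D convs per pathway, while here bilinear interpolation and a single 1x1 conv stand in, to show only the sigmoid-gated identity path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPMSketch(nn.Module):
    """Simplified strip pooling module: strip pools -> expand -> fuse ->
    1x1 conv -> sigmoid -> element-wise gate on the input."""
    def __init__(self, channels):
        super().__init__()
        self.hpool = nn.AdaptiveAvgPool2d((1, None))  # collapse height
        self.vpool = nn.AdaptiveAvgPool2d((None, 1))  # collapse width
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        _, _, h, w = x.size()
        # Pool each direction, then expand back to the input resolution.
        xh = F.interpolate(self.hpool(x), (h, w), mode='bilinear', align_corners=False)
        xv = F.interpolate(self.vpool(x), (h, w), mode='bilinear', align_corners=False)
        # Fuse, project, squash to (0, 1): a per-position importance map.
        gate = torch.sigmoid(self.conv(xh + xv))
        return x * gate  # identity path modulated by the attention-like gate

spm = SPMSketch(8)
y = spm(torch.randn(2, 8, 16, 24))
print(y.shape)  # torch.Size([2, 8, 16, 24])
```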
  3. mixed pooling module

    • If all pooling in the network were replaced with strip pooling, it would inevitably hurt performance on the original non-elongated objects, which is not worth the loss. The authors therefore combine strip pooling with pyramid pooling to construct a mixed pooling module
    • Within it, strip pooling handles long-range dependencies, while a lightweight pyramid pooling handles short-range dependencies
  4. Implementation code

    • SPNet part; its backbone is the ResNet series
    • Reference source: the StripPooling part
      ### strip pooling implemented via AdaptiveAvgPool2d
      self.pool1 = nn.AdaptiveAvgPool2d(pool_size[0])  # pyramid pooling branch
      self.pool2 = nn.AdaptiveAvgPool2d(pool_size[1])  # pyramid pooling branch
      self.pool3 = nn.AdaptiveAvgPool2d((1, None))     # horizontal strip: h -> 1
      self.pool4 = nn.AdaptiveAvgPool2d((None, 1))     # vertical strip: w -> 1
      
      ## SPM module
      def forward(self, x):
          _, _, h, w = x.size()
          x1 = self.conv1_1(x)
          x2 = self.conv1_2(x)
          x2_1 = self.conv2_0(x1)
          # pool, convolve, then interpolate each branch back to (h, w)
          x2_2 = F.interpolate(self.conv2_1(self.pool1(x1)), (h, w), **self._up_kwargs)
          x2_3 = F.interpolate(self.conv2_2(self.pool2(x1)), (h, w), **self._up_kwargs)
          x2_4 = F.interpolate(self.conv2_3(self.pool3(x2)), (h, w), **self._up_kwargs)
          x2_5 = F.interpolate(self.conv2_4(self.pool4(x2)), (h, w), **self._up_kwargs)
          x1 = self.conv2_5(F.relu_(x2_1 + x2_2 + x2_3))  # fuse pyramid branches
          x2 = self.conv2_6(F.relu_(x2_5 + x2_4))         # fuse strip branches
          out = self.conv3(torch.cat([x1, x2], dim=1))
          return F.relu_(x + out)                         # residual connection
      

Conclusions

  1. Personally, I feel the idea of strip pooling is quite reasonable, and the implementation is relatively simple. For traffic scenes, with objects such as roadblocks and lane lines, it should be helpful.
  2. Furthermore, if the idea of strip pooling were combined with asymmetric convolution, could the effect on objects requiring long-range dependencies described in this article be further improved?
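By asymmetric convolution I mean the usual 1xk plus kx1 factorization; a minimal sketch of the idea (kernel sizes and channel counts are my own arbitrary choices, not from any paper) would pair a long horizontal kernel with a long vertical one, mirroring strip pooling's long-but-narrow receptive field:

```python
import torch
import torch.nn as nn

# Hypothetical asymmetric-convolution pair: a 1x9 then 9x1 kernel in place
# of a square 9x9, so each conv "sees" a long, narrow strip of the input.
asym = nn.Sequential(
    nn.Conv2d(16, 16, kernel_size=(1, 9), padding=(0, 4)),  # horizontal strip conv
    nn.Conv2d(16, 16, kernel_size=(9, 1), padding=(4, 0)),  # vertical strip conv
)

y = asym(torch.randn(1, 16, 32, 32))
print(y.shape)  # torch.Size([1, 16, 32, 32]) -- padding preserves resolution
```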

Origin www.cnblogs.com/xiangs/p/12747816.html