ANN: Asymmetric Non-local Neural Networks for Semantic Segmentation

Code: PyTorch

[Figure: structure diagram of the original non-local block]

1. Abstract

Problems with the standard non-local block:

  1. Heavy computation
  2. Excessive GPU memory usage

The authors propose an asymmetric non-local neural network for semantic segmentation with two prominent components: an asymmetric pyramid non-local block (APNB), which greatly reduces computation and memory consumption, and an asymmetric fusion non-local block (AFNB).

2. Introduction

Previous research has shown that fully exploiting long-range dependencies improves performance.

For a standard non-local block, as long as the outputs of the key branch and the value branch keep the same (matching) size, the output size of the non-local block remains unchanged. With this in mind, if we sample only a few representative points from the key and value branches, the time complexity can be greatly reduced without sacrificing performance. In other words, replace N in the figure with S (S ≪ N).
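As a rough back-of-the-envelope illustration (the numbers below are hypothetical, not from the paper), replacing N with S in the two large matrix multiplications cuts the multiply count by a factor of about N/S:

```python
# Rough multiply counts for the two big matmuls in a non-local block.
# Illustrative values: a 128x128 feature map with a reduced channel size of 256.
N = 128 * 128   # number of spatial positions, H * W
S = 110         # sampled anchor points (pyramid pool sizes 1,3,6,8 -> 1+9+36+64)
C_hat = 256     # reduced channel dimension

standard = C_hat * N * N + N * N * C_hat    # phi^T theta, then V gamma^T
asymmetric = C_hat * N * S + N * S * C_hat  # phi^T theta_P, then V_P gamma_P^T

print(standard // asymmetric)  # speed-up factor = N // S -> 148
```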

3. Asymmetric Non-local Neural Network


3.1 Revisiting Non-local Block
  1. Given an input feature X ∈ R^(C×H×W), three 1×1 convolutions Wφ, Wθ and Wγ transform X into
     φ = Wφ(X), θ = Wθ(X), γ = Wγ(X), each of size Ĉ × H × W.
  2. Flatten each to size Ĉ × N, where N is the total number of spatial positions, i.e. N = H·W. Compute the similarity matrix
     V = φᵀθ, V ∈ R^(N×N).
  3. Normalize V to V̄ = f(V). The normalization function f can take the form of softmax, rescaling, or none.
  4. For every position, the output of the attention layer is
     O = V̄ γᵀ, O ∈ R^(N×Ĉ).
  5. The final output is
     Y = Wo(Oᵀ) + X,
     where Wo, also implemented as a 1×1 convolution, acts as a weighting parameter and restores the channel size from Ĉ to C before the result is combined with the original input X.
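The five steps above can be sketched in PyTorch roughly as follows (a minimal sketch, not the authors' released code; softmax is assumed as the normalization f, and the output is merged with X by a residual add, following the original non-local formulation):

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Standard non-local block: V = f(phi^T theta), O = V gamma^T."""
    def __init__(self, in_channels, reduced_channels):
        super().__init__()
        self.phi   = nn.Conv2d(in_channels, reduced_channels, 1)  # query
        self.theta = nn.Conv2d(in_channels, reduced_channels, 1)  # key
        self.gamma = nn.Conv2d(in_channels, reduced_channels, 1)  # value
        self.w_o   = nn.Conv2d(reduced_channels, in_channels, 1)  # C^ -> C

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        phi   = self.phi(x).view(b, -1, n)         # B x C^ x N
        theta = self.theta(x).view(b, -1, n)       # B x C^ x N
        gamma = self.gamma(x).view(b, -1, n)       # B x C^ x N
        v = torch.bmm(phi.transpose(1, 2), theta)  # similarity, B x N x N
        v = torch.softmax(v, dim=-1)               # normalization f
        o = torch.bmm(v, gamma.transpose(1, 2))    # B x N x C^
        o = o.transpose(1, 2).view(b, -1, h, w)    # back to B x C^ x H x W
        return self.w_o(o) + x                     # restore C, add input
```

The two `bmm` calls are the N×N matrix multiplications that APNB later shrinks to N×S.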
3.2. Asymmetric Pyramid Non-local Block

Non-local blocks effectively capture the long-range dependencies that are critical for semantic segmentation, but the standard non-local operation is very time-consuming and memory-hungry. The large matrix multiplications are clearly the main source of this inefficiency.

If we change N to a smaller number S (S ≪ N) on the key and value branches, the output keeps the same size:

O = f(φᵀθ_P) γ_Pᵀ, with θ_P, γ_P ∈ R^(Ĉ×S).

Changing N to S is equivalent to sampling S representative points from θ and γ instead of using all spatial points, as shown in Figure 1. The computational complexity is therefore greatly reduced, from O(ĈN²) to O(ĈNS).

Specifically:

  1. We add sampling modules Pθ and Pγ after θ and γ to sample several sparse anchor points:
     θ_P = Pθ(θ) ∈ R^(Ĉ×S) and γ_P = Pγ(γ) ∈ R^(Ĉ×S), where S is the number of sampled anchor points.

  2. Compute the similarity matrix V_P between φ and the anchor points θ_P:
     V_P = φᵀθ_P.
     Note that V_P is an asymmetric matrix of size N × S. V_P is then normalized with the same normalization function as in the standard non-local block to obtain the unified similarity matrix V̄_P.

  3. The attention output is
     O_P = V̄_P γ_Pᵀ, O_P ∈ R^(N×Ĉ).
     This asymmetric matrix multiplication reduces the time complexity. However, when S is small it is hard to guarantee that performance does not degrade too much.
     To address this, pyramid pooling is embedded in the non-local block to enhance the global representation while keeping the computational overhead low.

This brings us to the final form of the asymmetric pyramid non-local block (APNB), shown in Figure 3. The key change is a spatial pyramid pooling module added after θ and γ to sample the anchors. The sampling process is illustrated in Figure 4: several pooling layers are applied after θ (or γ), and the four pooled results are flattened and concatenated to serve as the input of the next layer.

We denote the spatial pyramid pooling modules as P^n_θ and P^n_γ, where the superscript n is the width (and height) of the pooling layer's output size (empirically, width equals height). In our model, n ∈ {1, 3, 6, 8}. The total number of anchor points is therefore

S = Σ_n n² = 1 + 9 + 36 + 64 = 110.

The spatial pyramid pooling provides sufficient feature statistics about the global scene's semantic cues to compensate for the potential performance drop caused by the reduced computation.
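Combining the anchor sampling with the pyramid pooling, a minimal APNB sketch might look as follows (assumptions: softmax normalization and a residual merge with X, as in the standard non-local formulation; class and variable names are illustrative):

```python
import torch
import torch.nn as nn

class PyramidSampling(nn.Module):
    """Sample S = sum(n^2) anchor points via adaptive average pooling."""
    def __init__(self, sizes=(1, 3, 6, 8)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(n) for n in sizes])

    def forward(self, x):                       # x: B x C^ x H x W
        b, c = x.shape[:2]
        # flatten each pooled map and concatenate along the anchor axis
        return torch.cat([p(x).view(b, c, -1) for p in self.pools], dim=2)

class APNB(nn.Module):
    def __init__(self, in_channels, reduced_channels):
        super().__init__()
        self.phi    = nn.Conv2d(in_channels, reduced_channels, 1)
        self.theta  = nn.Conv2d(in_channels, reduced_channels, 1)
        self.gamma  = nn.Conv2d(in_channels, reduced_channels, 1)
        self.sample = PyramidSampling()
        self.w_o    = nn.Conv2d(reduced_channels, in_channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        phi     = self.phi(x).view(b, -1, h * w)  # B x C^ x N (not sampled)
        theta_p = self.sample(self.theta(x))      # B x C^ x S
        gamma_p = self.sample(self.gamma(x))      # B x C^ x S
        v = torch.softmax(torch.bmm(phi.transpose(1, 2), theta_p), dim=-1)
        o = torch.bmm(v, gamma_p.transpose(1, 2)) # B x N x C^
        o = o.transpose(1, 2).view(b, -1, h, w)
        return self.w_o(o) + x
```

Only the key and value branches are sampled; the query branch keeps all N positions, which is what makes the N×S similarity matrix asymmetric.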

3.3. Asymmetric Fusion Non-local Block

The standard non-local block has a single input source, while the fusion non-local block (FNB) has two: a high-level feature map X_h ∈ R^(Ch×Nh) and a low-level feature map X_l ∈ R^(Cl×Nl).
As before, 1×1 convolutions Wφ and Wθ are used to transform X_h and X_l into
φ_h = Wφ(X_h) ∈ R^(Ĉ×Nh) and θ_l = Wθ(X_l) ∈ R^(Ĉ×Nl),
with γ_l = Wγ(X_l) serving as the value branch. The similarity matrix between φ_h and θ_l is then computed by matrix multiplication, V_F = φ_hᵀθ_l ∈ R^(Nh×Nl), and V_F is normalized to obtain the unified similarity matrix V̄_F. The attention output O_F = V̄_F γ_lᵀ is reshaped and combined with X_h to give the fused high-level feature.

3.4. Network Architecture

Our backbone is ResNet-101 with the last two downsampling operations removed and dilated convolutions used instead, so the feature maps of the last two stages keep a higher resolution relative to the input image. We use AFNB to fuse the features of Stage4 and Stage5. The fused features are then concatenated with the feature map after Stage5, to guard against the case where AFNB fails to produce accurately enhanced features.
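The wiring described above can be sketched as follows. To keep the sketch self-contained, the AFNB and APNB modules are replaced by plain convolution stand-ins; only the data flow (fuse Stage4/Stage5, concatenate with Stage5, refine, classify) follows the text. The channel defaults are the usual ResNet-101 stage sizes, and `num_classes=19` is a Cityscapes-style assumption, not stated in this section:

```python
import torch
import torch.nn as nn

class ANNHead(nn.Module):
    """Data-flow sketch of the segmentation head (stand-in modules)."""
    def __init__(self, c4=1024, c5=2048, mid=512, num_classes=19):
        super().__init__()
        # stand-in for AFNB fusing Stage4 (low-level) with Stage5 (high-level)
        self.fuse = nn.Conv2d(c4 + c5, c5, 1)
        # concatenate the fused features with the Stage5 map, reduce channels
        self.bottleneck = nn.Conv2d(c5 * 2, mid, 3, padding=1)
        # stand-in for APNB refining the bottleneck output
        self.context = nn.Conv2d(mid, mid, 1)
        self.classifier = nn.Conv2d(mid, num_classes, 1)

    def forward(self, x4, x5):
        # with the last two downsamplings removed, Stage4 and Stage5 maps
        # share the same spatial size, so they can be concatenated directly
        fused = self.fuse(torch.cat([x4, x5], dim=1))
        x = self.bottleneck(torch.cat([fused, x5], dim=1))
        x = self.context(x)
        return self.classifier(x)
```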


Origin blog.csdn.net/qq_36321330/article/details/105461380