Intensive reading of deep learning papers [11]: Deeplab v2


Deeplab v2 is, strictly speaking, a minor update of Deeplab v1. Building on the atrous (dilated) convolution and fully connected CRF of v1, it focuses on making the network handle multi-scale objects. The multi-scale problem has always been one of the major challenges in object detection and semantic segmentation. The conventional way to obtain multi-scale information was to rescale the same image to several sizes, compute a convolutional feature map for each size, and then upsample and fuse those feature maps; the biggest drawback of this approach is its computational cost. Deeplab v2 borrows the idea of Spatial Pyramid Pooling (SPP) and proposes Atrous Spatial Pyramid Pooling (ASPP) built on atrous convolution, which is the biggest highlight of Deeplab v2. The paper that proposed Deeplab v2 is DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, an important representative of the early architecture of the Deeplab family of networks.
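Atrous convolution is what makes this efficient: a 3×3 kernel with dilation rate d samples a (2d+1)×(2d+1) window while still using only nine weights, so the receptive field grows without extra parameters or loss of resolution. A minimal PyTorch sketch (illustrative only, not code from the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)
for d in (1, 2, 4):
    # Setting padding=dilation keeps the spatial size unchanged for a 3x3 kernel,
    # while the effective receptive field grows to (2d+1) x (2d+1).
    conv = nn.Conv2d(1, 1, kernel_size=3, padding=d, dilation=d, bias=False)
    print(d, conv(x).shape)  # spatial size stays (32, 32) for every rate
```

Each of these convolutions has exactly 9 weights regardless of the dilation rate; only the sampling stride within the kernel changes.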

ASPP is derived from the SPP structure used in R-CNN-style object detection, which showed that image regions of arbitrary scale can be classified accurately and efficiently by resampling the convolutional features extracted at a single scale. ASPP replaces the ordinary convolutions in this scheme with atrous convolutions: features are extracted by several parallel atrous convolutions with different dilation rates, and the branches are then fused. The ASPP structure is shown in the figure below.

(Figure: structure of the ASPP module)

In addition to ASPP, Deeplab v2 also replaces the VGG-16 backbone of v1 with ResNet-101, an upgrade to the encoder that gives it stronger feature-extraction capability. Deeplab v2 achieved state-of-the-art (SOTA) results on semantic segmentation benchmarks such as PASCAL VOC and Cityscapes. A simple implementation of the ASPP module is given below: the ASPP convolution and pooling branches are defined first, and the ASPP module is then built on top of them.

import torch
import torch.nn as nn
import torch.nn.functional as F

### Define the ASPP convolution branch
class ASPPConv(nn.Sequential):
    def __init__(self, in_channels, out_channels, dilation):
        modules = [
            nn.Conv2d(in_channels, out_channels, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        ]
        super(ASPPConv, self).__init__(*modules)


### Define the ASPP pooling branch
class ASPPPooling(nn.Sequential):
    def __init__(self, in_channels, out_channels):
        super(ASPPPooling, self).__init__(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))


    def forward(self, x):
        size = x.shape[-2:]
        x = super(ASPPPooling, self).forward(x)
        return F.interpolate(x, size=size, mode='bilinear',
                             align_corners=False)


### Define the ASPP module
class ASPP(nn.Module):
    def __init__(self, in_channels, atrous_rates):
        super(ASPP, self).__init__()
        out_channels = 256
        modules = []
        modules.append(nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)))


        rate1, rate2, rate3 = tuple(atrous_rates)
        modules.append(ASPPConv(in_channels, out_channels, rate1))
        modules.append(ASPPConv(in_channels, out_channels, rate2))
        modules.append(ASPPConv(in_channels, out_channels, rate3))
        modules.append(ASPPPooling(in_channels, out_channels))


        self.convs = nn.ModuleList(modules)


        self.project = nn.Sequential(
            nn.Conv2d(5 * out_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1),)


    # ASPP forward pass
    def forward(self, x):
        res = []
        for conv in self.convs:
            res.append(conv(x))
        res = torch.cat(res, dim=1)
        return self.project(res)

The following figure shows segmentation results of Deeplab v2 on the Cityscapes dataset:

(Figure: Deeplab v2 segmentation results on Cityscapes)

Past highlights:

 Intensive reading of deep learning papers [10]: Deeplab v1

 Intensive reading of deep learning papers [9]: PSPNet

 Intensive reading of deep learning papers [8]: ParseNet

 Intensive reading of deep learning papers [7]: nnUNet

 Intensive reading of deep learning papers [6]: UNet++

 Intensive reading of deep learning papers [5]: Attention UNet

 Intensive reading of deep learning papers [4]: RefineNet

 Intensive reading of deep learning papers [3]: SegNet

 Intensive reading of deep learning papers [2]: UNet network

 Intensive reading of deep learning papers [1]: FCN fully convolutional network


Origin blog.csdn.net/weixin_37737254/article/details/126277219