[OpenMMLab AI Combat Camp Phase II] Semantic Segmentation and MMSegmentation

MMSegmentation

Open-source repository: https://github.com/open-mmlab/mmsegmentation

Rich algorithms: 600+ pre-trained models, 40+ algorithm reproductions

Modular design: easy to configure and easy to expand

Unified hyperparameters: extensive ablation experiments support fair comparisons

Ease of use: training tools, debugging tools, inference API


Semantic segmentation

The basic idea

Split by color

Colors within an object are similar, while color changes sharply at the boundary where objects meet

Image-processing-based methods segment the image by color

Pixel-by-pixel classification

Advantage: can make full use of existing image classification models

Problem: inefficient — overlapping patches cause the same convolutions to be computed repeatedly
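The inefficiency is easy to see in a sketch: classify every pixel by cropping the patch centred on it and running a classifier on each patch. A minimal PyTorch illustration (the tiny classifier and sizes below are assumptions for demonstration, not a real backbone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for an image classification model (assumption, not a real backbone)
classifier = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))  # 5 classes

img = torch.randn(3, 8, 8)          # tiny image: 3 channels, 8x8 pixels
k = 5                               # patch size around each pixel
padded = F.pad(img, (k // 2,) * 4)  # pad so every pixel has a full patch

# Pixel-by-pixel: one forward pass per pixel; neighbouring patches overlap,
# so the same convolutions are recomputed many times
labels = torch.empty(8, 8, dtype=torch.long)
for i in range(8):
    for j in range(8):
        patch = padded[:, i:i + k, j:j + k].unsqueeze(0)
        labels[i, j] = classifier(patch).argmax(dim=1)

print(labels.shape)  # one class label per pixel: torch.Size([8, 8])
```

Even on this 8x8 toy image the loop runs 64 forward passes over heavily overlapping patches, which is exactly the redundancy the fully convolutional approaches below avoid.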

Upsampling of prediction maps

Problem:

The image classification model uses downsampling layers (strided convolution or pooling) to obtain high-level features, so the convolutional network's output is smaller than the original image, while segmentation requires output at the same size

Solution:

Upsample the predicted segmentation map to restore the original image's resolution. Upsampling schemes:

  1. Bilinear interpolation

  2. Transposed convolution: a learnable upsampling layer
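Both schemes are available directly in PyTorch; a minimal sketch (the class count and spatial sizes are assumed for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A coarse prediction map: batch 1, 21 classes, 8x8 spatial size (assumed)
coarse = torch.randn(1, 21, 8, 8)

# Scheme 1: bilinear interpolation -- fixed, no learnable parameters
up_bilinear = F.interpolate(coarse, scale_factor=4, mode="bilinear",
                            align_corners=False)

# Scheme 2: transposed convolution -- a learnable upsampling layer
up_layer = nn.ConvTranspose2d(21, 21, kernel_size=4, stride=4)
up_learned = up_layer(coarse)

print(up_bilinear.shape, up_learned.shape)  # both torch.Size([1, 21, 32, 32])
```

The bilinear path has no weights to train; the transposed convolution learns its upsampling filter along with the rest of the network.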

Upsampling based on multi-layer features

Problem: predictions made only from the top-level features, upsampled 32x, are relatively coarse

Analysis: after multiple rounds of downsampling, high-level features have lost most of the spatial detail

Solution idea: combine low-level and high-level feature maps

The FCN solution:

Generate category predictions from both low-level and high-level feature maps, upsample each to the size of the original image, then average to obtain the final result
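This multi-level fusion can be sketched as follows (the stage channel counts, strides, and averaging fusion follow the text's description; the concrete sizes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21
# Feature maps from three backbone stages at strides 8/16/32
# (assumed channel counts, for a 256x256 input)
feats = {8: torch.randn(1, 256, 32, 32),
         16: torch.randn(1, 512, 16, 16),
         32: torch.randn(1, 1024, 8, 8)}

# A 1x1 conv per level produces a category prediction from that level's features
heads = {s: nn.Conv2d(f.shape[1], num_classes, kernel_size=1)
         for s, f in feats.items()}

# Upsample every prediction to the input resolution, then average
preds = [F.interpolate(heads[s](f), size=(256, 256), mode="bilinear",
                       align_corners=False)
         for s, f in feats.items()]
final = torch.stack(preds).mean(dim=0)
print(final.shape)  # torch.Size([1, 21, 256, 256])
```

The low-level predictions contribute spatial detail while the high-level ones contribute semantics, so the averaged map is finer than the 32x-upsampled top-level prediction alone.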

Context information

Original image → backbone network → feature map → prediction map

PSPNet

Original image → feature map → multi-scale pooling → feature concatenation → category prediction

Resize the feature map to several scales to obtain contextual features at different scales

The contextual features, after channel compression and spatial upsampling, are concatenated back onto the original feature map → the result contains both local and contextual features

Generating prediction maps based on fused features
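The steps above (multi-scale pooling, channel compression, spatial upsampling, concatenation) can be sketched as a simplified pyramid pooling module; the bin sizes below match the common PSPNet configuration, but normalization/activation layers are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Simplified sketch of PSPNet's pyramid pooling module."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)  # channel compression per branch
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),         # pool to b x b context
                          nn.Conv2d(in_ch, out_ch, 1))     # 1x1 conv compresses channels
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        ctx = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                             align_corners=False)          # spatial upsampling
               for branch in self.branches]
        return torch.cat([x] + ctx, dim=1)  # local + contextual features

feat = torch.randn(1, 512, 16, 16)           # assumed backbone output
fused = PyramidPooling(512)(feat)
print(fused.shape)  # 512 + 4*128 channels: torch.Size([1, 1024, 16, 16])
```

A prediction head applied to `fused` then sees both the original local features and context pooled at four scales.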

DeepLab series

DeepLab is another series of work on semantic segmentation

Main contributions:

  • Use atrous convolution to address the downsampling in the network

  • Use a conditional random field (CRF) as a post-processing step to refine the segmentation map

  • Capture context information with multi-scale atrous convolution (the ASPP module)

DeepLab V1 was published in 2014; the V2, V3, and V3+ versions followed in 2016, 2017, and 2018

Atrous convolution solves downsampling problem

The downsampling layers of the image classification model make the output smaller than the input

If the strides in the pooling and convolutional layers are removed:

  • The number of downsampling can be reduced

  • The feature map becomes larger, and the convolution kernel must grow accordingly to keep the same receptive field, which adds a large number of parameters

  • Dilated/atrous convolution enlarges the receptive field without adding parameters

Standard convolution:

Feature map → downsampling → convolution with the kernel → result

Atrous convolution:

The feature map stays at full resolution; the convolution kernel is dilated (zeros inserted between its weights) before the convolution is applied. The dilated kernel introduces no extra parameters, and downsampling followed by a standard convolution produces the same results as the atrous convolution sampled at the corresponding positions — the two are equivalent.
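The two key properties — no extra parameters, no loss of resolution — can be checked directly in PyTorch (channel counts and input size are assumed for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # assumed feature map

# Standard 3x3 convolution: receptive field 3x3
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# Atrous (dilated) 3x3 convolution with rate 2: receptive field 5x5,
# yet exactly the same number of weights as the standard kernel
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(conv) == n_params(atrous))  # True: dilation adds no parameters
print(conv(x).shape, atrous(x).shape)      # both keep the 32x32 resolution
```

With matching padding, both layers preserve the spatial size, so the dilated version can replace a downsample-then-convolve stage without shrinking the feature map.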

Atrous convolution and downsampling

With the downsample-then-upsample scheme, only a fraction (e.g. 1/4) of the positions in the restored feature map carry true responses from the original image; the rest must be filled in by interpolation

Dilated convolution produces a feature map at the same resolution directly, with no extra interpolation step

DeepLab model

DeepLab modifies an image classification network as follows:

  • Remove the later downsampling layers of the classification model

  • Change the subsequent convolutional layers to dilated convolutions, gradually increasing the dilation rate to preserve the receptive field of the source network

Conditional Random Field (CRF)

The segmentation map output directly by the model is relatively coarse, especially at object boundaries, and does not produce good segmentation results on its own

DeepLab V1 & V2 use a conditional random field (CRF) as a post-processing step, combining the original image's colors with the network's predicted categories to obtain a refined segmentation

CRF is a probabilistic model. DeepLab uses CRF to model the segmentation results, and uses the energy function to represent the quality of the segmentation results. By minimizing the energy function, better segmentation results can be obtained.
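The energy function takes the standard form from the fully connected CRF used by DeepLab: a unary term per pixel plus a pairwise term over pixel pairs,

```latex
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{i,j} \theta_{ij}(x_i, x_j)
```

where the unary potential $\theta_i(x_i) = -\log P(x_i)$ comes from the network's per-pixel class probabilities, and the pairwise potential penalizes assigning different labels to nearby pixels with similar colors. Minimizing $E$ therefore snaps the predicted labels to color edges in the original image.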

Spatial Pyramid Pooling

PSPNet uses pooling at different scales to obtain contextual information at different scales

DeepLab V2 & V3 use atrous convolutions at different dilation rates (the ASPP module) to achieve a similar effect

Atrous convolution with a larger dilation rate → larger receptive field → more contextual features
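A minimal sketch of the ASPP idea: parallel atrous convolutions at several rates whose outputs are fused. The rates below follow the V2-style configuration; the sum fusion and missing normalization layers are simplifications:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Simplified sketch of ASPP: parallel atrous convs at several rates."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        # One 3x3 atrous conv per rate; padding=rate keeps the spatial size
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates)

    def forward(self, x):
        # Larger rate -> larger receptive field -> more contextual features;
        # here the branch outputs are simply summed (fusion varies per version)
        return sum(b(x) for b in self.branches)

feat = torch.randn(1, 512, 16, 16)   # assumed encoder feature map
out = ASPP(512, 21)(feat)            # 21 assumed output classes
print(out.shape)  # torch.Size([1, 21, 16, 16])
```

Each branch sees the same feature map at a different effective scale, so the fused output mixes local and long-range context without any pooling.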

DeepLab V3+

  • The DeepLab V2/V3 models use ASPP to capture context features

  • Encoder-decoder structures (such as U-Net) fuse low-level feature maps during upsampling to obtain finer segmentation maps

  • DeepLab V3+ combines the two ideas, adding a simple decoder structure to the original model

The encoder generates multi-scale high-level semantic information through ASPP

The decoder mainly fuses in low-level features to produce sharper segmentation results
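The decoder's fusion step can be sketched as follows; the channel sizes and the 48-channel compression of the low-level map follow the V3+ design, while the single fusion conv and the random inputs are simplifications for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the real backbone/ASPP outputs (assumed shapes)
low_level = torch.randn(1, 256, 64, 64)   # early backbone feature (stride 4)
aspp_out  = torch.randn(1, 256, 16, 16)   # encoder/ASPP output (stride 16)

reduce_low = nn.Conv2d(256, 48, kernel_size=1)      # compress low-level channels
fuse = nn.Conv2d(256 + 48, 21, kernel_size=3, padding=1)  # decoder conv -> classes

# Decoder: upsample the encoder output, concatenate the low-level map, refine
up = F.interpolate(aspp_out, size=low_level.shape[2:], mode="bilinear",
                   align_corners=False)
out = fuse(torch.cat([up, reduce_low(low_level)], dim=1))
print(out.shape)  # torch.Size([1, 21, 64, 64])
```

The low-level map contributes boundary detail at stride 4, which is what sharpens the final prediction compared with directly upsampling the ASPP output 16x.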


Origin blog.csdn.net/yichao_ding/article/details/131180349