Image Segmentation

Image segmentation predicts, for each pixel in an image, the category or object to which that pixel belongs. Deep-learning-based image segmentation algorithms fall into two main categories:

  1. Semantic Segmentation
    assigns a category to every pixel in an image.
  2. Instance Segmentation
    differs from semantic segmentation in that it assigns categories only to specific objects and separates individual objects from one another. It is somewhat similar to object detection, but object detection outputs bounding boxes and categories, while instance segmentation outputs masks and categories.
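
The difference between the two output formats can be sketched with a toy example. This is a minimal illustration in plain Python, assuming a hypothetical 2x3 image, two made-up classes (0 = background, 1 = cat), and hand-written scores and masks; it is not taken from any particular model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def semantic_label_map(score_map):
    """Pick the most probable class index for every pixel.

    score_map[y][x] is a list of per-class scores for that pixel.
    """
    label_map = []
    for row in score_map:
        label_row = []
        for pixel_scores in row:
            probs = softmax(pixel_scores)
            label_row.append(probs.index(max(probs)))
        label_map.append(label_row)
    return label_map

# Hypothetical per-pixel scores for classes [background, cat].
scores = [
    [[0.1, 2.0], [3.0, 0.2], [0.3, 1.5]],
    [[0.2, 2.2], [2.5, 0.1], [0.1, 1.9]],
]

# Semantic segmentation: one class per pixel; both cats share label 1.
semantic = semantic_label_map(scores)

# Instance segmentation: one binary mask plus a class per object,
# so the two cats stay separate even though they share a class.
instances = [
    {"class": 1, "mask": [[1, 0, 0], [1, 0, 0]]},
    {"class": 1, "mask": [[0, 0, 1], [0, 0, 1]]},
]
```

Here the semantic map cannot tell the two cats apart, while the instance output keeps one mask per object.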

FCN

Original link: Fully Convolutional Networks for Semantic Segmentation: https://openaccess.thecvf.com/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf

FCN classifies images at the pixel level, which solves image segmentation at the semantic level. FCN accepts input images of arbitrary size and uses a deconvolution (transposed convolution) layer to upsample the feature map of the last convolutional layer back to the size of the input image. This yields a prediction for every pixel while preserving the spatial information of the original input image. Pixel-wise classification is then performed on the upsampled feature map.
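
The upsampling step can be sketched as a single-channel transposed convolution in plain Python. This is only an illustration under stated assumptions: the function name `transposed_conv2d` is hypothetical, the kernel of ones is fixed by hand (in FCN the kernel is learned), and real networks operate on many channels at once.

```python
def transposed_conv2d(feature, kernel, stride):
    """Upsample a 2D feature map with a single-channel transposed convolution.

    Each input value scatters a scaled copy of the kernel into the output,
    so the output side length is (n - 1) * stride + k.
    """
    n_rows, n_cols = len(feature), len(feature[0])
    k = len(kernel)
    out_rows = (n_rows - 1) * stride + k
    out_cols = (n_cols - 1) * stride + k
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for u in range(k):
                for v in range(k):
                    out[i * stride + u][j * stride + v] += feature[i][j] * kernel[u][v]
    return out

# Coarse 2x2 feature map, as might come out of the last convolutional layer.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]

# Assumed fixed kernel; with stride 2 it reproduces nearest-neighbour upsampling.
ones = [[1.0, 1.0],
        [1.0, 1.0]]

upsampled = transposed_conv2d(coarse, ones, stride=2)  # 4x4 output
```

With a 2x2 kernel and stride 2 the copies do not overlap, so each input value simply fills a 2x2 block of the output. A learned kernel, or a larger one, would blend neighbouring values instead.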

U-Net

Original link: U-Net: Convolutional Networks for Biomedical Image Segmentation: https://arxiv.org/pdf/1505.04597.pdf

  1. Feature extraction part: the contracting network, which reduces the size of the feature maps through four downsampling steps. This path extracts shallow information.
  2. Concatenation: feature maps from the contracting path are concatenated, through skip connections, with the feature maps at the corresponding level of the expanding path.
  3. Upsampling part: also called the expanding network, which increases the size of the feature maps through four upsampling steps and extracts deep information. Each upsampling step halves the number of channels, the opposite of the feature extraction path on the left, where the number of channels grows.
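
The way sizes and channel counts move in opposite directions can be traced with simple arithmetic. The sketch below assumes the configuration described in the U-Net paper rather than in the text above: a 572x572 input, 64 channels in the first stage, two unpadded 3x3 convolutions per stage, 2x2 max pooling, and 2x2 up-convolutions. The function name `unet_shapes` is hypothetical, and the cropping of skip features before concatenation is not modelled.

```python
def unet_shapes(input_size=572, base_channels=64, depth=4):
    """Trace (stage, side length, channels) through a U-Net-style network.

    Assumes two unpadded 3x3 convolutions per stage (each trims 2 pixels),
    2x2 max pooling on the way down and 2x2 up-convolution on the way up.
    """
    trace = []
    size, channels = input_size, base_channels

    # Contracting path: convolve, record, then pool.
    for level in range(depth):
        size -= 4                      # two unpadded 3x3 convolutions
        trace.append((f"down{level + 1}", size, channels))
        size //= 2                     # 2x2 max pooling halves the size
        channels *= 2                  # the next stage doubles the channels

    size -= 4
    trace.append(("bottleneck", size, channels))

    # Expanding path: the up-convolution doubles the size and halves the
    # channels; concatenating the skip connection doubles the channels
    # again, and the two convolutions bring them back down.
    for level in range(depth, 0, -1):
        size *= 2
        channels //= 2
        size -= 4
        trace.append((f"up{level}", size, channels))
    return trace

for stage, size, channels in unet_shapes():
    print(f"{stage:<10} {size:>3} x {size:<3} {channels:>4} channels")
```

Under these assumptions the bottleneck is 28x28 with 1024 channels and the last expanding stage is 388x388 with 64 channels, which matches the figures reported in the paper.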

SegNet

Original link: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7803544

SegNet is an encoder-decoder architecture: the encoder downsamples the input, and the decoder upsamples it back to full resolution. The encoder uses the first 13 convolutional layers of VGG16, and each encoder layer has a corresponding decoder layer. The decoder performs upsampling followed by convolution. The output of the final decoder is fed into a softmax classifier, which generates class probabilities for each pixel independently.

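
In the SegNet paper, the decoder upsamples by reusing the positions of the maxima recorded during max pooling in the matching encoder layer. That detail comes from the paper, not from the description above. A minimal single-channel sketch in plain Python, with hypothetical function names and assuming even input dimensions, could look like this:

```python
def max_pool_with_indices(feature):
    """2x2 max pooling (stride 2) that also records where each maximum was."""
    pooled, indices = [], []
    for i in range(0, len(feature), 2):
        pooled_row, index_row = [], []
        for j in range(0, len(feature[0]), 2):
            window = [(feature[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def max_unpool(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (y, x) in zip(pooled_row, index_row):
            out[y][x] = value
    return out

feature = [[1, 3, 2, 0],
           [4, 2, 1, 5],
           [0, 1, 7, 2],
           [6, 2, 3, 1]]

pooled, indices = max_pool_with_indices(feature)          # encoder side
sparse = max_unpool(pooled, indices, height=4, width=4)   # decoder side
```

The unpooled map is sparse, which is why the decoder follows each upsampling step with convolutions that fill in the zeros.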

DeepLab

Original paper: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs: https://arxiv.org/pdf/1412.7062.pdf

DeepLab is a method that combines deep convolutional neural networks (DCNNs) with probabilistic graphical models. To counter the loss of resolution caused by repeated downsampling and pooling, DeepLab uses the atrous ("with holes") algorithm, also known as dilated convolution, to expand the receptive field and obtain more context information.
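
The effect of the atrous algorithm on the receptive field can be sketched in one dimension. This is a minimal plain-Python illustration, assuming a hypothetical helper `atrous_conv1d`, a hand-picked 3-tap kernel, and no padding; DeepLab itself applies the idea to 2D convolutions.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D atrous (dilated) convolution with no padding.

    Kernel taps are spaced `rate` samples apart, so rate=1 is an
    ordinary convolution (cross-correlation form, as in deep learning).
    """
    span = (len(kernel) - 1) * rate + 1      # effective kernel width
    return [
        sum(kernel[t] * signal[start + t * rate] for t in range(len(kernel)))
        for start in range(len(signal) - span + 1)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # each output spans 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)   # each output spans 5 samples
```

Both calls use the same three weights, but with `rate=2` each output covers a span of (3 - 1) * 2 + 1 = 5 input samples, so the receptive field grows without adding parameters or pooling.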

DeepLab also applies a fully connected conditional random field (CRF) to the DCNN output, which improves the model's ability to capture fine details such as object boundaries.
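
For reference, the energy function minimized by the fully connected CRF can be written as follows. The notation follows the DeepLab paper, not the text above: $x_i$ is the label of pixel $i$, $P(x_i)$ is the label probability produced by the DCNN, $p_i$ and $I_i$ are the position and color of pixel $i$, and the weights $w_1, w_2$ and bandwidths $\sigma_\alpha, \sigma_\beta, \sigma_\gamma$ are hyperparameters.

```latex
E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j),
\qquad \theta_i(x_i) = -\log P(x_i)

\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \left[
    w_1 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2}
                     -\frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2} \right)
  + w_2 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2} \right)
\right],
\qquad \mu(x_i, x_j) = \begin{cases} 1 & x_i \neq x_j \\ 0 & \text{otherwise} \end{cases}
```

The first kernel penalizes giving different labels to pixels that are both close together and similar in color, which sharpens boundaries; the second depends only on position and encourages smoothness.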

Common Datasets for Image Segmentation

  • PASCAL VOC
    The PASCAL VOC dataset has 20 object categories, or 21 when the background is counted as a category.

  • MS COCO
    MS COCO is one of the largest image segmentation datasets. It provides 80 object categories and more than 330,000 images, over 200,000 of which are labeled, and the dataset contains more than 1.5 million object instances in total.

  • Cityscapes
    Cityscapes is an image segmentation dataset for evaluating the quality and performance of vision algorithms in driving scenarios. It contains 5,000 finely annotated images and 20,000 coarsely annotated images, covering different scenes, backgrounds, and street views from 50 cities. It provides annotations for 30 classes spanning ground, buildings, traffic signs, nature, sky, people, and vehicles.

Origin blog.csdn.net/weixin_43598687/article/details/130138459