Image Segmentation in 2020: Architectures, Losses, Datasets, and Frameworks



This article, written by Derrick Mwiti, was originally published on the Neptune blog. It discusses the application of deep learning to image segmentation.


Posted by Jakub Czakon

URL: https://towardsdatascience.com/image-segmentation-in-2020-756b77fa88fc

Source: neptune.ai

The topics we will discuss in this article are:

  • What is image segmentation

  • Image segmentation architectures

  • Loss functions used in image segmentation

  • Frameworks you can use for your image segmentation projects

Let's dive in.

What is image segmentation

As the name suggests, image segmentation is the process of partitioning an image into multiple segments. In this process, every pixel in the image is associated with a specific object. There are two main types of image segmentation: semantic segmentation and instance segmentation.

In semantic segmentation, all objects of the same type are marked with a single class label, while in instance segmentation similar objects each get their own separate label.

See the 2018 paper by Anurag Arnab, Shuai Zheng, et al., "Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation" http://www.robots.ox.ac.uk/~tvg/publications/2017/CRFMeetCNN4SemanticSegmentation.pdf

Image segmentation architecture

The basic structure of image segmentation consists of an encoder and a decoder.

Vijay Badrinarayanan et al., 2017, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation" https://arxiv.org/abs/1511.00561

The encoder extracts features from the image through filters, and the decoder generates the final output, which is usually a segmentation mask containing the outline of each object. Most segmentation models follow this structure or a variant of it.

Let's take a look at some examples.

U-Net

U-Net is a convolutional neural network originally developed for biomedical image segmentation. Visualized, its architecture looks like the letter U, hence the name U-Net. It consists of two parts: the left side is the contracting path and the right side is the expanding path. The purpose of the contracting path is to capture context, while the role of the expanding path is to enable precise localization.

Paper "U-net architecture image segmentation" from Olaf Ronneberger and other authors in 2015 https://arxiv.org/abs/1505.04597

U-Net consists of an expanding path on the right and a contracting path on the left. The contracting path is built from blocks of two 3×3 convolutions, each followed by a rectified linear unit (ReLU), with a 2×2 max-pooling operation for downsampling.

The complete implementation of U-Net can be found here: https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
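To give a concrete feel for this block structure, here is a minimal PyTorch sketch of one contracting-path block. It is an illustration only, not the official implementation: it uses padded convolutions for simplicity, whereas the original paper uses unpadded ones.

```python
import torch
import torch.nn as nn

class ContractingBlock(nn.Module):
    """One block of U-Net's contracting path: two 3x3 convolutions,
    each followed by a ReLU, then 2x2 max pooling for downsampling."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        features = self.double_conv(x)   # kept for the skip connection
        downsampled = self.pool(features)
        return features, downsampled

# Example: skip, x = ContractingBlock(3, 64)(torch.randn(1, 3, 256, 256))
# skip: (1, 64, 256, 256), x: (1, 64, 128, 128)
```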

FastFCN — Fast Fully Convolutional Network

In this architecture, a Joint Pyramid Upsampling (JPU) module is used to replace dilated convolutions, since they consume a lot of memory and computation time. It uses a fully convolutional network at its core while applying JPU for upsampling. JPU upsamples low-resolution feature maps into high-resolution feature maps.

Huikai Wu et al., 2019, "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation" https://arxiv.org/abs/1903.11816

If you want to explore the code, the implementation is available here: https://github.com/wuhuikai/FastFCN
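To illustrate the basic idea behind joint upsampling, the hypothetical sketch below brings backbone feature maps of different resolutions to a common size and concatenates them. This is only the first step of what JPU does; the actual module follows this with separable convolutions at several dilation rates.

```python
import torch
import torch.nn.functional as F

def merge_multiscale_features(feature_maps):
    """Upsample feature maps from different backbone stages to the
    largest spatial size and concatenate them along the channel axis.
    An illustrative simplification of joint upsampling, not JPU itself."""
    target_size = feature_maps[0].shape[-2:]  # assume the first map is the largest
    upsampled = [
        F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
        for f in feature_maps
    ]
    return torch.cat(upsampled, dim=1)

# Example with backbone outputs at strides 8, 16, and 32:
# f8  = torch.randn(1, 256, 64, 64)
# f16 = torch.randn(1, 512, 32, 32)
# f32 = torch.randn(1, 1024, 16, 16)
# fused = merge_multiscale_features([f8, f16, f32])  # (1, 1792, 64, 64)
```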


Gated-SCNN

This architecture consists of a two-stream CNN. In this model, a separate branch is used to process the shape information of the image: this shape stream handles boundary information.

Towaki Takikawa et al., 2019, "Gated-SCNN: Gated Shape CNNs for Semantic Segmentation" https://arxiv.org/abs/1907.05740

Code implementation: https://github.com/nv-tlabs/gscnn
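To give a rough sense of the gating idea, here is a hypothetical PyTorch sketch in which a sigmoid attention map derived from the shape stream reweights the regular stream's features. The paper's actual gated convolutional layers are more involved; treat this as a loose simplification.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Illustrative gate: a 1x1 convolution on the shape-stream features
    produces a sigmoid attention map that reweights the regular stream."""

    def __init__(self, shape_channels):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(shape_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, regular_features, shape_features):
        gate = self.attention(shape_features)  # (N, 1, H, W), values in [0, 1]
        return regular_features * gate         # boundary-aware reweighting
```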

DeepLab

In this architecture, convolution with upsampled filters is used for tasks involving dense prediction. Segmentation of objects at multiple scales is handled by atrous spatial pyramid pooling (ASPP). Finally, DCNNs are used to improve the localization of object boundaries. Atrous convolution is achieved by upsampling the filters through the insertion of zeros, or through sparse sampling of the input feature maps.

Liang-Chieh Chen et al., 2016, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs" https://arxiv.org/abs/1606.00915

You can try its implementation on PyTorch (https://github.com/fregu856/deeplabv3) or TensorFlow (https://github.com/sthalles/deeplab_v3).
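In PyTorch, atrous convolution is available directly through the dilation argument of a standard convolution. A small sketch:

```python
import torch
import torch.nn as nn

# A 3x3 atrous convolution with dilation rate 2: the kernel samples the
# input with gaps, enlarging the receptive field to 5x5 without adding
# parameters. Setting padding=dilation keeps the spatial size unchanged.
atrous_conv = nn.Conv2d(in_channels=256, out_channels=256,
                        kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 256, 64, 64)
print(atrous_conv(x).shape)  # torch.Size([1, 256, 64, 64])
```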

Mask R-CNN

In this architecture, objects are classified and localized using bounding boxes and semantic segmentation that classifies each pixel into a set of categories. Every region of interest gets a segmentation mask, and a class label and a bounding box are produced as the final output. The architecture is an extension of Faster R-CNN, which consists of a deep convolutional network that proposes regions and a detector that uses those regions.

Kaiming He et al., 2017, "Mask R-CNN" https://arxiv.org/abs/1703.06870

(Figure: results obtained on the COCO test set. Source: Kaiming He et al., 2017, "Mask R-CNN" https://arxiv.org/abs/1703.06870)
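If you want to try Mask R-CNN yourself, torchvision ships a version pre-trained on COCO. Below is a minimal inference sketch, assuming a recent torchvision release with the weights API:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a Mask R-CNN pre-trained on COCO (downloads weights on first use).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# The model expects a list of 3xHxW float tensors with values in [0, 1].
image = torch.rand(3, 480, 640)
with torch.no_grad():
    predictions = model([image])

# Each prediction holds boxes, labels, scores, and per-instance masks.
print(predictions[0]["boxes"].shape)  # (num_instances, 4)
print(predictions[0]["masks"].shape)  # (num_instances, 1, H, W)
```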


Image segmentation loss functions

Semantic segmentation models are usually trained with a simple cross-entropy loss. However, if you are interested in recovering fine details of the image, you will have to resort to slightly more advanced loss functions.

Let's take a look at a few of them.

Focal loss

This loss is an improvement over the standard cross-entropy criterion. It reshapes the loss so that the contribution of well-classified examples is down-weighted, which ultimately helps counter class imbalance. In this loss function, the cross-entropy loss is scaled by a factor that decays to zero as confidence in the correct class increases. The scaling factor automatically down-weights the contribution of easy examples during training and focuses on the hard ones.

Source: neptune.ai
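A minimal PyTorch sketch of a binary focal loss, assuming sigmoid-based binary segmentation (the optional alpha class-balancing term is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    The (1 - p_t)^gamma factor down-weights well-classified pixels."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    return ((1 - p_t) ** gamma * ce).mean()

# Example with logits and binary masks of shape (N, 1, H, W):
# loss = focal_loss(torch.randn(2, 1, 64, 64),
#                   torch.randint(0, 2, (2, 1, 64, 64)).float())
```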

Dice loss

This loss is obtained by calculating the smoothed Dice coefficient. It is one of the most commonly used losses in segmentation problems.

Source: neptune.ai
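A minimal PyTorch sketch of the smoothed Dice loss for binary masks:

```python
import torch

def dice_loss(probs, targets, smooth=1.0):
    """Dice loss = 1 - Dice coefficient, with a smoothing term to avoid
    division by zero: Dice = (2*|X∩Y| + s) / (|X| + |Y| + s).
    Expects probabilities (e.g. after a sigmoid) and binary masks."""
    probs = probs.flatten(1)
    targets = targets.flatten(1)
    intersection = (probs * targets).sum(dim=1)
    union = probs.sum(dim=1) + targets.sum(dim=1)
    dice = (2 * intersection + smooth) / (union + smooth)
    return 1 - dice.mean()
```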

Intersection over Union (IoU)-balanced loss

The purpose of the IoU-balanced classification loss is to increase the gradient of samples with high IoU and decrease the gradient of samples with low IoU. In this way, the localization accuracy of the model can be improved.

Source: neptune.ai

Boundary loss

One variant of the boundary loss is suited to tasks with highly unbalanced segmentations. This loss takes the form of a distance metric on the space of contours rather than regions. In this way, it tackles the problem posed by regional losses in highly imbalanced segmentation tasks.

Source: neptune.ai

Weighted cross entropy

In one variant of cross-entropy, all positive examples are weighted by a certain coefficient. It is used in scenarios that involve class imbalance.

Source: neptune.ai
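In PyTorch, a weighted binary cross-entropy is available out of the box via the pos_weight argument of BCEWithLogitsLoss; the weight value below is an arbitrary example:

```python
import torch
import torch.nn as nn

# pos_weight > 1 boosts the loss contribution of positive (foreground)
# pixels, which helps when the foreground class is rare.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))

logits = torch.randn(2, 1, 64, 64)
masks = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = criterion(logits, masks)
```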

Lovász-Softmax loss

This loss is based on the convex Lovász extension of submodular losses, and directly optimizes the mean intersection-over-union (IoU) loss in neural networks.

Source: neptune.ai

Other losses worth mentioning are:

  • TopK loss, whose purpose is to ensure that the network concentrates on hard samples during training.

  • Distance-penalized CE loss, which directs the network's focus to boundary regions that are hard to segment.

  • Sensitivity-Specificity (SS) loss, which computes a weighted sum of the mean squared errors of sensitivity and specificity.

  • Hausdorff distance (HD) loss, which estimates the Hausdorff distance from the output of a convolutional neural network.

These are just a few of the loss functions used in image segmentation. To explore many more, check this link: https://github.com/JunMa11/SegLoss


Image segmentation datasets

By this point, you may be wondering where you can get datasets to practice image segmentation.

Let's take a look at a few that you can use.

Common Objects in COntext — COCO dataset

COCO is a large-scale object detection, segmentation, and captioning dataset. It contains 91 stuff categories and 80 object categories, and includes 250,000 people with keypoints. Its download size is 37.57 GiB. It is available under the Apache 2.0 license and can be downloaded here (https://cocodataset.org/#download).

PASCAL Visual Object Classes (PASCAL VOC)

PASCAL VOC has 9,963 images across 20 different classes. The training/validation set is a 2 GB tar file. The dataset can be downloaded from the official website: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
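If you work in PyTorch, torchvision can download and load this dataset for you; a small sketch (the root directory is arbitrary):

```python
import torchvision

# Download PASCAL VOC 2012 and load the segmentation split; each sample
# is an (image, mask) pair of PIL images.
dataset = torchvision.datasets.VOCSegmentation(
    root="./data", year="2012", image_set="train", download=True
)

image, mask = dataset[0]
print(image.size, mask.size)  # e.g. (500, 281) (500, 281)
```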

Cityscapes dataset

This dataset contains images of urban scenes. It can be used to evaluate the performance of visual algorithms in urban scenes. The dataset can be downloaded here: https://www.cityscapes-dataset.com/.

Cambridge-driving Labeled Video Database — CamVid

This is a motion-based segmentation and recognition data set. It contains 32 semantic categories. This link contains further explanation and a download link to the dataset: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/.

Image segmentation frameworks

Now that you have a dataset at hand, let me introduce a few tools/frameworks you can use to get started.

  • FastAI library - given an image, this library can create a mask of the objects in the image.

  • Sefexa Image Segmentation Tool - Sefexa is a free tool that can be used for semi-automatic image segmentation, image analysis, and ground truth creation.

  • DeepMask - DeepMask by Facebook Research is a Torch implementation of DeepMask and SharpMask.

  • MultiPath - a Torch implementation of the object detection network from "A MultiPath Network for Object Detection".

  • OpenCV - an open-source computer vision library with more than 2,500 optimized algorithms (a short example follows this list).

  • MIScnn - an open-source library for medical image segmentation. It allows you to set up pipelines with state-of-the-art convolutional neural networks and deep learning models in a few lines of code.

  • Fritz - Fritz offers several computer vision tools, including image segmentation tools for mobile devices.
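As a quick taste of what classical (non-deep-learning) segmentation looks like in OpenCV, here is a minimal Otsu-thresholding sketch; the image path is a placeholder:

```python
import cv2

# Load an image (placeholder path) and convert it to grayscale.
image = cv2.imread("example.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu's method picks a global threshold automatically, splitting the
# image into foreground and background - a simple two-segment result.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("segmentation_mask.png", mask)
```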

Summary

I hope this article gave you some background on image segmentation, along with some tools and frameworks you can apply in your own work.

For more information, check the links attached to each architecture and framework.
