Overview of Segmentation Networks - Deep Semantic Segmentation of Natural and Medical Images: A Review

Reading notes on the review paper on segmentation networks:
Deep Semantic Segmentation of Natural and Medical Images: A Review

Summary

Image semantic segmentation is a dense, pixel-level prediction task: pixels belonging to the same category are grouped together, which supports scene recognition and image content understanding. In the field of medical image analysis, segmentation can be used to assist doctors in diagnosis. This article reviews the current mainstream segmentation networks and medical image segmentation networks from the following aspects: structural optimization, loss function optimization, sequence models, weakly supervised models, and multi-task models, and points out directions for further research and exploration.

# Section I Introduction
The application of deep learning to medical image analysis, especially the segmentation of medical images (such as X-ray, MRI, PET, and CT images), has attracted much attention. The mainstream research directions include mitigating the vanishing-gradient problem in deep networks, applying compression techniques to build lightweight networks, and optimizing loss functions to improve model performance.
The main work of this paper is as follows:
(1) A review of existing natural image segmentation models and medical image segmentation models, covering 2D and 3D images.
(2) The segmentation frameworks are summarized into the following six categories: structure optimization, loss function optimization, data synthesis, weakly supervised models, sequence models, and multi-task models. For details, see Fig. 1.
(3) Based on the above summary, directions for further research and exploration are pointed out.
[Fig. 1: the six categories of segmentation framework optimizations]

Section II Natural Image Segmentation Network

This section summarizes structural optimizations of segmentation networks, covering natural image segmentation first and medical image segmentation afterwards. The optimizations mainly concern network depth, width, connection patterns, and the introduction of new network modules.
Part A Fully Convolutional Network (FCN)
The fully convolutional network proposed in 2015 can be regarded as the originator of semantic segmentation models. By replacing the fully connected layers of a traditional CNN with upsampling/transposed convolutions, the network output is no longer a class probability vector but a heatmap, which allows dense prediction at the pixel level. The FCN model is shown in the figure. To better preserve spatial information, FCN merges shallow-layer outputs into the upsampling path to obtain a finer pixel-level segmentation map.
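A minimal sketch (PyTorch-style, assuming a VGG16 backbone; the layer choices are illustrative and not the exact FCN-8s configuration) of the core FCN idea: a 1x1 convolution produces per-class score maps, and a transposed convolution upsamples them back to the input resolution.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TinyFCN(nn.Module):
    """Illustrative FCN head: 1x1 conv classifier + transposed-conv upsampling."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = vgg16(weights=None).features      # conv feature extractor, stride 32
        self.classifier = nn.Conv2d(512, num_classes, 1)  # 1x1 conv replaces the FC layers
        self.upsample = nn.ConvTranspose2d(               # learnable upsampling
            num_classes, num_classes, kernel_size=64, stride=32, padding=16)

    def forward(self, x):
        h, w = x.shape[2:]
        feat = self.backbone(x)          # (N, 512, H/32, W/32)
        score = self.classifier(feat)    # coarse per-pixel class scores
        out = self.upsample(score)       # dense heatmap, roughly (N, C, H, W)
        return out[:, :, :h, :w]

# usage: logits = TinyFCN()(torch.randn(1, 3, 224, 224))  # -> (1, 21, 224, 224)
```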
[Figure: FCN architecture]

Part B Encoder-Decoder
Another type of semantic segmentation network is the encoder-decoder structure represented by SegNet, UNet, etc. The encoder network performs feature extraction, usually with each encoder stage consisting of convolution + BN + ReLU layers; the decoder network restores the low-resolution features output by the encoder to the input image resolution, stage by stage, to perform pixel-level classification or segmentation.
The following figure shows the network structure of SegNet and UNet respectively.
[Figure: SegNet and UNet network architectures]
It can be seen that upsampling in SegNet is done by unpooling with the stored max-pooling indices, while in FCN it is done by transposed convolution.
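A minimal sketch (PyTorch, illustrative shapes) contrasting the two upsampling styles mentioned above: SegNet-style unpooling reuses the indices recorded during max pooling, while FCN-style upsampling learns a transposed convolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)

# SegNet-style: remember where the maxima were, then unpool back to those positions
pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
y, idx = pool(x)                 # (1, 64, 16, 16) plus the argmax indices
segnet_up = unpool(y, idx)       # (1, 64, 32, 32), sparse non-zero entries

# FCN-style: a learnable transposed convolution doubles the resolution
deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
fcn_up = deconv(y)               # (1, 64, 32, 32), dense learned output
```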
Also based on the encoder-decoder structure are the VNet proposed later by Milletari and DenseNet-style structures. VNet adds residual connections and performs 3D segmentation; the Tiramisu network adds dense connections based on the idea of DenseNet; spatial pyramid modules and dilated (atrous) convolutions are used to fuse context information at different levels. (A spatial pyramid module extracts features at different scales through filters and pooling of different sizes, which for example helps the subsequent network capture sharp edges and recover spatial information, while dilated convolution extracts features at different levels through different dilation rates.)
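A minimal sketch (PyTorch, hypothetical channel sizes) of the dilated-convolution idea: parallel branches with different dilation rates see different receptive fields, and their outputs are concatenated to fuse multi-scale context, in the spirit of a spatial pyramid module.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, concatenated."""
    def __init__(self, in_ch=256, branch_ch=64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(branch_ch * len(rates), in_ch, kernel_size=1)

    def forward(self, x):
        # each branch keeps the spatial size (padding == dilation for a 3x3 kernel)
        feats = [b(x) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

# usage: MultiScaleContext()(torch.randn(1, 256, 32, 32)).shape -> (1, 256, 32, 32)
```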

The following figure compares the connection patterns in encoder-decoder networks: in UNet, encoder information is passed to the decoder through skip connections (green arrows); VNet introduces residual connections within each block; and Tiramisu introduces dense connections among the convolutional layers within each block.
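A compact sketch (PyTorch, illustrative channel counts) of the three connection patterns compared above: UNet-style skip concatenation between encoder and decoder, VNet-style residual addition inside a block, and DenseNet/Tiramisu-style dense concatenation inside a block.

```python
import torch
import torch.nn as nn

conv = lambda cin, cout: nn.Conv2d(cin, cout, 3, padding=1)

enc_feat = torch.randn(1, 64, 64, 64)   # encoder feature map
dec_feat = torch.randn(1, 64, 64, 64)   # upsampled decoder feature map

# UNet-style skip connection: concatenate encoder and decoder features
unet_out = conv(128, 64)(torch.cat([enc_feat, dec_feat], dim=1))

# VNet-style residual connection: add the block input to its output
x = torch.randn(1, 64, 64, 64)
vnet_out = x + conv(64, 64)(torch.relu(conv(64, 64)(x)))

# Tiramisu/DenseNet-style dense connection: each layer sees all previous outputs
feats = [x]
for _ in range(3):
    feats.append(conv(64 * len(feats), 64)(torch.cat(feats, dim=1)))
dense_out = torch.cat(feats, dim=1)      # (1, 256, 64, 64)
```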
[Figure: comparison of connection patterns in UNet, VNet, and Tiramisu]

**Part C Network Simplification**
A third idea is to simplify the network through tensor sketching, channel/network pruning, sparse connections, etc., to reduce computational complexity.
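As one concrete, minimal sketch of magnitude-based weight pruning using PyTorch's built-in pruning utilities (the 30% sparsity level is an arbitrary illustrative value; structured channel pruning follows the same pattern via prune.ln_structured):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# zero out the 30% of weights with the smallest L1 magnitude
prune.l1_unstructured(conv, name="weight", amount=0.3)

# make the pruning permanent (bakes the zeros into the weight tensor)
prune.remove(conv, "weight")
print(float((conv.weight == 0).float().mean()))  # ~0.3 of the weights are now zero
```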
**Part D Attention Networks**
Attention can be applied by selecting the most discriminative parts from a series of outputs or feature maps produced by different layers. Examples include selecting significant feature maps through global average pooling, adding extra attention modules to ResNet, and the Dual Attention Network that attends to spatial and channel features at the same time.
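A minimal sketch (PyTorch) of the channel-attention idea mentioned above, in the style of a squeeze-and-excitation block: global average pooling summarizes each channel, and a small gating network re-weights the feature maps.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style gate: global average pooling + bottleneck MLP + sigmoid re-weighting."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                     # (N, C, 1, 1): per-channel summary
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)                          # emphasize discriminative channels

# usage: ChannelAttention(64)(torch.randn(1, 64, 32, 32))
```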
**Part E Generative Adversarial Network**
The GAN idea proposed by Goodfellow can also be transferred to segmentation networks: a discriminator is fed the ground-truth map and the output of the segmentation network, and training pushes the segmentation output closer to the real segmentation map.
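A minimal sketch (PyTorch; `segmenter` and `discriminator` are hypothetical modules) of how an adversarial term is typically combined with an ordinary segmentation loss; this is a generic formulation, not the exact setup of any one paper.

```python
import torch
import torch.nn.functional as F

def adversarial_seg_loss(segmenter, discriminator, image, gt_mask, lam=0.1):
    """Pixel-wise segmentation loss + adversarial term that makes predictions look like real masks."""
    pred = segmenter(image)                                   # (N, C, H, W) logits
    seg_loss = F.cross_entropy(pred, gt_mask)                 # ordinary pixel-wise CE

    # discriminator score: high means "looks like a real ground-truth map"
    fake_score = discriminator(torch.softmax(pred, dim=1))
    adv_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))              # try to fool the discriminator

    return seg_loss + lam * adv_loss
```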

Section III Medical Image Segmentation Network

The medical image segmentation network can be divided into 2D and 3D according to the segmented image type.
Part A Model Compression
When a segmentation network is used in the medical field, especially in the clinic, there are inevitably requirements on real-time performance, and the processed images are often of very high resolution, so network compression is very necessary. At present, model compression is carried out through NAS, group normalization, dilated convolution, weight quantization, and other schemes.
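As a small illustration of one of the ingredients above, a minimal sketch of group normalization in PyTorch: unlike batch normalization, its statistics do not depend on the batch dimension, so it behaves well with the tiny batch sizes forced by high-resolution 3D medical volumes.

```python
import torch
import torch.nn as nn

# A conv block that works with batch size 1 (common for large 3D volumes),
# because GroupNorm normalizes within channel groups of each sample.
block = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=3, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),   # 8 groups of 4 channels each
    nn.ReLU(inplace=True),
)

volume = torch.randn(1, 1, 64, 128, 128)           # a single CT/MR sub-volume
print(block(volume).shape)                          # torch.Size([1, 32, 64, 128, 128])
```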
Part B Optimizations of Medical Image Segmentation Models Based on the Encoder-Decoder Structure
The encoder-decoder structure has shown excellent performance in image segmentation, but medical image segmentation still has many limitations compared with natural images. For example, it is difficult to collect large-scale medical image data sets for training, which easily leads to problems such as overfitting and poor generalization.
According to Section II, there are the following optimization ideas:
Attention mechanism:
The main attempts include using multi-level attention to improve the segmentation quality of abdominal MRI images, and using dilated convolution modules to retain detailed information for 3D image segmentation.
Generative adversarial networks:
For medical image segmentation, GANs have been used in research on pancreatic CT images, retinal blood vessel segmentation, and brain tumor CT images.
Recurrent neural networks:
One idea is to use LSTM and other sequence models, because many medical scans are time series; another idea is to use recurrence to enhance the extraction of detailed information and the propagation of long-range dependencies, for example to improve the segmentation performance of UNet (some of those fancy UNet variants were covered in earlier reading notes).

Section IV Loss Function Optimization

Since the loss function is the driving force of network updates, one optimization idea for segmentation models is to optimize the loss function.
Cross entropy loss (Cross Entropy)
The most commonly used loss for pixel-level classification is the cross entropy loss, computed pixel by pixel between the predicted value and the true value. The formula is as follows:
$$L_{CE} = -\sum_{i}\sum_{c} y_{i,c}\,\log \hat{y}_{i,c}$$

where $y_{i,c}$ is the ground-truth indicator and $\hat{y}_{i,c}$ the predicted probability of class $c$ at pixel $i$.

Optimization 1: Weighted cross entropy loss function (WCE)
A very prominent characteristic of medical image segmentation is that the proportions of different classes are highly unequal. For example, in retinal vessel segmentation the foreground vessels occupy a very small fraction of the original image, most of which is black background. One can imagine the performance of a classifier trained on such highly imbalanced samples. It is therefore natural to use a weighted cross entropy loss that gives different weights to different classes, reducing the impact of class imbalance on model performance.
$$L_{WCE} = -\sum_{i}\sum_{c} w_{c}\, y_{i,c}\,\log \hat{y}_{i,c}$$

where $w_c$ is the weight assigned to class $c$.

Optimization 2: Focal Loss
Focal Loss also addresses the imbalance between positive and negative samples. It modifies the cross entropy by adding one more factor compared with the plain CE:
$$FL(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$$

The $\alpha$ scaling factor is used to balance the proportion of positive and negative samples, while the $(1 - p_t)^{\gamma}$ modulating factor down-weights easy, well-classified examples.
For the understanding of Focal Loss, please refer to: Focal Loss
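A minimal sketch (PyTorch; the positive-class weight and the $\gamma=2$, $\alpha=0.25$ values are commonly quoted defaults, used here only for illustration) of the three losses above for binary, pixel-wise segmentation:

```python
import torch
import torch.nn.functional as F

def ce_loss(logits, target):
    """Plain pixel-wise binary cross entropy."""
    return F.binary_cross_entropy_with_logits(logits, target)

def weighted_ce_loss(logits, target, pos_weight=10.0):
    """WCE: up-weight the rare foreground class (e.g. thin retinal vessels)."""
    return F.binary_cross_entropy_with_logits(
        logits, target, pos_weight=torch.tensor(pos_weight))

def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Focal loss: (1 - p_t)^gamma down-weights easy pixels."""
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * target + (1 - p) * (1 - target)          # probability of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(2, 1, 64, 64)                     # raw network outputs
target = (torch.rand(2, 1, 64, 64) > 0.9).float()      # sparse foreground mask
print(ce_loss(logits, target), weighted_ce_loss(logits, target), focal_loss(logits, target))
```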

Losses based on overlap measures
These mainly include the Dice coefficient (i.e. the F1 score), Tversky loss, exponential logarithmic loss, Lovasz-Softmax loss, Boundary loss, Conservative loss, etc.
The Dice coefficient is very familiar; its calculation is similar to IoU (intersection over union). The other losses are less well understood and need further study.
Dice Loss/IoU/F1 Score

$$Dice = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad IoU = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ is the predicted segmentation and $B$ the ground truth; the Dice loss is typically taken as $1 - Dice$.
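A minimal soft Dice loss sketch (PyTorch; the smoothing constant is a common convention to avoid division by zero, not from the original post):

```python
import torch

def soft_dice_loss(logits, target, eps=1e-6):
    """1 - Dice, computed on soft probabilities so the loss stays differentiable."""
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)                                   # sum over channel and spatial dims
    intersection = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()

# usage:
# loss = soft_dice_loss(torch.randn(2, 1, 64, 64), (torch.rand(2, 1, 64, 64) > 0.9).float())
```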

As mentioned above, medical image segmentation often faces situations where the foreground accounts for a small proportion and the background for a large one, so previous work has made a series of optimizations for this problem.
Optimization 3: Cross entropy loss with a regularization term

Section V Medical Image Generation

It is well known that the performance of CNN models depends heavily on the amount of training data. Given the difficulty of obtaining large-scale medical image data sets, it is natural to augment the limited training data through geometric transformations, and, since the introduction of GANs, to expand the training data through image generation (a minimal augmentation sketch follows the list below). Existing attempts include:
generating ventricular MR images and CT images through CGAN to expand the data;
EssNet, in which MR images are synthesized from CT images and finally used for CT image segmentation;
X-ray images used to synthesize multiple organs, etc.
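A minimal sketch of the geometric-transform augmentation mentioned above (using torchvision; the specific transforms and ranges are illustrative choices, not from the paper):

```python
import torch
from torchvision import transforms

# simple geometric augmentation for a grayscale medical image tensor (1, H, W)
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
])

image = torch.rand(1, 256, 256)
augmented = augment(image)        # a new geometric variant on each call
print(augmented.shape)            # torch.Size([1, 256, 256])
# note: for segmentation, the mask must receive identical transforms
# (e.g. via torchvision.transforms.functional with shared random parameters)
```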

Section VI Weakly Supervised Models

Obtaining pixel-level labels is time-consuming and laborious, so learning from unlabeled or partially labeled images in an unsupervised/weakly supervised way is much closer to practical needs. Current attempts include:
adding a discriminative term for weakly supervised data to the loss function, which reduces computational complexity while maintaining segmentation accuracy;
training with only bounding-box-level supervision;
producing segmentation masks from image-level labels only;
achieving weakly supervised learning through domain transfer, with the help of ADMM, teacher-student models, etc.

Section VII Multi-Task Models

Multi-task learning, which learns multiple tasks at the same time while each task maintains a certain accuracy, is closer to practical applications. Current progress includes: combining different losses to complete segmentation of buildings and aerial imagery at the same time; VGG16 + global average pooling + FCN to perform patient detection and skin segmentation at the same time; an improved UNet model to perform chest CT segmentation and classification at the same time; and Mask R-CNN, which builds on Faster R-CNN to predict masks with the help of image labels and bounding boxes. Multi-tasking in medical image segmentation mainly addresses multi-class segmentation, labeling different tissues and organs, etc.
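A minimal sketch (PyTorch; the backbone, heads, and loss weighting are hypothetical, for illustration only) of the basic multi-task pattern described above: one shared encoder, a segmentation head, a classification head, and a combined loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    """Shared encoder with a segmentation head and a classification head."""
    def __init__(self, num_seg_classes=2, num_cls_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)        # per-pixel logits
        self.cls_head = nn.Linear(64, num_cls_classes)           # image-level logits

    def forward(self, x):
        feat = self.encoder(x)
        seg = self.seg_head(feat)
        cls = self.cls_head(feat.mean(dim=(2, 3)))               # global average pooling
        return seg, cls

net = MultiTaskNet()
image = torch.randn(4, 1, 64, 64)
seg_gt = torch.randint(0, 2, (4, 64, 64))
cls_gt = torch.randint(0, 3, (4,))

seg_logits, cls_logits = net(image)
loss = F.cross_entropy(seg_logits, seg_gt) + 0.5 * F.cross_entropy(cls_logits, cls_gt)
loss.backward()
```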

Summary

Table 1 summarizes typical segmentation networks and their optimizations. The main test data set is PASCAL VOC 2012, and the evaluation metric is IoU.
[Table 1: typical segmentation networks and their optimizations]

Through the above review, we can also see some limitations and difficulties of medical image segmentation:

(1) Most medical images are high-dimensional and cannot be fed into the GPU directly; they usually require operations such as slicing and patching, which prevents effective use of spatial information;

(2) Due to the particularities of medical imaging systems, noise that differs from that of natural images is often introduced, which is troublesome to handle and sometimes difficult to remove;

(3) It is difficult to obtain large data sets of medical images, so semi-supervised and weakly supervised networks have more clinical application value;

(4) Segmentation assisted by prior knowledge is better suited to medical images.

Potential Directions

UNet Architecture
The mainstream segmentation model in the field of medical image segmentation is still the UNet family, based on the encoder-decoder structure combined with skip connections. Skip connections effectively alleviate gradient vanishing and help propagate early-layer information; combined with spatial pyramids, dilated convolutions, etc., the transfer of early-layer features can be further controlled.
Sequence Models
Medical image segmentation involves a large number of 3D images, so sequence models are naturally used to assist processing. However, spatial geometric features are inevitably lost during slicing; how to better sequence volumetric data needs further exploration.
Loss Functions
Traditional losses include overlap-based and distance-based loss functions. Open questions include how to eliminate gradient vanishing in deep networks, and how to automatically search for the best loss function in the spirit of NAS.
Some other directions include:
Non-medical pre-trained models and multi-modal segmentation. Since there are many types of medical images (MRI, PET, CT, X-ray) and each is not easy to obtain, it is difficult to find an effective one-for-all segmentation model, so it is worth considering whether pre-trained models from non-medical images can assist the learning of medical segmentation models.
Large open-source data sets. Strive to open-source more large-scale 2D/3D medical image segmentation data sets; medical data sets are really precious.
Reinforcement Learning. Apply reinforcement learning to medical image segmentation, for example in combination with Conditional Random Fields (CRF); build weakly supervised models that use a small amount of labeled data and a large amount of unlabeled data.
FP Analysis. Analyze the reasons why some models produce false positives.
That's all.

Origin blog.csdn.net/qq_37151108/article/details/105991124