V-Net: Detailed Notes on the Medical Image Segmentation Paper

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Please attach the original source link and this statement when reproducing.
This link: https://blog.csdn.net/JMU_Ma/article/details/97935299

Why V-Net?

Most previous methods process only 2D images, while in clinical practice much of the data comes as 3D volumes. The paper therefore proposes a volumetric, fully convolutional network (FCN) for 3D image segmentation.

The network is trained end to end on MRI volumes and learns to predict the segmentation of the whole volume at once.

V-Net also introduces a new objective function, optimized during training according to the Dice coefficient, which copes with the strong imbalance that can exist between foreground and background voxels.

To cope with the limited number of annotated volumes, the training data are augmented with random non-linear transformations and histogram matching.

The paper

Introduction

Many medical image segmentation networks today are modifications of the U-Net architecture, and the many networks derived from it are all quite similar.
[Figure: U-Net architecture]

Why does this kind of network perform so well on medical images?

Characteristics of medical imaging data:

  1. The semantics are relatively simple and the structures are fairly fixed.
  2. Datasets are relatively small. Because medical imaging data is scarce, a network that is too complex, with too many parameters, will overfit the training data and the resulting model will be biased.
  3. The data is multimodal, which requires the network to extract features from it well.
  4. Interpretability is important. We do not only need a 3D segmentation of the CT volume; we may also need to know the volume of the lesion and the specific slice in which it appears.

Characteristics of the network structure:

  1. **Low-level (contextual) information:** U-Net combines low-level and high-level information well. After several downsampling steps, the low-resolution feature maps carry semantic context over the whole image; they capture the relationship between the target and its surroundings, which helps determine the object's category.
  2. **High-resolution information:** the concatenate operation passes high-resolution features from the encoder directly to the decoder at the same level, providing finer detail (such as gradients) for segmentation. This is the skip-concatenate connection.

Method

[Figure: V-Net architecture]

The left path

Along the left path, convolutions extract features from the input image, and an appropriate stride at the end of each stage reduces the resolution.

The left side is divided into stages that operate at different resolutions, each containing one to three convolutional layers. Each stage is formulated so that it learns a residual function: the input of a stage is (a) processed by the convolutional layers and non-linearities and (b) added to the output of the last convolutional layer of that stage, enabling the stage to learn a residual function.
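The residual formulation above can be sketched as follows. This is only a toy numpy illustration: the 5×5×5 convolutions are stood in for by hypothetical random channel-mixing matrices, and `residual_stage` is an invented helper name, not the paper's implementation.

```python
import numpy as np

def residual_stage(x, n_convs=2, rng=None):
    """Sketch of one V-Net stage: a stack of conv + non-linearity layers
    whose output is added element-wise to the stage input, so the stack
    only has to learn a residual function f(x) = out - x."""
    if rng is None:
        rng = np.random.default_rng(0)
    channels = x.shape[0]
    f = x
    for _ in range(n_convs):
        # hypothetical stand-in for a 5x5x5 convolution: mix channels,
        # then apply a non-linearity
        w = rng.normal(scale=0.1, size=(channels, channels))
        f = np.tanh(np.einsum("oc,cdhw->odhw", w, f))
    return x + f  # the residual (identity) connection

x = np.random.default_rng(1).normal(size=(16, 8, 8, 8))  # (channels, D, H, W)
y = residual_stage(x)
assert y.shape == x.shape
```

The addition requires the stage to preserve the channel count and spatial size, which is why downsampling happens between stages rather than inside them.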

Convolution

For feature extraction we use convolutions with 5 × 5 × 5 kernels; at the end of each stage, a convolution with a 2 × 2 × 2 kernel and stride 2 reduces the resolution. As data proceeds through the different stages of the compression path, its resolution is lowered.

Because this stride-2 operation considers only non-overlapping 2 × 2 × 2 blocks when extracting features, the resulting feature maps have half the size. The operation is generally:

[Figure: strided-convolution downsampling operation]

The effect of this operation is similar to a pooling layer. (It has been found on some image-recognition benchmarks that max pooling can simply be replaced by a convolutional layer with increased stride without loss of accuracy.) Pooling is therefore replaced by strided convolution here.
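The size arithmetic can be verified with the standard convolution output-size formula; `conv_out_size` is a hypothetical helper name used only for this check:

```python
def conv_out_size(n, kernel, stride, padding=0):
    # standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - kernel) // stride + 1

# a 2x2x2 kernel with stride 2 halves each spatial dimension,
# matching what a 2x2x2 max-pooling layer would produce
for n in (64, 32, 16):
    assert conv_out_size(n, kernel=2, stride=2) == n // 2

# the 5x5x5 feature convolutions preserve the size when padded by 2
assert conv_out_size(64, kernel=5, stride=1, padding=2) == 64
```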

Replacing pooling with convolutions can also, depending on the specific implementation, give the network a smaller memory footprint during training: no switches mapping the outputs of pooling layers back to their inputs need to be stored for back-propagation, and the network can be better understood and analysed by applying only de-convolutions instead of un-pooling operations.

Downsampling

Downsampling reduces the size of the input signal and increases the receptive field of the features in subsequent layers. Each stage also doubles the number of feature channels relative to the previous stage.
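A minimal sketch of this schedule, assuming a hypothetical 128³ input volume and the 16-channel first stage shown in the paper's architecture figure: each downsampling step halves every spatial dimension while doubling the channel count.

```python
# Per compression stage: feature channels double, resolution halves.
n_stages = 5
channels = [16 * 2**i for i in range(n_stages)]   # 16 channels at stage 0
sizes = [128 // 2**i for i in range(n_stages)]    # hypothetical 128^3 input

assert channels == [16, 32, 64, 128, 256]
assert sizes == [128, 64, 32, 16, 8]
```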

The right path

The right part of the network decompresses the features, expanding the spatial support of the lower-resolution feature maps in order to gather the necessary information and assemble the final two-channel volumetric segmentation output.

[Figure: right (decompression) path]

The last layer is a convolution with 1 × 1 × 1 kernels that produces two output feature maps of the same size as the input volume; a soft-max then converts them, voxel by voxel, into probabilities of belonging to the foreground and background regions.
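The voxel-wise soft-max can be sketched in numpy as below; the function name and toy volume size are assumptions for illustration, not the paper's code.

```python
import numpy as np

def voxelwise_softmax(logits):
    """Convert a (2, D, H, W) two-channel feature map into foreground /
    background probabilities at every voxel (numerically stable soft-max)."""
    z = logits - logits.max(axis=0, keepdims=True)  # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

logits = np.random.default_rng(0).normal(size=(2, 4, 4, 4))
probs = voxelwise_softmax(logits)
assert probs.shape == (2, 4, 4, 4)
assert np.allclose(probs.sum(axis=0), 1.0)  # each voxel's two probs sum to 1
```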

skip-connection

Features extracted in the early stages of the left part of the CNN are forwarded to the right part, shown schematically as horizontal connections in Figure 2. In this way, fine-grained detail that would otherwise be lost along the compression path is collected, improving the quality of the final contour prediction. These connections were also observed to improve the convergence time of the model.
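The skip connections amount to concatenating encoder and decoder feature maps of matching resolution along the channel axis; the sizes below are hypothetical examples, not taken from the paper.

```python
import numpy as np

# encoder feature map forwarded via the skip connection (channels, D, H, W)
enc = np.zeros((32, 16, 16, 16))
# decoder feature map at the same resolution, after up-convolution
dec = np.zeros((32, 16, 16, 16))

# concatenate along the channel axis: fine-grained encoder detail is
# combined with the decoder's coarser semantic features
merged = np.concatenate([enc, dec], axis=0)
assert merged.shape == (64, 16, 16, 16)
```

This only works because the spatial dimensions match, which is why each skip links an encoder stage to the decoder stage at the same resolution.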

Dice Loss Layer

Dice loss layer for medical imaging: https://blog.csdn.net/JMU_Ma/article/details/97533768
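A minimal numpy sketch of the soft Dice objective from the paper, using the common `1 - D` loss formulation (the paper itself maximises the Dice coefficient D directly); `dice_loss` is an assumed helper name:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: D = 2 * sum(p * g) / (sum(p^2) + sum(g^2)),
    loss = 1 - D. `pred` holds foreground probabilities in [0, 1],
    `target` is the binary ground-truth mask."""
    num = 2.0 * np.sum(pred * target)
    den = np.sum(pred ** 2) + np.sum(target ** 2) + eps
    return 1.0 - num / den

target = np.zeros((4, 4, 4))
target[1:3, 1:3, 1:3] = 1.0                       # small foreground cube
assert dice_loss(target, target) < 1e-5           # perfect overlap -> ~0
assert abs(dice_loss(1 - target, target) - 1.0) < 1e-5  # disjoint -> ~1
```

Because the loss is a ratio of overlap to total mass, it is insensitive to the large number of background voxels, which is what lets it handle the foreground/background imbalance without re-weighting.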

Results

[Figure: segmentation results]
Comparison with other algorithms
[Table: comparison with other algorithms]

Slides (PPT) share

Link: https://pan.baidu.com/s/1fjSFs2p065L8NY1WR6ozkw (extraction code: u5yp)

Paper link

https://arxiv.org/abs/1606.04797
