U-Net: Convolutional Networks for Biomedical Image Segmentation

    Two text detection networks were introduced earlier, namely RRCNN and CTPN. Next, I will introduce some classic networks for semantic segmentation, again working through each one as paper + code implementation and recording what I learn here. Let's start with the paper.

The original English paper: https://arxiv.org/abs/1505.04597

   In the previous post I forgot to introduce the authors, sorry about that. So, let me introduce the U-Net authors one by one: Olaf Ronneberger, Philipp Fischer, and Thomas Brox.

Here are their homepages, in the same order:

https://lmb.informatik.uni-freiburg.de/people/ronneber/
https://lmb.informatik.uni-freiburg.de/people/fischer/
https://lmb.informatik.uni-freiburg.de/people/brox/

Their papers and code implementations can be found there; anyone interested can dig into them.

  Now for the paper itself. At the start, the authors note that labeled samples can be used more efficiently with data augmentation. The architecture consists of a contracting path that captures context and a symmetric expanding path that enables precise localization. A key property of this network is that it can be trained end-to-end on very few images and still perform well. In medical imaging, every pixel needs to be classified. Earlier work proposed a sliding-window method: each pixel is classified from a local patch around it. This has two advantages: the network can localize, and the number of training patches is far larger than the number of training images, so the results were quite good. But then the criticism starts. First, the authors argue this approach is very slow, because the network must be run separately for each patch, and there is a lot of redundancy due to overlapping patches. Second, there is a trade-off between localization accuracy and the use of context: a larger patch requires more max-pooling layers, which reduces localization accuracy, while a smaller patch contains only little context. A small sketch of the sliding-window idea is given below.
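To see where the redundancy comes from, here is a minimal sketch of the sliding-window setup being criticized. `classify_patch` is a hypothetical stand-in for the per-patch classifier; the point is that neighbouring patches overlap almost completely, so one forward pass is needed per pixel, and the patch size fixes how much context each decision sees.

```python
import numpy as np

def classify_pixels_by_patches(image, patch_size, classify_patch):
    """Label every pixel by classifying the patch centred on it.

    classify_patch is a placeholder for any patch classifier (hypothetical);
    this is only meant to illustrate the cost and redundancy of the approach.
    """
    half = patch_size // 2
    # pad by reflection so border pixels also get a full patch
    padded = np.pad(image, half, mode='reflect')
    labels = np.zeros(image.shape, dtype=np.int64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + patch_size, c:c + patch_size]
            labels[r, c] = classify_patch(patch)  # one forward pass per pixel
    return labels
```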

However, the authors found a clever way around these problems. Their architecture builds on the fully convolutional network, which they modify and extend so that it works with very few training images and yields more precise segmentations.

The network structure is as follows:

(Figure: the U-Net architecture diagram from the paper.)

      The main idea of fully convolutional networks is to supplement the usual contracting network with successive layers in which pooling operations are replaced by upsampling operations. These layers increase the resolution of the output. For localization, high-resolution features from the contracting path are combined with the upsampled output, and subsequent convolutional layers can then learn to assemble a more precise output from this information.

      One important modification by the authors is that the upsampling part also has a large number of feature channels, which allows the network to propagate context information to the higher-resolution layers. As a consequence, the expanding path is more or less symmetric to the contracting path, giving the network its U-shape. The network is a bit unusual in that it has no fully connected layers and uses only the valid part of each convolution; seamless segmentation of arbitrarily large images is achieved with an overlap-tile strategy. To predict pixels in the border region of an image, the missing context is extrapolated by mirroring the input image (a small sketch is shown below). This tiling strategy is important for applying the network to large images. The authors also apply elastic deformations to the training images (which, personally, feels like a routine image-processing trick) to achieve data augmentation.
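To make the mirroring concrete, here is a minimal sketch. It assumes the tile sizes from the paper's architecture figure (a 572x572 input tile producing a 388x388 output tile): the region we actually want to segment is extended by reflecting its border so that the valid convolutions have enough surrounding context.

```python
import numpy as np

tile = np.random.rand(388, 388)          # region we actually want to segment
margin = (572 - 388) // 2                # extra context needed on each side (92 px)
network_input = np.pad(tile, margin, mode='reflect')  # mirror in the missing context
print(network_input.shape)               # (572, 572)
```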

Let's talk about the structure of the network in more detail. There are two paths: a contracting path and an expanding path. The contracting path follows the typical architecture of a convolutional network: repeated blocks of two 3x3 convolutions, each followed by a ReLU, and a 2x2 max-pooling layer with stride 2 for downsampling. At each downsampling step the number of feature channels is doubled. Each step in the expanding path consists of an upsampling of the feature map, implemented as a 2x2 up-convolution that halves the number of feature channels, followed by a concatenation with the correspondingly cropped feature map from the contracting path, and then two 3x3 convolutions, each again followed by a ReLU. The cropping is necessary because border pixels are lost in every convolution. In the final layer, a 1x1 convolution maps the 64-channel feature vector to the desired number of classes. In total, the network has 23 convolutional layers.
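To tie this description together, here is a minimal PyTorch sketch of such an architecture (my own illustrative reimplementation, not the authors' code): valid 3x3 convolutions, 2x2 max pooling, 2x2 up-convolutions that halve the channels, cropped skip connections, and a final 1x1 convolution. With these choices a 1x572x572 input yields a num_classes x 388x388 output, and the layer count works out to the 23 convolutions mentioned above.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # two 3x3 valid (unpadded) convolutions, each followed by a ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
    )

def center_crop(feat, target_hw):
    # crop a contracting-path feature map to the size of the upsampled map
    _, _, h, w = feat.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[:, :, top:top + th, left:left + tw]

class UNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        self.down1 = double_conv(in_channels, 64)
        self.down2 = double_conv(64, 128)
        self.down3 = double_conv(128, 256)
        self.down4 = double_conv(256, 512)
        self.bottom = double_conv(512, 1024)
        self.pool = nn.MaxPool2d(2)
        # 2x2 up-convolutions that halve the number of feature channels
        self.up4 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.up3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec4 = double_conv(1024, 512)
        self.dec3 = double_conv(512, 256)
        self.dec2 = double_conv(256, 128)
        self.dec1 = double_conv(128, 64)
        self.out = nn.Conv2d(64, num_classes, kernel_size=1)  # 1x1 conv to class scores

    def forward(self, x):
        # contracting path: double conv, then 2x2 max pool; channels double each step
        c1 = self.down1(x)
        c2 = self.down2(self.pool(c1))
        c3 = self.down3(self.pool(c2))
        c4 = self.down4(self.pool(c3))
        b = self.bottom(self.pool(c4))
        # expanding path: up-convolve, concatenate the cropped skip feature, double conv
        u = self.up4(b)
        u = self.dec4(torch.cat([center_crop(c4, u.shape[2:]), u], dim=1))
        u = self.up3(u)
        u = self.dec3(torch.cat([center_crop(c3, u.shape[2:]), u], dim=1))
        u = self.up2(u)
        u = self.dec2(torch.cat([center_crop(c2, u.shape[2:]), u], dim=1))
        u = self.up1(u)
        u = self.dec1(torch.cat([center_crop(c1, u.shape[2:]), u], dim=1))
        return self.out(u)

# e.g. UNet(1, 2)(torch.randn(1, 1, 572, 572)).shape -> torch.Size([1, 2, 388, 388])
```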

That concludes the introduction to the U-Net network. The paper argues that U-Net is very well suited to biomedical image processing, and because medical images are scarce, the authors rely on data augmentation so that U-Net can perform better. A sketch of one common form of such augmentation, elastic deformation, is shown below.
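The blog-level summary above does not spell out the augmentation code, but a common way to implement elastic deformation is to smooth a random displacement field with a Gaussian and resample the image along it. The sketch below follows that generic recipe with illustrative parameters (`alpha`, `sigma` are my choices, not the values used in the paper).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, rng=None):
    """Elastically deform a 2D image.

    A random displacement field is smoothed with a Gaussian (sigma controls
    smoothness) and scaled by alpha, then the image is resampled along the
    displaced coordinates. alpha/sigma are illustrative, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing='ij')
    coords = np.vstack([(rows + dy).ravel(), (cols + dx).ravel()])
    deformed = map_coordinates(image, coords, order=1, mode='reflect')
    return deformed.reshape(image.shape)
```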

 
