UNet: Algorithm Principle and Paddle Implementation

The U-Net network is a classic image segmentation network that originated in medical image segmentation. It has few parameters, computes quickly, and generalizes well, making it highly adaptable to general scenarios. U-Net was first proposed in 2015 and won first place in the ISBI 2015 Cell Tracking Challenge.

The structure of U-Net is a standard encoder-decoder structure, as shown in Figure 1. The left side can be thought of as an encoder and the right side as a decoder. The image is first down-sampled by the encoder to obtain a high-level semantic feature map, which the decoder then up-samples to restore to the resolution of the original image. The network also uses skip connections: each time the decoder upsamples, the feature map of the corresponding resolution in the encoder is fused with the decoder feature map by concatenation, helping the decoder recover the details of the target.

Figure 1 Schematic diagram of UNet model network structure

1) Encoder: The encoder has an overall contracting structure that progressively reduces the resolution of the feature map to capture contextual information. The encoder is divided into 4 stages. In each stage, a max pooling layer performs downsampling and two convolutional layers then extract features; the final feature map is 16 times smaller than the input;

2) Decoder: The decoder has an expanding structure symmetric to the encoder, gradually restoring the details and spatial dimensions of the segmented objects to achieve precise localization. The decoder is divided into 4 stages. In each stage, the input feature map is up-sampled, concatenated with the feature map of the corresponding scale from the encoder, and then passed through two convolutional layers to extract features; the final feature map is enlarged 16 times, back to the input resolution (see the resolution sketch after this list);

3) Classification module: a convolution with a kernel size of 3×3 classifies each pixel;
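
To make the 16× figure concrete, here is a minimal sketch of the resolution arithmetic, using a hypothetical 224×224 input (the input size is an example, not fixed by the network):

# Each of the 4 encoder stages halves H and W with 2x2 max pooling,
# and each of the 4 decoder stages doubles them again by upsampling.
size = 224  # hypothetical input resolution
print([size // 2 ** i for i in range(5)])  # [224, 112, 56, 28, 14]; bottleneck = 224 / 16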


Note:

Further reading: U-Net: Convolutional Networks for Biomedical Image Segmentation


The implementation scheme of UNet is shown in Figure 2. For a pet image, the UNet encoder (containing 4 downsampling stages) first extracts features to obtain a high-level semantic feature map; the decoder (containing 4 upsampling stages) then restores the feature map to the original image size. In the training phase, the model is trained with a loss function constructed from the prediction map output by the model and the sample's ground-truth label map; in the inference phase, the model's prediction map is used as the final output.
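
As a minimal sketch of that training objective (assuming `model` is the UNet defined below, `image` is an [N, 3, H, W] float tensor, `label` is an [N, H, W] integer label map, and `F` is `paddle.nn.functional` as imported below; these variable names are placeholders, not part of the original code):

logits = model(image)[0]                       # [N, num_classes, H, W] prediction map
loss = F.cross_entropy(logits, label, axis=1)  # per-pixel classification loss
loss.backward()                                # gradients for an optimizer step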


Figure 2 Design scheme of pet image segmentation

The overall U-Net network framework code implementation is as follows:

# coding=utf-8
# Import dependencies
import os
import random
import cv2
import numpy as np
from PIL import Image
from paddle.io import Dataset
import matplotlib.pyplot as plt
# Required to display matplotlib.pyplot figures inline in a notebook
%matplotlib inline
import paddle
import paddle.nn.functional as F
import paddle.nn as nn

class UNet(nn.Layer):
    # Define the network structure by subclassing paddle.nn.Layer
    def __init__(self, num_classes=3):
        # Initialization
        super().__init__()
        # Encoder
        self.encode = Encoder()
        # Decoder
        self.decode = Decoder()
        # Classification module: a 3x3 convolution maps 64 channels to num_classes
        self.cls = nn.Conv2D(in_channels=64, out_channels=num_classes, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Forward pass
        logit_list = []
        # Encoding
        x, short_cuts = self.encode(x)
        # Decoding
        x = self.decode(x, short_cuts)
        # Classification
        logit = self.cls(x)
        logit_list.append(logit)
        return logit_list
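
A quick sanity check of the full network is sketched below (it assumes the Encoder and Decoder classes defined in the following sections have already been run, and uses a hypothetical 224×224 input):

model = UNet(num_classes=3)
x = paddle.randn([1, 3, 224, 224])  # random stand-in for an RGB image
logit = model(x)[0]
print(logit.shape)  # [1, 3, 224, 224]: per-pixel class scores at the input resolution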

Define the encoder

Above, we divided the model into three parts: encoder, decoder, and classification module. The classification module has already been implemented; next, the encoder and decoder are defined separately.

First is the encoder. By repeatedly applying a unit structure, the encoder increases the number of channels and reduces the spatial size of the image, producing a high-level semantic feature map.

The code implementation is as follows:

class ConvBNReLU(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size, padding='same'):
        # Initialization
        super().__init__()
        # Convolution layer
        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding)
        # Batch normalization layer
        self._batch_norm = nn.SyncBatchNorm(out_channels)

    def forward(self, x):
        # Forward pass: conv -> batch norm -> ReLU
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x

class Encoder(nn.Layer):
    def __init__(self):
        # Initialization
        super().__init__()
        # Two stacked ConvBNReLU blocks
        self.double_conv = nn.Sequential(ConvBNReLU(3, 64, 3), ConvBNReLU(64, 64, 3))
        # Channel configuration of the downsampling stages
        down_channels = [[64, 128], [128, 256], [256, 512], [512, 512]]
        # Wrap the downsampling modules
        self.down_sample_list = nn.LayerList([self.down_sampling(channel[0], channel[1]) for channel in down_channels])

    # Define a downsampling module
    def down_sampling(self, in_channels, out_channels):
        modules = []
        # Max pooling layer halves the spatial size
        modules.append(nn.MaxPool2D(kernel_size=2, stride=2))
        # Two ConvBNReLU blocks
        modules.append(ConvBNReLU(in_channels, out_channels, 3))
        modules.append(ConvBNReLU(out_channels, out_channels, 3))
        return nn.Sequential(*modules)

    def forward(self, x):
        # Forward pass
        short_cuts = []
        # Initial double convolution
        x = self.double_conv(x)
        # Downsampling; save each stage's input as a skip connection
        for down_sample in self.down_sample_list:
            short_cuts.append(x)
            x = down_sample(x)
        return x, short_cuts
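
A minimal shape check for the encoder alone (a sketch with a hypothetical 224×224 input, not part of the original code):

encoder = Encoder()
x = paddle.randn([1, 3, 224, 224])
y, short_cuts = encoder(x)
print(y.shape)                        # [1, 512, 14, 14]: resolution reduced 16x
print([s.shape for s in short_cuts])  # 64/128/256/512 channels at 224/112/56/28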

Define the decoder

After the number of channels reaches its maximum and the high-level semantic feature map has been obtained, the network begins the decoding operation. Decoding performs upsampling: the number of channels is reduced and the spatial size of the feature map is gradually increased until the original image size is restored. In this experiment, bilinear interpolation is used to upsample the feature maps.
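
As a minimal illustration of bilinear upsampling with F.interpolate (a standalone sketch, independent of the decoder code below):

x = paddle.randn([1, 512, 14, 14])
y = F.interpolate(x, size=[28, 28], mode='bilinear')
print(y.shape)  # [1, 512, 28, 28]: spatial size doubled, channel count unchanged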

The specific code is as follows:

# Define the upsampling module
class UpSampling(nn.Layer):
    def __init__(self, in_channels, out_channels):
        # Initialization
        super().__init__()
        # Concatenating the skip connection doubles the channel count
        in_channels *= 2
        # Two stacked ConvBNReLU blocks
        self.double_conv = nn.Sequential(ConvBNReLU(in_channels, out_channels, 3), ConvBNReLU(out_channels, out_channels, 3))

    def forward(self, x, short_cut):
        # Forward pass
        # Bilinear interpolation up to the skip connection's spatial size
        x = F.interpolate(x, paddle.shape(short_cut)[2:], mode='bilinear')
        # Concatenate the feature maps along the channel axis
        x = paddle.concat([x, short_cut], axis=1)
        # Convolution
        x = self.double_conv(x)
        return x

# Define the decoder
class Decoder(nn.Layer):
    def __init__(self):
        # Initialization
        super().__init__()
        # Channel configuration of the upsampling stages
        up_channels = [[512, 256], [256, 128], [128, 64], [64, 64]]
        # Wrap the upsampling modules
        self.up_sample_list = nn.LayerList([UpSampling(channel[0], channel[1]) for channel in up_channels])

    def forward(self, x, short_cuts):
        # Forward pass
        for i in range(len(short_cuts)):
            # Upsampling; skip connections are consumed in reverse order (deepest first)
            x = self.up_sample_list[i](x, short_cuts[-(i + 1)])
        return x
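
Putting the pieces together, the short sketch below (assuming the classes above have been defined) verifies that the decoder restores the input resolution:

encoder, decoder = Encoder(), Decoder()
x = paddle.randn([1, 3, 224, 224])  # hypothetical input
feat, short_cuts = encoder(x)
out = decoder(feat, short_cuts)
print(out.shape)  # [1, 64, 224, 224]: ready for the 3x3 classification convolution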
