foreword

I already have U-net, why do I need linkNet?
For unet, see this article [Semantic Segmentation] unet structure and code implementation: https://blog.csdn.net/weixin_40293999/article/details/129648032
It introduces resNet, which focuses on a RealTime, real-time system for automatic driving, etc. The field of the returned result. unet is suitable for less real-time places such as medical diagnosis. It also borrows the structure of autoencoders.
Paper: https://arxiv.org/pdf/1707.03718.pdf is an article in 2017, only 5 pages, worth reading. A new deep neural network architecture is introduced for efficient pixel-level semantic segmentation for visual scene understanding. Using only 11.5 million parameters and 21.2 GFLOPs, the network is both accurate and fast.

1. Network structure

1.1 Schematic diagram of network structure

insert image description here

It is copied on the paper, it is recommended to read the paper directly.

1.2 Create LinkNet model

LinkNet can build the entire model from 4 basic modules
1. Convolution module (convolution+BN+Activate)
2. Deconvolution (deconvolution+BN+Activate)
3. Encoder (4 convolution modules)
4. Decoder (convolution module + deconvolution module + convolution module)
5. Realize the overall network structure (1, 2, 3, 4 can be built with building blocks): convolution module + deconvolution module + encoder + decoder

2. Code

2.1 Construction of each module

2.1.1 Convolution Module

Convolution module, initialize the default kernel_size=3, stride = 1, padding =1, that is, the size of the feature map is output as it is.
Then use sequential to process them into a pipeline

# 卷积模块
class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels,k_size=3,stride=1,pad=1) -> None:
        super().__init__()
        self.conv_bn_relu = nn.Sequential(
            nn.Conv2d(in_channels, out_channels,kernel_size=k_size,stride,padding=pad),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
            )
    
    def farward(self,x):
        x = self.conv_bn_relu(x)
        return x

2.1.2 Deconvolution module

Deconvolution requires two paddings, padding is the position where deconvolution starts, and output_padding fills the edge of the image after deconvolution

class DeconvBlock(nn.Module):
    def __init__(self, in_channels, out_channels,k_size=3,stride=2,padding=1,output_padding = 1) -> None:
        """
        反卷积需要有两个padding
        """
        super().__init__()
        #padding 是反卷积开始的位置， output_padding 将反卷积之后的图像的边缘部分进行填充
        self.deconv = nn.ConvTranspose2d(in_channels,out_channels,kernel_size=k_size,stride=stride,padding=padding,output_padding=output_padding)
        self.bn = nn.BatchNorm2d(out_channels)
    
    def farward(self,x, is_act=True):
        x = self.deconv(x)
        if is_act:
            x = torch.relu(self.bn(x))
        return x

2.1.3 Encoder module

The multiplexed convolution module consists of
insert image description here
4 basic convolution blocks + a shortcut block. It needs to be explained here, because the entire 4 convolutions are scaled by 1, so the shortcut also needs to be processed accordingly, otherwise it will not add up.

class EncodeBlock(nn.Module):
   def __init__(self, in_channels, out_channels) -> None:
       super().__init__()
       # 第一层需要对图像进行缩放
       self.conv1 = ConvBlock(in_channels,out_channels,stride=2)
       # 第2层不需要对图像进行缩放
       self.conv2 = ConvBlock(out_channels,out_channels)
       # 第三层，第四层原样输出
       self.conv3 = ConvBlock(out_channels,out_channels)
       self.conv4 = ConvBlock(out_channels,out_channels)
           
       self.short_cut =  ConvBlock(in_channels,out_channels,stride=2)
   def farward(self,x):
       out1 = self.conv1(x)
       out1 = self.conv2(out1)
       short_cut = self.short_cut(x)
       # 第一部分的输出和shortcut相加
       out2 = self.conv3(out1+short_cut)
       out2 = self.conv4(out2)
       return out2 + out1

2.2 Coding Network Structure

Still need to look at this network structure diagram
insert image description here
to start building blocks


class Net(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # 第一层
        self.input_conv = ConvBlock(3,64,stride=2,k_size=7,pad=3)
        # maxpool 原来的图像缩放2倍
        self.input_maxpool = nn.MaxPool2d(kernel_size=2)
        # 四个编码器模块,通道扩大一倍，size减小一倍
        self.encode1 = EncodeBlock(64,64)
        self.encode2 = EncodeBlock(64,128)
        self.encode3 = EncodeBlock(128,256)
        self.encode4 = EncodeBlock(256,512)
        # 四个解码模块，和encode是对应的，通道数减小，size扩大为原来的一倍
        self.decode4 = DeconvBlock(512,256)
        self.decode3 = DeconvBlock(256,128)
        self.decode2 = DeconvBlock(128,64)
        self.decode1 = DeconvBlock(64,64)
        # 输出部分,第一层走默认即可
        self.deconv_out1 = DeconvBlock(64,32)
        self.conv_out = ConvBlock(32,32)
        # stride 为2 可以不写， 一共就是2分类。kesize=2，因为论文给的是2x2的,2x2的适合 padding是不需要变化的，都是0 保证正好变为原来的2倍，因为stride正好是2
        self.deconv_out2 = DeconvBlock(32,2,k_size=2,padding=0,output_padding=0)
        
    def farward(self,x):
        # input 的两层
        x = self.input_conv(x)
        x = self.input_maxpool(x)
        # 后面的中间值要保留
        e1 = self.encode1(x)
        e2 = self.encode2(e1)
        e3 = self.encode3(e2)
        e4 = self.encode3(e3)
        # 到此为止，左边半拉，完成
        
        d4 = self.decode4(e4)
        d3 = self.decode3(d4+e3)
        d2 = self.decode2(d3+e2)
        d1 = self.decode2(d2+e1)
        f1 = self.deconv_out1(d1)
        f2 = self.conv_out(f1)
        f3 = self.deconv_out2(f2)
        return f3

Initialize and look at the structure

 Output exceeds the size limit. Open the full output data in a text editor
Net(
(input_conv): ConvBlock(
  (conv_bn_relu): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
)
(input_maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(encode1): EncodeBlock(
  (conv1): ConvBlock(
    (conv_bn_relu): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv2): ConvBlock(
    (conv_bn_relu): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv3): ConvBlock(
...
(deconv_out2): DeconvBlock(
  (deconv): ConvTranspose2d(32, 2, kernel_size=(2, 2), stride=(2, 2))
  (bn): BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)

2.3 Loss function & training

model = Net()
loss_fn = nn.CrossEntropyLoss()

Its training is almost exactly the same as unet training, with the IOU indicator added.
The division of the IOU index
tensor and tensor uses torch.true_divide(tensor1,tesor2)

2.4 Training

training up

Output exceeds the size limit. Open the full output data in a text editor
epoch:  0 loss：  0.072 accuracy: 0.806 IOU: 0

      test_loss：  0.071 test_accuracy: 0.81 test_iou: 0
epoch:  1 loss：  0.072 accuracy: 0.806 IOU: 0

      test_loss：  0.07 test_accuracy: 0.81 test_iou: 0
epoch:  2 loss：  0.071 accuracy: 0.807 IOU: 0

      test_loss：  0.07 test_accuracy: 0.809 test_iou: 0
epoch:  3 loss：  0.071 accuracy: 0.807 IOU: 0

      test_loss：  0.07 test_accuracy: 0.811 test_iou: 0
epoch:  4 loss：  0.071 accuracy: 0.807 IOU: 0

      test_loss：  0.071 test_accuracy: 0.81 test_iou: 0
epoch:  5 loss：  0.071 accuracy: 0.807 IOU: 0

      test_loss：  0.07 test_accuracy: 0.81 test_iou: 0
epoch:  6 loss：  0.071 accuracy: 0.808 IOU: 0

      test_loss：  0.07 test_accuracy: 0.81 test_iou: 0
epoch:  7 loss：  0.071 accuracy: 0.808 IOU: 0

      test_loss：  0.071 test_accuracy: 0.81 test_iou: 0
epoch:  8 loss：  0.071 accuracy: 0.809 IOU: 0
...
      test_loss：  0.07 test_accuracy: 0.81 test_iou: 0
epoch:  9 loss：  0.071 accuracy: 0.809 IOU: 0

      test_loss：  0.071 test_accuracy: 0.809 test_iou: 0

insert image description here
Dataset overview

[Semantic segmentation] LinkNet from 0 to 1 and code implementation

Article directory