PyTorch quick start and hands-on - 3: UNet implementation

Series directory:
PyTorch (image segmentation with UNet) quick start and hands-on - 0: Preface
PyTorch quick start and hands-on - 1: Knowledge preparation (introduction to the basics)
PyTorch quick start and hands-on - 2: The evolution of classic deep learning networks
PyTorch quick start and hands-on - 3: UNet implementation
PyTorch quick start and hands-on - 4: Network training and testing

This continues from Section 8.4 of PyTorch quick start and hands-on - 2: The evolution of classic deep learning networks.

1 Preliminary preparation

1.1 torch installation

Install PyTorch on your own; it is not covered here.

1.2 Dataset preparation

I use my own simulated data: 1600 (input, label) pairs in total, split into a training set and a test set at a 9:1 ratio.
My inputs are 120*240 and my labels are 256*256.

1.3 Network skeleton

The backbone is UNet, which you can modify to suit your own needs. Rather than changing the network itself, I add convolutions to adapt it to my own input and output.
First, here is a diagram of the basic UNet; the modifications below are based on it.
[Figure: the standard UNet architecture diagram]

1.4 Data analysis and network improvement

[The exact sizes and channel counts don't really matter and can be set as needed; how to set them is covered in the implementation. Here we only walk through the process.]
In the diagram, the input size is 572x572x1, while mine is 120x240x1. I chose to add a convolutional layer before UNet to make my input square, 120x120x1, and then, to make the later computations convenient, pad it to 128x128x1 (either by padding directly or via a convolution). From there it is the standard UNet pipeline, so my network structure is:
[Figure: my modified network structure (fought with this broken diagram all afternoon, grumbling)]
You can see the whole chain of size changes below; how each change is made is explained in the implementation:

120x240x1 --conv--> 120x120x1 --conv--> 128x128x1 --conv--> 128x128x32
--pool--> 64x64x32 --conv--> 64x64x64
--pool--> 32x32x64 --conv--> 32x32x128
--pool--> 16x16x128 --conv--> 16x16x256
--pool--> 8x8x256 --conv--> 8x8x512 --upsample--> 16x16x256
--channel concat--> 16x16x512 --double conv--> 16x16x256 --upsample--> 32x32x128
--channel concat--> 32x32x256 --double conv--> 32x32x128 --upsample--> 64x64x64
--channel concat--> 64x64x128 --double conv--> 64x64x64 --upsample--> 128x128x32
--channel concat--> 128x128x64 --double conv--> 128x128x32 --upsample--> 256x256x16
(Note: my encoder side starts at 128x128, so at this last step there is nothing left to concatenate with; the network structure is not strictly symmetric.)
--1x1 conv in place of a fully connected layer--> 256x256x1

2. Network implementation

2.1 Related knowledge

  1. First, we need the output-size formula for a convolution:

O = (I - K + 2P) / S + 1  (rounded down)
where O (output) is the output size, I (input) is the input size, K (kernel) is the kernel size, P is the padding, and S (stride) is the stride.

  2. And the output-size formula for a transposed convolution (deconvolution):

O = (I - 1) * S - 2P + K + OP
where O (output) is the output size, I (input) is the input size, K (kernel) is the kernel size, P is the padding, S (stride) is the stride, and OP is the output_padding.
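
To make the arithmetic concrete, here is a small helper (a minimal sketch; the names conv_out and deconv_out are my own, purely for illustration) that plugs the settings used later in this network into these two formulas:

def conv_out(i, k, p=0, s=1):
    # output size of a convolution: O = (I - K + 2P) / S + 1 (floor division)
    return (i - k + 2 * p) // s + 1

def deconv_out(i, k, p=0, s=1, op=0):
    # output size of a transposed convolution: O = (I - 1) * S - 2P + K + OP
    return (i - 1) * s - 2 * p + k + op

print(conv_out(240, k=2, s=2))             # 120: the 2x1 kernel with stride 2 halves the long side
print(conv_out(120, k=3, p=5))             # 128: the 3x3 kernel with padding=5 pads 120 up to 128
print(deconv_out(8, k=3, p=1, s=2, op=1))  # 16: each transposed conv used below doubles the size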

  3. Channels

Let me give my own understanding:

In practical terms, a channel is a feature (taking classification as an example: things like the stem, color, and texture of a watermelon).
In images, color is one such feature, but features are not only colors.
For example, my grayscale images have 1 channel, while color images (RGB, BGR, CMY) have 3 channels.
You may then ask: when a feature map inside the network has 64 channels, does that mean 64 colors?
Recall the sentence above, "features are not just colors". I cannot say exactly what the other features are; my guess is that they capture things like spatial distributions and patterns.
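
In code terms, the channel count is just the C dimension of an NCHW tensor; a convolution with 64 output channels produces 64 feature maps, not 64 colors. A minimal sketch (purely illustrative, with a random tensor standing in for a grayscale image):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=1)
img = torch.randn(1, 1, 128, 128)  # one grayscale image: batch 1, channel 1
feat = conv(img)
print(feat.shape)  # torch.Size([1, 64, 128, 128]) -> 64 learned feature maps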

2.2 Code implementation:

Emmmmm, let's go through it from shallow to deep. The complete network code is at the end of the article.
First, import the torch packages:

import torch
import torch.nn as nn

Then design my network AdUNet as a class that inherits from nn.Module.
We mainly override two methods: the initializer __init__ and the forward pass forward.

Before that, to improve code reuse, the repeated double convolution is factored out into a helper function:

def double_conv(in_channels, out_channels):  # double conv block: the basic building unit of this network
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, 3, padding=1),
        nn.BatchNorm2d(out_channels),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, 3, padding=1),  # 3 is the kernel_size, i.e. a 3x3 kernel
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )
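
A quick sanity check (a minimal sketch with a dummy tensor): because the 3x3 kernels use padding=1, this block keeps the spatial size unchanged and only changes the channel count, which is what lets us skip the crop step of the original UNet later on:

block = double_conv(1, 32)
x = torch.randn(2, 1, 128, 128)  # dummy batch in NCHW layout
print(block(x).shape)            # torch.Size([2, 32, 128, 128]): same 128x128, new channel count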

OK, let's start.

2.2.1 Initialization method __init__():

  1. Input adaptation layer
    First, design a standalone convolutional block adnet that adapts the input to the network, and put it inside the AdUNet class. It turns the 1x120x240 input into a square 1x120x120, implemented with PyTorch's built-in Conv2d.
    Set the input channels in_channels and output channels out_channels to 1, choose a 2x1 convolution kernel, set padding to 0 and stride to (2, 1), i.e. stride 2 along one spatial direction and stride 1 along the other. The stride-2 direction is the 240-pixel side, so that side is halved and the feature map becomes 120x120.
    Then attach a BN layer and a ReLU layer; for what they do and why, see the previous article.
    Finally, a 3x3 convolution kernel with padding=5 grows the size from 120x120 to 128x128 (by the formula above: (120 - 3 + 2*5)/1 + 1 = 128). A quick shape check for this block appears after this list.

self.adnet = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 1), padding=0, stride=(2, 1)),
            nn.BatchNorm2d(1),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=5, stride=1),
            nn.BatchNorm2d(1),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
            nn.ReLU(inplace=True)
        )
  2. The 4 double-conv blocks on the downsampling path, plus the bottom (bottleneck) block
        self.dconv_down0 = double_conv(1, 32)
        self.dconv_down1 = double_conv(32, 64)
        self.dconv_down2 = double_conv(64, 128)
        self.dconv_down3 = double_conv(128, 256)
        self.dconv_down4 = double_conv(256, 512)
  3. The max pooling layer
self.maxpool = nn.MaxPool2d(2)
  4. The 4 double-conv blocks on the upsampling path
        self.dconv_up3 = double_conv(256 + 256, 256)
        self.dconv_up2 = double_conv(128 + 128, 128)
        self.dconv_up1 = double_conv(64 + 64, 64)
        self.dconv_up0 = double_conv(64, 32)
  5. The 5 upsampling layers (transposed convolutions). With kernel size 3, stride=2, padding=1 and output_padding=1, each one exactly doubles the spatial size: O = (I - 1)*2 - 2*1 + 3 + 1 = 2I.
        self.upsample4 = nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1, output_padding=1)
        self.upsample3 = nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1)
        self.upsample2 = nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1)
        self.upsample1 = nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1)
        self.upsample0 = nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1)
  6. A 1x1 convolutional layer in place of a fully connected layer
        self.conv_last = nn.Conv2d(16, 1, 1)
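
As promised above, here is a quick shape check for the input-adaptation block (a minimal sketch with a dummy tensor; I am assuming the 240-pixel side of the input is stored as the height of the NCHW tensor, otherwise swap the kernel and stride to (1, 2)):

adnet = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=(2, 1), padding=0, stride=(2, 1)),
    nn.BatchNorm2d(1),
    nn.ReLU(inplace=True),
    nn.Conv2d(1, 1, kernel_size=3, padding=5, stride=1),
    nn.BatchNorm2d(1),
    nn.ReLU(inplace=True)
)
x = torch.randn(1, 1, 240, 120)  # dummy input with the long side as height
print(adnet(x).shape)            # torch.Size([1, 1, 128, 128])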

2.2.2 The forward pass method forward():

Now just wire the layers together following the network structure diagram above, and that's it!
Oh yes, don't forget the concat.
Why don't I wrap the repeated downsampling and upsampling steps into a single reusable module? Because I don't want to pass extra arguments around: during downsampling, the feature map has to be saved before each pooling so it can be concatenated during upsampling, so I wrote the steps out separately. The concat operation itself is simple; just read the code, there is nothing difficult about it.

    def forward(self, x):
        # reshape
        x = self.adnet(x)  # 1x128x128

        # encode
        conv0 = self.dconv_down0(x)  # 32x128x128
        x = self.maxpool(conv0)  # 32x64x64

        conv1 = self.dconv_down1(x)  # 64x64x64
        x = self.maxpool(conv1)  # 64x32x32

        conv2 = self.dconv_down2(x)  # 128x32x32
        x = self.maxpool(conv2)  # 128x16x16

        conv3 = self.dconv_down3(x)  # 256x16x16
        x = self.maxpool(conv3)  # 256x8x8

        x = self.dconv_down4(x)  # 512x8x8

        # decode
        x = self.upsample4(x)  # 256x16x16
        # Because every 3x3 convolution uses padding=1, the spatial size does not change through the double convs, so the crop step of the original UNet is not needed!
        x = torch.cat([x, conv3], dim=1)  # 512x16x16

        x = self.dconv_up3(x)  # 256x16x16
        x = self.upsample3(x)  # 128x32x32
        x = torch.cat([x, conv2], dim=1)  # 256x32x32

        x = self.dconv_up2(x)  # 128x32x32
        x = self.upsample2(x)  # 64x64x64
        x = torch.cat([x, conv1], dim=1)  # 128x64x64

        x = self.dconv_up1(x)  # 64x64x64
        x = self.upsample1(x)  # 32x128x128
        x = torch.cat([x, conv0], dim=1)  # 64x128x128

        x = self.dconv_up0(x)  # 32x128x128
        x = self.upsample0(x)   # 16x256x256

        out = self.conv_last(x)  # 1x256x256

        return out

2.2.3 Semantic Segmentation Implementation Process

Unfortunately, although the network structure is now implemented, there is still some way to go before we reach our goal. The good news is that this network really is usable: as long as you load data and train it, you will get results, and you can even train on randomly generated matrices treated as images.
Here is a brief description of the workflow. There are many details, which will be covered in the next part: PyTorch quick start and hands-on - 4: Network training and testing.
Training:

According to the batch size, the training samples and labels are read from the dataset into the convolutional neural network. Depending on your needs, the training images and labels should be preprocessed first, e.g. cropping, data augmentation, and so on. This helps the training of deep networks: it speeds up convergence, helps avoid overfitting, and strengthens the model's generalization ability.

Validation:

After each training epoch, read the validation samples and labels from the dataset into the convolutional neural network with the current training weights loaded. Evaluate them with the semantic segmentation metrics you have implemented, record the metric scores at this point in training, and save the corresponding weights. Alternating one round of training with one round of validation is a common way to keep a close watch on the model's performance.

Testing:

After all training is finished, read the test samples and labels from the dataset into the convolutional neural network, and load the best saved weights into the model for testing. The test results come in two forms: one is to measure network performance with the usual metric scores, the other is to save the network's predictions as images so you can judge the segmentation quality visually.
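
As a teaser for the next part, here is a minimal sketch of such a train/validate loop. Everything in it is an assumption for illustration: train_loader and val_loader are hypothetical DataLoaders yielding (input, label) batches, the loss is a plain MSELoss, and AdUNet is the network defined in the next subsection; the real training, metric, and checkpointing code is discussed in part 4.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AdUNet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # assumption: pick whatever loss fits your labels

for epoch in range(50):
    model.train()
    for inputs, labels in train_loader:  # hypothetical loader of (1x120x240, 1x256x256) pairs
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():  # validate once after each training epoch
        val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    print(f'epoch {epoch}: val_loss = {val_loss:.4f}')
    torch.save(model.state_dict(), 'adunet_epoch.pth')  # in practice, save only when the metric improves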

2.2.4 Putting it all together! (complete network code)

import torch
import torch.nn as nn


def double_conv(in_channels, out_channels):  # double conv block: the basic building unit of this network
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, 3, padding=1),
        nn.BatchNorm2d(out_channels),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, 3, padding=1),  # 3 is the kernel_size, i.e. a 3x3 kernel
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )


# class UpSample(nn.Module):
#     def __init__(self, in_channels, out_channels, kernel_size, stride, padding, output_padding):
#         super(UpSample, self).__init__()
#         self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size, stride=2, padding=1)
#         self.conv_relu = nn.Sequential(
#             nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding),
#             nn.BatchNorm2d(num_features=out_channels),
#             nn.ReLU(),
#             nn.Conv2d(out_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding),
#             nn.BatchNorm2d(num_features=out_channels),
#             nn.ReLU(),
#         )
#
#     def forward(self, x, y):
#         x = self.up(x)
#         x1 = torch.cat((x, y), dim=0)
#         x1 = self.conv_relu(x1)
#         return x1 + x


class AdUNet(nn.Module):

    def __init__(self):
        super().__init__()

        self.adnet = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 1), padding=0, stride=(2, 1)),
            nn.BatchNorm2d(1),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
            nn.ReLU(inplace=True),
            nn.Conv2d(1, 1, kernel_size=3, padding=5, stride=1),
            nn.BatchNorm2d(1),  # BN layer improves generalization (helps prevent overfitting) and speeds up convergence
            nn.ReLU(inplace=True)
        )

        self.dconv_down0 = double_conv(1, 32)
        self.dconv_down1 = double_conv(32, 64)
        self.dconv_down2 = double_conv(64, 128)
        self.dconv_down3 = double_conv(128, 256)
        self.dconv_down4 = double_conv(256, 512)

        self.maxpool = nn.MaxPool2d(2)
        # self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.upsample4 = nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1, output_padding=1)
        self.upsample3 = nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1)
        self.upsample2 = nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1)
        self.upsample1 = nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1)
        self.upsample0 = nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1)

        self.dconv_up3 = double_conv(256 + 256, 256)
        self.dconv_up2 = double_conv(128 + 128, 128)
        self.dconv_up1 = double_conv(64 + 64, 64)
        self.dconv_up0 = double_conv(64, 32)

        self.conv_last = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        # reshape
        x = self.adnet(x)  # 1x128x128

        # encode
        conv0 = self.dconv_down0(x)  # 32x128x128
        x = self.maxpool(conv0)  # 32x64x64

        conv1 = self.dconv_down1(x)  # 64x64x64
        x = self.maxpool(conv1)  # 64x32x32

        conv2 = self.dconv_down2(x)  # 128x32x32
        x = self.maxpool(conv2)  # 128x16x16

        conv3 = self.dconv_down3(x)  # 256x16x16
        x = self.maxpool(conv3)  # 256x8x8

        x = self.dconv_down4(x)  # 512x8x8

        # decode
        x = self.upsample4(x)  # 256x16x16
        # Because every 3x3 convolution uses padding=1, the spatial size does not change through the double convs, so the crop step of the original UNet is not needed!
        x = torch.cat([x, conv3], dim=1)  # 512x16x16

        x = self.dconv_up3(x)  # 256x16x16
        x = self.upsample3(x)  # 128x32x32
        x = torch.cat([x, conv2], dim=1)  # 256x32x32

        x = self.dconv_up2(x)  # 128x32x32
        x = self.upsample2(x)  # 64x64x64
        x = torch.cat([x, conv1], dim=1)  # 128x64x64

        x = self.dconv_up1(x)  # 64x64x64
        x = self.upsample1(x)  # 32x128x128
        x = torch.cat([x, conv0], dim=1)  # 64x128x128

        x = self.dconv_up0(x)  # 32x128x128
        x = self.upsample0(x)   # 16x256x256

        out = self.conv_last(x)  # 1x256x256

        return out
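
A quick way to confirm the shapes end to end (a minimal sketch with a random tensor; as noted earlier, this assumes the 240-pixel side is the height dimension of the input):

if __name__ == '__main__':
    model = AdUNet()
    x = torch.randn(1, 1, 240, 120)  # dummy 1-channel input
    print(model(x).shape)            # expected: torch.Size([1, 1, 256, 256])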


Original post: blog.csdn.net/weixin_43938876/article/details/123406484