MIMO-UNet learning

After reading a paper on deblurring, I used MIMO-UNet to run a dataset I found; the result images are not finished yet. For now I will record what I have learned. Please point out anything that is wrong.

Summary

The coarse-to-fine strategy is widely used in single-image deblurring network architectures. The traditional approach stacks sub-networks that take multi-scale images as input and gradually improves image sharpness from the bottom sub-network to the top one, which inevitably incurs high computational cost. To design a fast and accurate deblurring network, MIMO-UNet is proposed.

Three characteristics of MIMO-UNet:

  1. The single encoder of MIMO-UNet takes multi-scale images as input, which reduces the training difficulty.
  2. The single decoder of MIMO-UNet outputs multi-scale deblurred images, so a single U-Net simulates a multi-cascaded U-Net.
  3. Asymmetric feature fusion (AFF) is introduced to effectively fuse multi-scale features.

Introduction

I think half of this part repeats the abstract and the background of deblurring. It does not feel that important, so you can skip it; I just keep my own notes here.

Background: camera modules have developed rapidly over the past ten years, but when the camera or the target moves, blur and artifacts still appear. Early work: CNN-based methods used a CNN as an estimator of the blur kernel and built a two-stage deblurring framework, a CNN estimation stage plus a kernel-based deconvolution stage. (My understanding is that one part estimates the blur kernel, while the other learns to use that kernel for non-blind deblurring.) Recent work: CNN-based deblurring methods learn the complex relationship between blurred and sharp images in a direct end-to-end manner. DeepDeblur consists of multiple stacked sub-networks for multi-scale blur; each sub-network takes a downscaled image and gradually restores the sharp image in a coarse-to-fine manner.

The coarse-to-fine network design principle has been proven to be an effective method for image deblurring.

However, this increases computation and memory usage, making it difficult to deploy on mobile devices, vehicles, and robots. Lightweight CNNs have been proposed that are shallower than conventional networks, but they cannot reach the accuracy of state-of-the-art methods.

MIMO-UNet is therefore proposed: the decoder outputs multiple deblurred images (multi-output single decoder, MOSD); the single encoder takes multi-scale images as input (multi-input single encoder, MISE); and asymmetric feature fusion (AFF) effectively fuses the multi-scale features.

Architecture

[Figure: MIMO-UNet architecture]
The encoder and decoder of MIMO-UNet consist of 3 EBs (Encoder Blocks) and 3 DBs (Decoder Blocks), respectively.

Multi-Input Single Encoder (MISE)

Studies have shown that multi-scale images help handle different degrees of blur within an image.

In MIMO-UNet, an EB takes blurred images of different sizes as input, and the downscaled features are complemented with information from the downsampled image. This approach can effectively handle various kinds of blur in images.
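As a small illustration of the multi-scale input, here is a minimal sketch of how the downsampled blurred images could be built with F.interpolate (variable names and sizes are my own, not the official code):

```python
import torch
import torch.nn.functional as F

# Build the multi-scale blurred inputs: full, 1/2 and 1/4 resolution.
blur = torch.randn(1, 3, 256, 256)                   # B1: full-resolution blurred image
blur_2 = F.interpolate(blur, scale_factor=0.5,
                       recompute_scale_factor=True)  # B2: half resolution
blur_4 = F.interpolate(blur_2, scale_factor=0.5,
                       recompute_scale_factor=True)  # B3: quarter resolution
```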

A shallow convolutional module (SCM) is used to extract features from the downsampled image. For efficiency it stacks 3 x 3 and 1 x 1 convolution layers, as in Fig. 4(a). The output of the last 1 x 1 layer is concatenated with the input $B_k$, and a further 1 x 1 convolution refines the concatenated features. To fuse the $SCM^{out}_{k}$ features with $EB^{out}_{k-1}$, a convolution with stride 2 is applied to the EB output so that its size matches the SCM output before fusion.
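Based on this description, here is my rough sketch of what an SCM could look like (my reading of Fig. 4(a); channel counts and activations are placeholders, not the authors' exact code):

```python
import torch
import torch.nn as nn

class SCM(nn.Module):
    """Shallow convolutional module (sketch): stacked 3x3/1x1 convs, concatenate
    the result with the input image B_k, then refine with a 1x1 conv."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, out_channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // 4, out_channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // 2, out_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // 2, out_channels - 3, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 conv that refines the concatenated [features, B_k]
        self.refine = nn.Conv2d(out_channels, out_channels, kernel_size=1)

    def forward(self, x):               # x: downsampled blurred image B_k
        feat = self.main(x)
        return self.refine(torch.cat([feat, x], dim=1))
```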

A feature attention module (FAM) is also proposed to actively emphasize or suppress the features of each scale and to learn the spatially/channel-wise important features from the SCM. $(EB^{out}_{k-1})^{\downarrow}$ (i.e., the EB output after the aforementioned stride-2 convolution) is multiplied element-wise with the SCM output; the result goes through a 3 x 3 convolution whose output contains complementary information for deblurring, and is finally added back to $(EB^{out}_{k-1})^{\downarrow}$. The fused feature is then refined with residual blocks; 8 modified residual blocks are used.

EB (Encoder Block) explained in detail

The EB in MIMO-UNet not only receives the downscaled features extracted by the previous EB, but also the features extracted from the downsampled blurred image, and combines the two, so that both the compressed and the downsampled features are used. The EB in the figure has three parts (purple, green, blue); a code sketch of the whole block is given after these descriptions.

Purple: a convolution with stride 2, which compresses the features output by the previous EB and reduces their size to match the feature map generated by the SCM from the downsampled image.

Green: the FAM module, which actively emphasizes or suppresses the features of each scale and is expected to obtain complementary information for deblurring through element-wise multiplication. (My personal understanding: the operations in the figure can be written as a formula. Let $(EB^{out}_{k-1})^{\downarrow}$ be $X_1$, the SCM feature below be $X_2$, and the 3 x 3 convolution in the middle be $f(\cdot)$. The overall formula is then $X_1 + f(X_1 \odot X_2)$. If the middle convolution is understood as a weight adjustment $W$, factoring out $X_1$ gives roughly $X_1 \odot (1 + W X_2)$, i.e., the convolution automatically learns to enhance or suppress the weight of each feature.) The ablation study in the paper confirms that the FAM module helps improve PSNR.

Blue: the residual module. MIMO-UNet uses 8 residual blocks to summarize and refine the preceding features; MIMO-UNet++ uses 20.
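Putting the three colored parts together, here is a sketch of how an EB could be written. The FAM is exactly the $X_1 + f(X_1 \odot X_2)$ formula above, and the residual block is a plain conv-ReLU-conv stand-in for the paper's modified residual block; channel counts are simplified, so this is not the official implementation:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain conv-ReLU-conv residual block (stand-in for the modified residual block)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class FAM(nn.Module):
    """Feature attention module, written as X1 + conv(X1 * X2) to match the formula above."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x1, x2):
        # x1: (EB_{k-1}^out) after the stride-2 conv, x2: SCM output of this scale
        return x1 + self.conv(x1 * x2)

class EB(nn.Module):
    """Encoder block: stride-2 conv (purple) -> FAM (green) -> residual blocks (blue)."""
    def __init__(self, channels=64, num_res=8):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.fam = FAM(channels)
        self.res = nn.Sequential(*[ResBlock(channels) for _ in range(num_res)])

    def forward(self, prev_feat, scm_feat):
        x = self.down(prev_feat)   # compress and halve the previous EB output
        x = self.fam(x, scm_feat)  # fuse with the SCM feature of the same size
        return self.res(x)
```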

Multi-Output Single Decoder (MOSD)

In MIMO-UNet, different DBs have feature maps of different sizes. The paper applies intermediate supervision at each decoder stage, i.e., each Decoder Block is trained to produce a sharp image of the corresponding size. However, a DB generates a feature map rather than an image, so a function $o(\cdot)$ is used to map the feature map to an image:

$$\hat{S}_k = o(DB_k^{out})$$

This mapping is simply a convolution that reduces the feature map to three channels.
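As a minimal illustration, the $o(\cdot)$ mapping is just a convolution producing a 3-channel image (the feature width 64 and the kernel size are placeholders):

```python
import torch.nn as nn

# o(.): map a decoder feature map back to a 3-channel image.
to_image = nn.Conv2d(64, 3, kernel_size=3, padding=1)
# deblurred_k = to_image(db_k_out)  # one output per decoder scale, supervised at that scale
```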

Asymmetric Feature Fusion (AFF)

The problem to be solved: in traditional coarse-to-fine deblurring methods, the features of the coarse-scale network are reused only in the fine-scale network, which makes the information flow inflexible. The AFF module is proposed to let information of different scales flow within a single U-Net. Each AFF takes the outputs of all EBs as input, combines the multi-scale features with convolutions, and passes the result into the corresponding DB, so that each DB obtains features of different scales and the deblurring performance improves.
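Here is a sketch of how an AFF could look following this description: resize the outputs of all EBs to a common size, concatenate them, and fuse with convolutions. The channel numbers and the use of interpolation for resizing are my assumptions; the official code may resize differently:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFF(nn.Module):
    """Asymmetric feature fusion (sketch): fuse the outputs of all EBs and feed
    the result to the DB of one scale."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, eb_outputs, target_size):
        # Bring every EB output to the spatial size expected by this DB.
        resized = [F.interpolate(f, size=target_size, mode='bilinear', align_corners=False)
                   for f in eb_outputs]
        return self.fuse(torch.cat(resized, dim=1))
```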

Loss function

The content loss is a multi-scale L1 loss:

$$L_{cont} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{t_k}\left\| \hat{S}_k - S_k \right\|_1$$

where $K = 3$ is the number of scales, $\hat{S}_k$ and $S_k$ are the deblurred output and the sharp image at scale $k$, and $t_k$ normalizes by the number of elements. Since the purpose of deblurring is to restore lost high-frequency components, reducing the difference in frequency space is also important, so a multi-scale frequency reconstruction (MSFR) loss is added:

$$L_{MSFR} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{t_k}\left\| \mathcal{F}(\hat{S}_k) - \mathcal{F}(S_k) \right\|_1$$

where $\mathcal{F}$ denotes the Fast Fourier Transform (FFT). The final loss is

$$L_{total} = L_{cont} + \lambda L_{MSFR}$$

In the paper, $\lambda$ is set to 0.1.
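Putting the formulas together, here is a hedged sketch of the total loss using the new torch.fft API. This is my own reading of the equations above, following the real/imaginary stacking from the code notes below, not the official implementation:

```python
import torch

def mimo_unet_loss(preds, gts, lam=0.1):
    """preds / gts: lists of the K multi-scale deblurred outputs and sharp images."""
    l_cont, l_msfr = 0.0, 0.0
    for pred, gt in zip(preds, gts):
        # Content loss: per-element L1 at this scale (the mean gives the 1/t_k factor).
        l_cont += torch.mean(torch.abs(pred - gt))
        # MSFR loss: L1 between the FFTs, with real/imag parts stacked.
        pf = torch.fft.fft2(pred, dim=(-2, -1))
        gf = torch.fft.fft2(gt, dim=(-2, -1))
        l_msfr += torch.mean(torch.abs(torch.stack((pf.real, pf.imag), -1)
                                       - torch.stack((gf.real, gf.imag), -1)))
    k = len(preds)
    return l_cont / k + lam * l_msfr / k
```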

Code Notes

The PyTorch code provided with the paper is relatively old. With a newer PyTorch version, the code needs the following changes:

# Interpolation (resizing) part
F.interpolate(..., recompute_scale_factor=True)
# Wherever F.interpolate is used to resize, add recompute_scale_factor=True so that the old behaviour is followed explicitly; otherwise a warning is raised.

# Fourier transform part
# old version
label_fft1 = torch.rfft(label_img4, signal_ndim=2, normalized=False, onesided=False)
# new version: torch.rfft was removed; onesided=False corresponds to a full 2-D FFT,
# with the real and imaginary parts stacked in the last dimension
t = torch.fft.fft2(label_img4, dim=(-2, -1))
label_fft1 = torch.stack((t.real, t.imag), -1)

Results

The following are the results of running 30 epochs on the dataset I found. The paper uses the GoPro and RealBlur datasets: 3000 epochs of training on GoPro with the Adam optimizer and an initial learning rate of $10^{-4}$, halved every 500 epochs; and 1000 epochs on RealBlur with the same learning rate, also halved every 500 epochs.
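The schedule described above (Adam, initial learning rate 1e-4, halved every 500 epochs) can be reproduced with a standard optimizer/scheduler pair; the model below is just a placeholder:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3)  # placeholder for the MIMO-UNet model

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)

for epoch in range(3000):   # 3000 epochs on GoPro (1000 on RealBlur)
    # ... forward pass, loss, backward, optimizer.step() ...
    scheduler.step()
```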
Blurred image
[Image: blurred input]
Generated image
[Image: deblurred output]
Ground Truth
[Image: ground truth]
I will also release the MIMO-UNet code that I have modified so that it runs:
MIMOUNet

Related Links

Rethinking Coarse-to-Fine Approach in Single Image Deblurring

The official source code of MIMO-UNet from the paper

Origin blog.csdn.net/qq_36571422/article/details/123078073