Deep Learning Super-Resolution Reconstruction (Summary)

This article is an overview; see the previous article for details.


1. SRCNN (improved upon by 2 and 3)

The pioneering work. Three convolutional layers; the input is a low-resolution image first upscaled by bicubic interpolation to the high-resolution size, then fed into the CNN.

The three layers perform patch extraction and feature representation, non-linear feature mapping, and final reconstruction. The mean squared error (MSE) is used as the loss function.
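A minimal PyTorch sketch of the three-layer structure just described. The 9-1-5 kernel sizes and 64/32 channel widths follow the commonly cited SRCNN configuration and are assumptions here, not taken from this article:

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer sketch: patch extraction -> non-linear mapping -> reconstruction.
    Kernel sizes 9-1-5 and widths 64/32 are the commonly used configuration (assumed)."""
    def __init__(self, channels=1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: low-resolution image already upscaled by bicubic interpolation
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

loss_fn = nn.MSELoss()  # MSE loss, as stated above
```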

2. FSRCNN

Feature extraction: performed on the low-resolution image directly; SRCNN's 9×9 kernel is reduced to 5×5. Shrinking: a 1×1 convolution kernel reduces the dimensionality. Non-linear mapping: one 5×5 convolution kernel is replaced by two stacked 3×3 kernels. Expanding: a 1×1 convolution kernel restores the dimensionality. Deconvolution layer: the inverse operation of the convolution layer; with stride n, the output size is enlarged n times, realizing the upsampling.

Compared with SRCNN:

A deconvolution layer at the end enlarges the size, so the original low-resolution image can be fed directly into the network; the feature dimensions are changed, using smaller convolution kernels and more mapping layers; the convolutional layers can be shared, so to train models for different upsampling factors, only the final deconvolution layer needs to be fine-tuned.
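A hedged sketch of this design: shared feature layers operating at low resolution, plus one scale-specific deconvolution whose stride equals the upsampling factor. The 56/12 channel widths follow the paper's common setting but are assumptions here:

```python
import torch
import torch.nn as nn

scale = 3  # upsampling factor; only the final layer depends on it

# FSRCNN-style shared body, operating entirely on the low-resolution image
shared = nn.Sequential(
    nn.Conv2d(1, 56, kernel_size=5, padding=2),   # feature extraction (5x5 kernel)
    nn.PReLU(),
    nn.Conv2d(56, 12, kernel_size=1),             # shrinking
    nn.PReLU(),
    nn.Conv2d(12, 12, kernel_size=3, padding=1),  # mapping: first of two stacked 3x3
    nn.PReLU(),
    nn.Conv2d(12, 12, kernel_size=3, padding=1),  # mapping: second 3x3
    nn.PReLU(),
    nn.Conv2d(12, 56, kernel_size=1),             # expanding
    nn.PReLU(),
)
# Deconvolution: stride = scale enlarges H and W by `scale`
upscale = nn.ConvTranspose2d(56, 1, kernel_size=9, stride=scale,
                             padding=4, output_padding=scale - 1)

x = torch.randn(1, 1, 32, 32)   # LR input, no pre-interpolation needed
y = upscale(shared(x))          # -> (1, 1, 96, 96) for scale=3
```

To retarget the model to a new upsampling factor, only `upscale` needs to be rebuilt and fine-tuned; `shared` can be kept as-is.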

3. ESPCN

The core concept is the sub-pixel convolutional layer. The network takes the original low-resolution image as input; after three convolutional layers, the \[H \times W \times r^2\] feature maps are re-arranged into a \[rH \times rW \times 1\] high-resolution image.

ESPCN uses tanh as the activation function instead of ReLU; the loss function is the mean squared error.
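A sketch of the sub-pixel pipeline using PyTorch's built-in `nn.PixelShuffle`, which performs exactly the rearrangement described above; the intermediate channel widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

r = 3  # upsampling factor

# Three convolutional layers ending in r^2 channels, then sub-pixel rearrangement.
# tanh activations as described above; 64/32 channel widths are assumptions.
espcn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=5, padding=2), nn.Tanh(),
    nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.Tanh(),
    nn.Conv2d(32, r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),  # (N, r^2, H, W) -> (N, 1, rH, rW)
)

x = torch.randn(1, 1, 32, 32)
print(espcn(x).shape)  # torch.Size([1, 1, 96, 96])
```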

4. VDSR (improved upon by 7)

Only the high-frequency residual between the high-resolution and low-resolution images is learned (residual learning).

The input is the interpolated low-resolution image; this input is added to the residual learned by the network to obtain the final output.

Key points: 1. A deeper network structure (20 layers). 2. Residual learning, with adaptive gradient clipping that limits gradients to a certain range. 3. Zero-padding in every convolution, so the feature maps and the final output image stay the same size. 4. Joint training on images of multiple scales. A sketch of the global residual structure follows.
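A minimal sketch of VDSR-style global residual learning, assuming the usual 3×3 kernels and a width of 64 (not stated in this article):

```python
import torch
import torch.nn as nn

class VDSRLike(nn.Module):
    """20 zero-padded 3x3 conv layers predict only the high-frequency residual,
    which is added back to the interpolated input (global residual learning)."""
    def __init__(self, depth=20, width=64):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # x is the bicubic-interpolated LR image; the network learns x_HR - x
        return x + self.body(x)

# During training, gradient clipping keeps gradients in a fixed range, e.g.:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.4)
```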

5. DRCN (improved upon by 7)

A recursive convolutional network architecture

The input is the interpolated image. The network has three modules: the Embedding network, which performs feature extraction; the Inference network, which performs the non-linear mapping of features; and the Reconstruction network, which restores the final reconstruction result from the feature maps. The Inference network is recursive, that is, the data loops through the same layer multiple times. Unrolling this loop is equivalent to multiple concatenated convolutional layers that share the same set of parameters.
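A short sketch of the recursion, assuming 64-channel features and a recursion depth of 16 (illustrative values, not from this article):

```python
import torch
import torch.nn as nn

# One convolution applied T times: unrolled, this is T stacked layers
# that share a single set of parameters.
shared_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
relu = nn.ReLU(inplace=True)

def inference(h, T=16):
    for _ in range(T):          # same weights reused on every pass
        h = relu(shared_conv(h))
    return h
```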

6. RED

A symmetric network structure composed of convolutional and deconvolutional layers

The structure of the RED network is symmetric: each convolutional layer has a corresponding deconvolutional layer. The convolutional layers capture the abstract content of the image; the deconvolutional layers enlarge the feature maps and restore the image details.

As in 4 (VDSR), a skip connection carries the input image to the end of the network, where it is added to the output of the last deconvolution layer.

The features learned by the convolutional and deconvolutional layers in the middle of RED are therefore the residual between the target image and the low-quality image. The network depth of RED is 30 layers, and the mean squared error is used as the loss function.
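A shallow sketch of the symmetric conv/deconv structure with skip connections (the real network is 30 layers; depths and widths here are assumptions):

```python
import torch
import torch.nn as nn

class REDLike(nn.Module):
    """Symmetric encoder-decoder: conv features are added to the matching
    deconv features, and the input is added to the final deconv output."""
    def __init__(self, width=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.ReLU(True),
                                  nn.Conv2d(width, width, 3, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU(True),
                                  nn.Conv2d(width, width, 3, padding=1), nn.ReLU(True))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(width, width, 3, padding=1), nn.ReLU(True),
                                  nn.ConvTranspose2d(width, width, 3, padding=1), nn.ReLU(True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(width, width, 3, padding=1), nn.ReLU(True),
                                  nn.ConvTranspose2d(width, 1, 3, padding=1))

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2) + e1   # symmetric skip: conv features into deconv stage
        return self.dec1(d2) + x  # global skip: input added to last deconv output
```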

7. DRRN (combines ideas from 4, 5, and ResNet) **********

ResNet performs local residual learning in a chain mode. VDSR performs global residual learning. DRCN combines global residual learning, recursive learning with a single shared weight set, and multi-objective optimization. DRRN combines multi-path local residual learning, global residual learning, and multi-weight recursive learning.

The paper selects a network structure with 1 recursive block and 25 residual units, giving a depth of 52 layers.
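A minimal sketch of the recursive block: one residual unit (two convolutions) whose weights are shared across all 25 recursions, with the identity branch always taken from the block input (multi-path local residual). The 128-channel width is an assumption:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(128, 128, 3, padding=1)
conv2 = nn.Conv2d(128, 128, 3, padding=1)
relu = nn.ReLU(inplace=True)

def recursive_block(h0, U=25):
    """h0: feature map entering the recursive block (shape N x 128 x H x W)."""
    h = h0
    for _ in range(U):                        # 25 residual units, same weights
        h = h0 + conv2(relu(conv1(relu(h))))  # local residual back to block input
    return h
```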

8. LapSRN: ********** (improves on most of the previous algorithms)

In the paper, the authors first summarize three problems with previous methods. First, some methods require a predefined upsampling operation (e.g. bicubic) to bring the input to the target spatial size before it enters the network; this adds extra computational overhead and can also introduce visible reconstruction artifacts. Other methods replace the predefined upsampling with operations such as sub-pixel convolution or deconvolution layers, but their network structures are relatively simple and perform poorly, failing to learn the complex mapping from low-resolution to high-resolution images. Second, training with an l2-type loss function inevitably produces blurry predictions; the recovered high-resolution images tend to be overly smooth. Third, when reconstructing high-resolution images with a single upsampling operation, large upsampling factors (8x and above) become difficult to achieve.

LapSRN upsamples progressively, predicting residuals level by level, so when performing large-factor upsampling it also produces the intermediate lower-factor results as outputs. Since the size grows step by step, not all operations run on large feature maps, making the network relatively fast. LapSRN designs a loss function that supervises the result at every level, which yields good results.
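The per-level loss in the LapSRN paper is the Charbonnier penalty, a differentiable variant of l1 that avoids the over-smoothing of l2. A minimal sketch of per-level supervision, where the level pairing is an assumption:

```python
import torch

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier penalty: a smooth, differentiable l1 variant."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def lapsrn_loss(preds, targets):
    """Supervise the reconstruction at every pyramid level (e.g. 2x, 4x, 8x).
    preds/targets: lists of tensors, one pair per level."""
    return sum(charbonnier(p, t) for p, t in zip(preds, targets))
```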

9. SRDenseNet:

SRDenseNet applies the dense block structure to the super-resolution problem. This structure brings several advantages to the network: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and reduces the number of parameters.
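A sketch of a dense block, where every layer's output is concatenated with all previous feature maps (this is what enables feature reuse); the growth rate and layer count are assumptions:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each conv sees the concatenation of the block input and all earlier
    outputs; output width grows by `growth` channels per layer."""
    def __init__(self, in_ch=64, growth=16, layers=8):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1) for i in range(layers)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # in_ch + layers * growth channels
```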

10. SRGAN (SRResNet): **********

In this paper, a generative adversarial network (GAN) is applied to the super-resolution problem.

SRResNet (the generator part of SRGAN) is optimized with the mean squared error. The experimental results in the article show that SRResNet trained with an MSE-based loss achieves a high peak signal-to-noise ratio, but it loses some high-frequency details and the resulting images are overly smooth. The results obtained by SRGAN have better visual quality. The content loss is defined in three variants: based on the mean squared error, based on low-level features of the VGG model, and based on high-level features of the VGG model. A content loss based on high-level VGG features generates better texture details than one based on low-level VGG features.
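A hedged sketch of a VGG-based content loss: instead of comparing raw pixels, compare feature maps of the reconstructed and ground-truth images. The torchvision layer index and input normalization details below are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG19 feature extractor, truncated at a high-level conv layer.
# (Newer torchvision uses the `weights=` argument instead of `pretrained=True`.)
vgg = vgg19(pretrained=True).features[:36].eval()
for p in vgg.parameters():
    p.requires_grad = False

def content_loss(sr, hr):
    """sr, hr: ImageNet-normalized RGB tensors of the same shape.
    Taking features from a deeper layer gives better textures than a shallow one."""
    return nn.functional.mse_loss(vgg(sr), vgg(hr))
```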

11. EDSR:**********

The most significant performance improvement of EDSR comes from removing redundant modules (the batch normalization layers) from SRResNet, so that the model size can be enlarged to improve the quality of the results.

The paper also proposes MDSR, a network structure that handles multiple upsampling factors within a single model.
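A sketch of the EDSR residual block: batch normalization is removed from the SRResNet block, and the residual branch is scaled before being added back (the paper uses a factor of 0.1 in its large model). The width here is an assumption:

```python
import torch
import torch.nn as nn

class EDSRBlock(nn.Module):
    """SRResNet residual block with batch normalization removed and
    residual scaling applied for stable training of wide models."""
    def __init__(self, width=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)
```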
