Paper reading notes #65: Enhanced Deep Residual Networks for Single Image Super-Resolution (CVPR 2017)

Paper: https://arxiv.org/abs/1707.02921

Code: https://github.com/LimBee/NTIRE2017

Summary

       Deep neural networks have become a popular approach to super-resolution, and residual learning in particular has greatly improved performance. This paper proposes an enhanced deep super-resolution network (EDSR) whose performance surpasses the current state-of-the-art models. The main performance gain comes from removing unnecessary modules from the conventional residual network and then optimizing the simplified architecture. With training stabilized in this way, the model size can be expanded further to improve performance. The paper also proposes a multi-scale deep super-resolution system (MDSR) together with a training method, so that a single model can reconstruct high-resolution images for different upscaling factors.

Introduction

       Single image super-resolution (SISR) reconstructs a high-resolution image from a single low-resolution image. In general, the low-resolution image and the original high-resolution image are related by a strongly constrained degradation. Most studies assume the low-resolution image is obtained by bicubic downsampling by a factor of 2, 3, or 4, although in real scenarios other degradation factors such as blur, decimation, or noise can also be considered. A minimal sketch of the common bicubic degradation is given below.
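A small illustrative sketch (not from the paper's code) of how the low-resolution input is usually generated under the bicubic assumption; the function name and use of Pillow are my own choices.

```python
# Minimal sketch of the bicubic degradation assumed by most SISR benchmarks:
# the low-resolution input is the high-resolution image downsampled by a
# scale factor (x2, x3, x4). Blur and noise are ignored here.
from PIL import Image

def make_lr(hr_path: str, scale: int = 2) -> Image.Image:
    """Create a bicubic low-resolution version of a high-resolution image."""
    hr = Image.open(hr_path)
    w, h = hr.size
    # Crop so the spatial size is divisible by the scale factor.
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr
```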

       Recently, deep networks have pushed the peak signal-to-noise ratio (PSNR) of super-resolution results ever higher (see https://www.jiqizhixin.com/articles/2017-11-06-8 ). However, these architectures have some limitations. First, the reconstruction performance of such networks is sensitive to minor architectural changes: the same model can reach different performance levels under different initialization and training procedures. Training such a network therefore requires a carefully designed architecture and a rather specific optimization recipe.

       Second, most existing super-resolution algorithms treat different scale factors as independent problems and do not exploit the relationship between scales. These models must therefore fix a scale factor and be trained separately for each one. VDSR can handle multiple scale factors within a single network; training VDSR jointly over several scales outperforms fixed-scale training, which shows that scale-specific models contain redundancy. However, VDSR requires the bicubic-interpolated image as the network input, which leads to heavy computation and memory consumption.

        SRResNet solves the computation and memory problem, but the model is a fairly direct application of ResNet with only limited changes. The original ResNet was designed for high-level vision problems such as object detection and classification, so applying the ResNet architecture directly to a low-level problem like super-resolution may lead the model to a suboptimal solution.

         To address these problems, this paper optimizes and improves the SRResNet architecture. It first analyzes and removes the modules that are not necessary, simplifying the network structure; the heavier a network is, the harder it is to train.

         Second, the paper revisits the training procedure and transfers knowledge from a model trained at another scale. To reuse scale-independent information during training, the networks for larger scale factors are initialized from a model pre-trained at a lower scale. In addition, the paper presents a new multi-scale architecture that shares most of its parameters across scales. This multi-scale model requires much less computation than several single-scale models while achieving similar performance. Experiments on the DIV2K dataset show superior performance in both PSNR and SSIM.

Related work

      Early approaches to super-resolution were based on simple interpolation theory. However, such methods are limited in predicting detailed textures; later work reconstructs higher-resolution images by analyzing the statistics of natural images and searching for the most plausible reconstruction.

      More advanced work focuses on learning the mapping function between low- and high-resolution image pairs. These learning methods rely on a range of techniques, from neighbor embedding to sparse coding. Some studies cluster the patch space and learn a function per cluster. Other methods exploit the self-similarity of the image itself, removing the need for external databases, and enlarge the internal dictionary by geometrically transforming patches.

      Recently, deep neural networks have greatly improved super-resolution performance. Skip connections and recursive convolutions reduce the burden of carrying identity information through the network; some methods handle the image restoration problem with encoder-decoder networks and symmetric skip connections. Skip connections also accelerate convergence.

      In many deep-learning-based super-resolution algorithms, the input image is first upsampled by bicubic interpolation (by a factor of 2, 3, or 4) and then fed into the network. Other approaches do not upsample the input and instead add an upsampling module at the end of the network, which is also appealing: since the feature maps stay at the small input size, a large amount of computation can be saved without reducing model capacity. This type of architecture has one drawback, however: it cannot handle multiple scale factors within a single model. This paper balances multi-scale training against computational efficiency. It not only exploits the inter-scale correlations learned from the characteristics of each scale, but also builds a single multi-scale model that produces high-resolution images for different scale factors. In addition, the paper presents a multi-scale training method that combines the single-scale and multi-scale models. A sketch of the post-upsampling idea follows.
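A hedged sketch of the "upsample at the end" design mentioned above, written in PyTorch. The module name Upsampler and the parameter n_feats are illustrative, not the authors' code; the sub-pixel (PixelShuffle) layer is the standard way such tails are built.

```python
# Post-upsampling tail: the body of the network works at low resolution and
# a sub-pixel (PixelShuffle) layer enlarges the feature maps only at the end.
import torch
import torch.nn as nn

class Upsampler(nn.Sequential):
    def __init__(self, scale: int, n_feats: int):
        layers = []
        if scale in (2, 3):
            layers += [nn.Conv2d(n_feats, n_feats * scale * scale, 3, padding=1),
                       nn.PixelShuffle(scale)]
        elif scale == 4:  # two x2 steps
            for _ in range(2):
                layers += [nn.Conv2d(n_feats, n_feats * 4, 3, padding=1),
                           nn.PixelShuffle(2)]
        super().__init__(*layers)

x = torch.randn(1, 64, 24, 24)        # low-resolution feature map
print(Upsampler(2, 64)(x).shape)      # torch.Size([1, 64, 48, 48])
```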

      Several studies also consider the loss function in order to train the network better. In image restoration, the L2 (mean squared error) loss is widely used, mainly because it matches the PSNR evaluation metric. However, the authors argue that the L2 loss does not guarantee optimal PSNR and SSIM, and in their experiments the L1 loss achieves the same or better performance.
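A tiny illustration of the loss choice discussed above; the tensors are random stand-ins for a super-resolved patch and its ground truth.

```python
# L2 (MSE) is the classical choice tied to PSNR; the paper trains with L1,
# which the authors found to work as well or better in practice.
import torch
import torch.nn as nn

sr = torch.rand(1, 3, 96, 96)   # network output (super-resolved patch)
hr = torch.rand(1, 3, 96, 96)   # ground-truth high-resolution patch

l2 = nn.MSELoss()(sr, hr)       # loss used by many earlier SR networks
l1 = nn.L1Loss()(sr, hr)        # loss used to train EDSR / MDSR
```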

The proposed method

      This paper builds the single-scale EDSR model and the multi-scale MDSR model, which handles different scale factors within a single network.

       Residual blocks: residual networks perform excellently on computer vision problems, and this paper improves the residual structure used in ResNet and SRResNet to obtain better performance. The paper compares the structural differences between the original ResNet block, the SRResNet block, and the proposed block, as shown below.

      This paper removes the batch normalization (BN) layers from the network. Because BN normalizes the features, it limits the flexible range of the network. At the same time, removing the BN layers reduces GPU memory usage: during training the memory footprint is about 40% lower than that of SRResNet. Under limited computing resources, a larger model can therefore be built to obtain better performance. A minimal residual block without BN is sketched below.
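A minimal sketch of the EDSR-style residual block described above, assuming a PyTorch implementation; the class name and default channel count are mine.

```python
# EDSR-style residual block: conv-ReLU-conv with an identity skip connection
# and, unlike the original ResNet/SRResNet block, no batch normalization.
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, n_feats: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),  # no BN anywhere
        )

    def forward(self, x):
        return x + self.body(x)   # identity skip connection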

       Single-scale model: the simplest way to improve model performance is to increase the number of parameters. In a convolutional network this can be done by stacking more layers or increasing the number of filters. For a network of depth B (number of layers) and width F (number of feature channels), the memory footprint is roughly O(BF) while the parameter count is roughly O(BF^2); under limited computing resources, increasing F is therefore the more effective way to raise model capacity. However, once F grows beyond a certain level, training becomes unstable. To solve this problem, the paper uses residual scaling with a factor of 0.1: in each residual block, a constant scaling layer is placed after the last convolution. This makes training much more stable when many filters are used. At test time, the scaling layer can be merged into the preceding convolution to improve computational efficiency, as sketched below.
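A sketch of the residual-scaling trick, again assuming PyTorch; the fold_scale_for_inference helper is my own illustration of how the constant can be absorbed into the last convolution at test time.

```python
# Residual scaling: the output of each block's last convolution is multiplied
# by a small constant (0.1) before being added to the skip path, which
# stabilizes training of very wide models (F = 256).
import torch.nn as nn

class ScaledResBlock(nn.Module):
    def __init__(self, n_feats: int = 256, res_scale: float = 0.1):
        super().__init__()
        self.res_scale = res_scale
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x) * self.res_scale

    def fold_scale_for_inference(self):
        """Merge the constant scaling into the last conv (equivalent at test time)."""
        last = self.body[-1]
        last.weight.data.mul_(self.res_scale)
        last.bias.data.mul_(self.res_scale)
        self.res_scale = 1.0
```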

        The single-scale model in this paper is similar to SRResNet, but uses no ReLU activation outside the residual blocks. The final EDSR model extends the baseline to B = 32 residual blocks and F = 256 channels with a residual scaling factor of 0.1. The network structure is shown in Figure 3.

        When training the x3 and x4 models, the paper initializes them with the pre-trained x2 model. This initialization technique accelerates training and improves the final performance, as shown in the figure. A sketch of this weight transfer follows.
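A hedged sketch of the pre-training trick described above: since only the scale-specific parts differ between the x2 and x3/x4 networks, the matching weights can simply be copied. The function name, file name, and build_edsr constructor are illustrative, not the authors' code.

```python
# Initialize a larger-scale model from the trained x2 checkpoint: copy every
# tensor whose name and shape match, and let the scale-specific layers keep
# their fresh initialization.
import torch
import torch.nn as nn

def init_from_pretrained(model: nn.Module, checkpoint_path: str) -> None:
    """Copy every weight whose name and shape match the checkpoint."""
    pretrained = torch.load(checkpoint_path, map_location="cpu")
    own = model.state_dict()
    compatible = {k: v for k, v in pretrained.items()
                  if k in own and v.shape == own[k].shape}
    own.update(compatible)
    model.load_state_dict(own)

# Usage (hypothetical): model_x4 = build_edsr(scale=4)
#                       init_from_pretrained(model_x4, "edsr_x2.pt")
```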

       Multi-scale model: Figure 4 shows that super-resolution at different scales is a set of inter-related tasks. The proposed multi-scale model exploits this inter-scale correlation: its main branch follows the baseline design with B = 16 residual blocks, so most of the parameters are shared across the different scales, as shown below.

        In the multi-scale architecture, the paper introduces scale-specific processing modules to handle super-resolution at different scales. First, pre-processing modules are placed at the head of the network to reduce the variance between input images of different scales. Each pre-processing module consists of two residual blocks with 5x5 kernels. Using larger kernels keeps the scale-specific part shallow while giving the early part of the network a large receptive field. At the end of the network, the upsampling modules for the different scales are placed in parallel. A sketch of this layout follows.
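A hedged sketch of the MDSR layout described above, assuming PyTorch. Module names, channel counts, and the simplified single-step upsamplers (including for x4) are my own choices, not the authors' code.

```python
# MDSR layout: scale-specific heads (two 5x5 residual blocks per scale), a
# shared main branch, and scale-specific sub-pixel upsamplers in parallel.
import torch
import torch.nn as nn

def conv(in_c, out_c, k):
    return nn.Conv2d(in_c, out_c, k, padding=k // 2)

class ResBlock(nn.Module):
    def __init__(self, n_feats, kernel_size=3):
        super().__init__()
        self.body = nn.Sequential(conv(n_feats, n_feats, kernel_size),
                                  nn.ReLU(inplace=True),
                                  conv(n_feats, n_feats, kernel_size))

    def forward(self, x):
        return x + self.body(x)

class MDSRSketch(nn.Module):
    def __init__(self, n_feats=64, n_blocks=16, scales=(2, 3, 4)):
        super().__init__()
        self.entry = conv(3, n_feats, 3)
        # Scale-specific pre-processing: two 5x5 residual blocks per scale.
        self.heads = nn.ModuleDict({str(s): nn.Sequential(ResBlock(n_feats, 5),
                                                          ResBlock(n_feats, 5))
                                    for s in scales})
        # Shared main branch (baseline depth of 16 residual blocks).
        self.body = nn.Sequential(*[ResBlock(n_feats) for _ in range(n_blocks)])
        # Scale-specific upsamplers placed in parallel at the end
        # (simplified: one PixelShuffle step per scale).
        self.tails = nn.ModuleDict({str(s): nn.Sequential(
                conv(n_feats, n_feats * s * s, 3),
                nn.PixelShuffle(s),
                conv(n_feats, 3, 3))
            for s in scales})

    def forward(self, x, scale):
        x = self.entry(x)
        x = self.heads[str(scale)](x)
        x = x + self.body(x)
        return self.tails[str(scale)](x)

print(MDSRSketch()(torch.randn(1, 3, 24, 24), scale=3).shape)  # (1, 3, 72, 72)
```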

         The final MDSR model has 80 layers and 64 channels. Each single-scale baseline model contains about 1.5M parameters, or 4.5M in total for the three scales, while the multi-scale model has only 3.2M parameters. Its performance is shown below.

Experiments

      

 
