[Paper Walkthrough] CVPR 2019 (Adobe Research): Reference-Based Image Super-Resolution via Multi-Level Neural Texture Transfer

13894005-3538c6afbab78b5f.png

Super-resolution (SR) aims to recover a clear high-resolution image from a blurred low-resolution one. It is an important task in computer vision with strong application prospects in industry. Image super-resolution is a research focus at CVPR; this year no fewer than 10 super-resolution papers were accepted. This article walks through one of them, a super-resolution paper published by Adobe Research. Paper resources are available via 1) the original arXiv page, 2) the project homepage, and 3) the GitHub code repository.

This article is organized as follows:

1. Overview

2. Method

2.1 Feature swapping

2.2 Texture transfer

2.3 Loss functions

3. Experimental analysis

3.1 Overall comparison

3.2 Effect of reference similarity on performance

3.3 Effect of the number of levels on performance

3.4 Effect of the texture loss on performance

4. Summary

  1. Overview

Classic single-image super-resolution (SISR) is challenging because low-resolution images inherently lose information. The emerging reference-based super-resolution (RefSR) can recover more detail with the help of a high-resolution reference image, opening a new door for super-resolution research. However, existing RefSR methods require the reference image to be highly similar to the low-resolution input; when the two differ greatly, restoration quality degrades severely, possibly even below that of reference-free SISR methods.

This paper aims to unleash the potential of RefSR by relaxing the similarity constraint on the reference image and exploiting its texture details in a more robust and efficient way. Inspired by recent work on image stylization, the authors formulate RefSR as a neural texture transfer problem and design an end-to-end deep model, SRNTT, which adaptively transfers texture details from the reference image according to texture similarity in order to enrich the high-resolution output. Compared with previous work, an important contribution of SRNTT is that its texture similarity is no longer computed on raw pixels but on multi-level image features. Upgrading the matching from the pixel level to the semantic level makes the model much more robust: even when the reference image is completely unrelated to the input, SRNTT still performs at least at the level of reference-free SISR methods, never worse.

To promote RefSR research, the authors build a new benchmark dataset, CUFED5, in which every image is paired with reference images at different similarity levels. The authors quantitatively and qualitatively compare SRNTT with many state-of-the-art super-resolution methods. The experiments show that, both on PSNR/SSIM metrics and in a human visual-perception survey, SRNTT outperforms the other models on three datasets including CUFED5; in the human survey in particular, SRNTT beats every comparison method with an approval rate above 90%.

Figure 1. Super-resolution results of a SISR method (SRGAN) compared with two RefSR methods (CrossNet and SRNTT). The left-most column shows the two reference images; the tag (U) indicates that the RefSR result used the upper image as reference, and (L) the lower image.

13894005-683ac8914931b4e2.png

Figure 1 lets us judge the models visually, comparing SRNTT with an advanced single-image method, SRGAN, and an advanced reference-based method, CrossNet. CrossNet works well when the reference has similar content (Big Ben), but very poorly when the content differs greatly (the Golden Gate Bridge): the bridge cables are essentially superimposed onto the texture of the original image. In contrast, even with the weakly related Golden Gate Bridge as reference, SRNTT produces a super-resolved image better than CrossNet's. With Big Ben as reference, SRNTT's result is very realistic: clarity improves markedly and texture detail is rich (note the details of the clock dial). Judging from this figure, SRNTT clearly outperforms both SRGAN and CrossNet.

  2. Method

Existing methods have two limitations: ① textures generated by SISR are not realistic and look artificially forged; ② RefSR requires the reference image to be as similar and as well aligned as possible. To address both, the authors propose the end-to-end super-resolution texture transfer network SRNTT, whose overall structure is shown in Figure 2:

13894005-c7b8401484116aa4.png
Figure 2. SRNTT overall network architecture

The network consists of two parts: feature matching and texture transfer. The matching part computes the similarity between blocks of the input image and blocks of the reference image and swaps them, eventually generating new feature maps whose textures come from the reference; the texture transfer part merges these new feature maps into the input to generate the final super-resolved image. Because both the matching part and the transfer part operate at multiple levels, the authors call this structure multi-level neural texture transfer. Each component is detailed below.

  2.1 Feature swapping

The feature swapping part can be divided into the following five steps:

① Scaling: To facilitate feature matching, the low-resolution image and the reference image are first rescaled to the size of the target high-resolution image. The low-resolution input is simply upsampled. The reference image is first downsampled to the scale of the low-resolution input and then upsampled back; this down-then-up scaling blurs the reference so that it shares the frequency characteristics of the low-resolution input, enabling better feature matching against it.

② Texture feature extraction: To weaken the influence of color and illumination on the similarity measure while strengthening the influence of structure and texture, the authors compare similarity in a neural feature space; that is, matching is performed on features extracted by a neural network rather than on raw pixels. Concretely, the implementation uses the first three feature stages of a pre-trained VGG19 network as the extractor, because VGG19 features characterize texture very well.

③ Block splitting: The feature maps of the low-resolution image and of the reference image are densely cut into blocks of the same size (3×3 in the code's default implementation) with a stride of 1, so neighboring blocks overlap.

④ Similarity computation: The inner product is used as the similarity metric, and the similarity of every input block against every reference block is computed. The benefit of the inner-product metric is that the whole computation can be cast as a convolution (or cross-correlation): each reference block is used as a kernel and convolved over the entire low-resolution feature map, completing all similarity computations at once. Afterwards, every low-resolution block has a reference block of maximal similarity.

⑤ Block swapping: Each low-resolution feature block is replaced by its most similar block taken from the original (unblurred) reference feature map; where the pasted blocks overlap, their values are averaged. After swapping completes, each level yields a corresponding swapped feature map M.
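Step ①'s down-then-up preprocessing can be sketched in a few lines of numpy. This is an illustrative stand-in, not the paper's code: block averaging and nearest-neighbor replication substitute for the bicubic resampling the paper uses.

```python
import numpy as np

def downsample(img, r):
    """Block-average downsample by factor r (stand-in for bicubic)."""
    h, w = img.shape[0] // r * r, img.shape[1] // r * r
    return img[:h, :w].reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def upsample(img, r):
    """Nearest-neighbor upsample by factor r (stand-in for bicubic)."""
    return img.repeat(r, axis=0).repeat(r, axis=1)

scale = 4
lr = np.random.rand(40, 40)     # low-resolution input
ref = np.random.rand(160, 160)  # high-resolution reference

lr_up = upsample(lr, scale)                          # LR brought to HR size
ref_blur = upsample(downsample(ref, scale), scale)   # Ref blurred via down-then-up
assert lr_up.shape == ref_blur.shape == (160, 160)
```

After this preprocessing, `lr_up` and `ref_blur` live at the same scale and share a similar degree of blur, which is what makes the later patch matching meaningful.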

Note that the entire feature-swapping process involves no training and has no trainable parameters, so it can be treated as data preprocessing. Although the steps look intricate, the process simply replaces the coarse texture features of the low-resolution image, in feature space, with highly correlated, high-quality texture features from the reference, filling the information gaps of the low-resolution image with external information.
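The matching and swapping of steps ③ to ⑤ can be sketched on a toy single-channel feature map. This is a hypothetical illustration, not the released code: real SRNTT operates on multi-level VGG feature maps and implements matching as batched cross-correlation rather than explicit loops.

```python
import numpy as np

def swap_features(lr_feat, ref_blur_feat, ref_feat, k=3):
    """Match each k x k patch of the upsampled-LR features against the
    blurred-reference features by normalized inner product, then paste
    the corresponding patch from the ORIGINAL reference features,
    averaging where pasted patches overlap."""
    # ③ densely extract k x k patches (stride 1) from both reference maps
    Hr, Wr = ref_blur_feat.shape
    idx = [(i, j) for i in range(Hr - k + 1) for j in range(Wr - k + 1)]
    match_flat = np.array([ref_blur_feat[i:i+k, j:j+k].ravel() for i, j in idx])
    match_flat /= np.linalg.norm(match_flat, axis=1, keepdims=True) + 1e-8
    paste = np.array([ref_feat[i:i+k, j:j+k] for i, j in idx])

    H, W = lr_feat.shape
    out = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            p = lr_feat[i:i+k, j:j+k].ravel()
            p = p / (np.linalg.norm(p) + 1e-8)
            best = int(np.argmax(match_flat @ p))  # ④ inner-product similarity
            out[i:i+k, j:j+k] += paste[best]       # ⑤ swap in the best patch
            cnt[i:i+k, j:j+k] += 1
    return out / cnt  # average overlapping contributions

rng = np.random.default_rng(0)
ref = rng.random((12, 12))
ref_blur = ref  # stand-in: in SRNTT this is ref after bicubic down-then-up
M = swap_features(rng.random((12, 12)), ref_blur, ref)
assert M.shape == (12, 12)
```

The key design point survives even in this toy version: similarity is measured against the blurred reference (which looks like the LR input), but the pasted content comes from the sharp original reference, which is where the extra texture detail enters.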

  2.2 Texture transfer

13894005-02b1689c5412e32b.png
Figure 3. Network structure of the SRNTT texture transfer part

After the feature-swapping preprocessing, we obtain one swapped feature map M per level. The texture transfer network uses the original low-resolution image and the swapped maps M to restore the super-resolved image progressively from the highest level to the lowest: the transfer operation is the same at every level, but the output grows larger each time. The structure of a single transfer level is shown in Figure 3. First, the current image features and the swapped map of that level are concatenated along the channel dimension; then residual blocks learn the texture residual relating the swapped features to the input; the learned residual is added back onto the input features; finally a sub-pixel convolution (sub-pixel conv) upscales the merged features by 2× and passes them to the next level. Once all levels have been transferred, the final merged features are output directly through a convolution layer as the super-resolved image, with no further 2× sub-pixel upscaling.
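The 2× upscaling at the end of each level relies on the sub-pixel convolution's channel-to-space rearrangement (often called pixel shuffle). A minimal numpy sketch of just that rearrangement, without the preceding learned convolution:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Map a (C*r^2, H, W) tensor to (C, H*r, W*r) by interleaving
    groups of channels into r x r spatial sub-positions."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channels into r x r sub-positions
    x = x.transpose(0, 3, 1, 4, 2)   # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels = 1 channel * 2^2
y = pixel_shuffle(x, 2)
assert y.shape == (1, 4, 4)
```

Because the upscaling is just a rearrangement of channels produced by an ordinary convolution, the network can learn the upsampling filter itself instead of relying on fixed interpolation.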

  2.3 Loss functions

In order to 1) preserve the spatial structure of the low-resolution input, 2) improve the visual quality of the super-resolved image, and 3) exploit the rich texture of the reference image, SRNTT is trained with a combination of four losses: reconstruction loss, perceptual loss, adversarial loss, and texture loss. The reconstruction loss is adopted by the vast majority of SR methods, and the perceptual and adversarial losses have been shown by other studies to improve visual quality, while the texture loss is defined specifically for this RefSR method: it pushes the textures of the super-resolved image to be as similar as possible to the swapped feature maps. The definition and interpretation of each loss is given in Figure 4. For the adversarial loss the authors use WGAN-GP. Because the purpose of the reconstruction loss is to raise PSNR, its training weight is 1; the perceptual and texture losses are each weighted 1e-4, and the adversarial loss 1e-6.
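The combination above is simply a weighted sum; a sketch with the weights quoted in the text (the individual loss values here are placeholder floats, while the paper computes them from images, VGG features, and the discriminator):

```python
# SRNTT's loss weights as stated above: reconstruction 1,
# perceptual 1e-4, texture 1e-4, adversarial 1e-6.
WEIGHTS = {"rec": 1.0, "per": 1e-4, "tex": 1e-4, "adv": 1e-6}

def total_loss(losses):
    """Combine per-term loss values (a dict of floats) with SRNTT's weights."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Example with made-up per-term values:
l = total_loss({"rec": 0.5, "per": 10.0, "tex": 20.0, "adv": 100.0})
assert abs(l - 0.5031) < 1e-9
```

The tiny weights reflect the roles of the terms: reconstruction dominates to anchor PSNR, while the perceptual, texture, and adversarial terms act as gentle regularizers toward realistic texture.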

13894005-eb427e7fbd101521.png
Figure 4. The four loss functions used by SRNTT

  3. Experimental Analysis

  3.1 Overall comparison

The authors compare SRNTT with several state-of-the-art methods; all low-resolution images are obtained by 4× bicubic downscaling. The quantitative experiments use the PSNR and SSIM metrics, with results in Table 1, where models are grouped by whether they use a reference. In the table, SRNTT-l2 denotes the model trained with MSE loss only, while SRNTT-l2(SISR) denotes the model using the low-resolution image itself as the reference, which is equivalent to having no reference, so it is placed in the SISR group as a control. Overall, although the reference-based SRNTT-l2 variant performs best across the three datasets, its improvement over the reference-free MDSR model is not large.

Table 1. Quantitative PSNR/SSIM comparison of the models

13894005-fd7f6d7259e51750.png

Figure 5 shows the super-resolution results of the different models on 3 samples from the CUFED5 dataset, with two rows of images per sample. Looking at the textures inside the red boxes, SRNTT's visual quality is clearly superior to all the other methods (including its own l2 variant), with highly detailed textures. This shows that PSNR cannot fully reflect visual quality; judged visually, SRNTT's improvement over the other models is substantial.

13894005-336e2a55e6ab794d.png
Figure 5. Results of each model on the CUFED5 dataset

To make the qualitative analysis more convincing, the authors conducted a one-vs-one preference survey with 2,400 participants, each voting between SRNTT and one comparison model at a time. Figure 6 shows SRNTT's vote share against each of the other models: SRNTT easily beats every comparison model with more than 90% of the votes, indicating that its visual quality is widely recognized.

13894005-62b8c75e9bc8e721.png
Figure 6. Results of the 2,400-person visual perception survey

  3.2 Effect of reference similarity on performance

To verify the model's applicability to reference images of different similarity levels, the authors compare CrossNet and SRNTT at 6 similarity levels; the results are in Table 2. PM indicates whether patch-based matching is used, and GAN indicates whether GAN and other perceptual losses are used during training. In Table 2 the reference similarity decreases from left to right; HR(warp) means the HR image undergoes random transformations (shrinking, enlarging, flipping), and LR means the low-resolution image itself serves as the reference. SRNTT-flow is a variant that replaces SRNTT's feature-swapping part with the optical flow used in CrossNet, to gauge the gap between the two models. The results show that SRNTT-l2 adapts well to references of varying similarity.

Table 2. PSNR/SSIM performance at different similarity levels

13894005-8eb43dca3522bff0.png

  3.3 Effect of the number of levels on performance

SRNTT's feature swapping is multi-level. To study how the number of levels affects performance, the authors compare the model's PSNR with 1 to 3 levels, testing each setting with references of different similarity levels; the results are in Table 3. When the reference similarity is fixed, PSNR increases with the number of levels; when the number of levels is fixed, the PSNR in every column except relu3 decreases as reference similarity decreases, matching theoretical expectations.

Table 3. Effect of using different VGG layers for feature swapping on performance

13894005-b0240159e99d5fa7.png

  3.4 Effect of the texture loss on performance

To verify the benefit of the texture loss, the authors also run an ablation, which again confirms its effectiveness. Figure 7 shows super-resolved images generated by SRNTT without the texture loss; comparing with the same samples in Figure 5, the visual quality clearly degrades (the details of the castle windows and the texture of the stars on the flag both become blurry).

13894005-242f2e8dfa3ae04e.png
Figure 7. SRNTT results without the texture loss

  4. Summary

The contributions of the paper can be summarized in three points:

① It explores the more general problem of reference-based super-resolution, breaking through the performance barrier of SISR and relaxing existing RefSR techniques' similarity and alignment constraints on the reference image.

② It proposes SRNTT, an end-to-end deep super-resolution model that can transfer textures from arbitrary reference images and enhance the texture detail of the output.

③ It establishes the baseline dataset CUFED5 for RefSR research, promoting follow-up studies on handling reference images of different similarity levels.

The above is my reading of the paper. This original article first appeared with the AI Yanxishe CVPR team. I have tried to ensure the interpretation is correct and precise, but my scholarship is shallow after all; if there are deficiencies in the text, criticism is welcome. All methods described belong to the original authors.

Click the link below to join the AI Yanxishe CVPR exchange group and see more articles and recommended reading!

https://ai.yanxishe.com/page/meeting/44

Reproduced from: https://www.jianshu.com/p/ec1ea1964a6d


Origin blog.csdn.net/weixin_33675507/article/details/91170272