「Computer Vision」Notes on pix2pixHD

Sina Weibo: 小锋子Shawn
Tencent E-mail：[email protected]
http://blog.csdn.net/dgyuanshaofeng/article/details/82941903

This paper [1] presents pix2pixHD, a high-resolution version of the classic pix2pix model [2].

Authors: Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro
Affiliations: NVIDIA Corporation, UC Berkeley

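0 Abstract

The paper proposes using conditional generative adversarial networks (conditional GANs) to synthesize high-resolution (2048 × 1024), photorealistic images from semantic label maps.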

1 Introduction

This method has a wide range of applications. For example, it can create synthetic training data for visual recognition algorithms, since it is much easier to create semantic labels for desired scenarios than to generate training images. Using semantic segmentation methods, we can transform images into a semantic label domain, edit the objects in the label domain, and then transform them back to the image domain. This makes it possible, for instance, to add lesions to images of healthy tissue and use the results for training. The authors address two problems: the difficulty of generating high-resolution images with GANs, and the lack of details and realistic textures in the previous high-resolution results [3].
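
As a rough illustration of the "edit in the label domain" step, the sketch below relabels a rectangular region of a toy label map. The class IDs, the map size, and the helper name `paste_label` are all made up for this note; the edited map would then go to a trained generator, which is not shown here.

```python
def paste_label(label_map, top, left, height, width, new_class):
    """Return a copy of label_map with one rectangle relabeled as new_class."""
    edited = [row[:] for row in label_map]  # copy so the input stays untouched
    for r in range(top, min(top + height, len(edited))):
        for c in range(left, min(left + width, len(edited[r]))):
            edited[r][c] = new_class
    return edited


HEALTHY, LESION = 0, 1  # hypothetical class IDs, not taken from the paper

healthy_map = [[HEALTHY] * 6 for _ in range(4)]
edited_map = paste_label(healthy_map, top=1, left=2, height=2, width=3,
                         new_class=LESION)

for row in edited_map:
    print(row)
```

Real edits would follow object shapes rather than rectangles; the rectangle just keeps the example short.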

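2 Related Work

Omitted.

3 Instance-Level Image Synthesis

The design covers both the objective function and the network architecture, with the aim of improving photorealism and resolution.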

3.1 pix2pix as the Baseline Model

The framework operates in a supervised setting. The pix2pix method adopts U-Net as the generator and a patch-based fully convolutional network as the discriminator.
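
For reference, the conditional GAN game behind this baseline can be written as below, where s is a semantic label map, x is the corresponding real image, G is the generator, and D is the discriminator. This follows the form I recall from the pix2pixHD paper; the original pix2pix objective additionally includes an L1 reconstruction term, which is left out here.

```latex
\min_{G} \max_{D} \; \mathcal{L}_{\mathrm{GAN}}(G, D),
\qquad
\mathcal{L}_{\mathrm{GAN}}(G, D)
  = \mathbb{E}_{(s, x)}\bigl[\log D(s, x)\bigr]
  + \mathbb{E}_{s}\bigl[\log\bigl(1 - D(s, G(s))\bigr)\bigr]
```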

3.2 Improving Photorealism and Resolution

Three aspects are considered: a coarse-to-fine generator, a multi-scale discriminator architecture, and a robust adversarial learning objective function.
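
For the multi-scale discriminator, the paper (as I read it) downsamples the real and synthesized images by factors of 2 and 4, so that three discriminators each see a different scale. The sketch below builds such a pyramid for a toy grayscale image. The plain 2 × 2 average pooling and the function names are assumptions for illustration, not the official implementation.

```python
def downsample2x(image):
    """Average-pool a 2D grayscale image (list of rows) by a factor of 2."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, num_scales=3):
    """Return [full, 1/2, 1/4, ...] resolutions, one per discriminator."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, num_scales=3)
print([(len(level), len(level[0])) for level in pyramid])
```

The coarsest level gives its discriminator the largest effective receptive field relative to the image, while the finest level pushes the generator toward sharper details.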

3.3 Using Instance Maps
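
As far as I recall from the paper, the instance map is turned into a binary boundary map: a pixel is 1 if its object ID differs from any of its 4-neighbors, and 0 otherwise. A minimal pure-Python sketch of that rule follows; the function name and the toy map are mine.

```python
def instance_boundary_map(instance_map):
    """Mark pixels whose object ID differs from any 4-neighbor."""
    h, w = len(instance_map), len(instance_map[0])
    boundary = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < h and 0 <= nc < w
                if inside and instance_map[nr][nc] != instance_map[r][c]:
                    boundary[r][c] = 1
                    break
    return boundary


# Two objects (IDs 1 and 2) side by side.
instance_map = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
boundary = instance_boundary_map(instance_map)
for row in boundary:
    print(row)
```

The boundary map separates adjacent objects of the same class, which a semantic label map alone cannot do.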

3.4 Learning an Instance-Level Feature Embedding
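
My understanding of the paper is that encoder features are averaged within each object instance and the average is broadcast back to every pixel of that instance (instance-wise average pooling). The sketch below shows this on a toy feature map with one channel; the function name and the data are illustrative only.

```python
def instance_average_pool(features, instance_map):
    """Replace each pixel's feature vector by the mean over its instance.

    features: H x W grid of feature vectors (lists of floats).
    instance_map: H x W grid of object IDs.
    """
    sums, counts = {}, {}
    for feat_row, id_row in zip(features, instance_map):
        for feat, inst in zip(feat_row, id_row):
            if inst not in sums:
                sums[inst] = [0.0] * len(feat)
                counts[inst] = 0
            sums[inst] = [s + f for s, f in zip(sums[inst], feat)]
            counts[inst] += 1
    means = {inst: [s / counts[inst] for s in total]
             for inst, total in sums.items()}
    # Broadcast each instance mean to all pixels of that instance.
    return [[means[inst][:] for inst in id_row] for id_row in instance_map]


features = [[[1.0], [3.0], [10.0]],
            [[5.0], [7.0], [20.0]]]
instance_map = [[1, 1, 2],
                [1, 1, 2]]
pooled = instance_average_pool(features, instance_map)
print(pooled)
```

Because every pixel of an instance shares one feature vector, swapping that vector is a way to change the appearance of a single object.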

4 Results

Implementation details: least-squares GANs (LSGANs) [4] are used to stabilize training.
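
For intuition, the least-squares losses replace the log terms of the standard GAN with squared errors against target values. The sketch below uses the common 0/1 target coding (real = 1, fake = 0) on plain lists of discriminator outputs. The function names and numbers are illustrative, and the exact targets and weighting in the pix2pixHD code may differ.

```python
def lsgan_d_loss(d_real, d_fake):
    """Discriminator loss: push outputs on real data to 1 and on fakes to 0."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def lsgan_g_loss(d_fake):
    """Generator loss: push discriminator outputs on fakes toward 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


d_real = [0.9, 1.1]    # discriminator outputs on real images
d_fake = [0.2, -0.2]   # discriminator outputs on generated images

print(lsgan_d_loss(d_real, d_fake))
print(lsgan_g_loss(d_fake))
```

Unlike the sigmoid cross-entropy loss, the squared error keeps penalizing samples that are classified correctly but lie far from the target, which is the usual argument for why LSGANs train more stably.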

4.1 Quantitative Comparisons

Table 1 (semantic segmentation scores on results by different methods on the Cityscapes dataset) shows that pix2pixHD outperforms pix2pix and CRN by a large margin and comes close to the Oracle.
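
Segmentation scores of this kind are typically pixel accuracy and mean intersection-over-union (IoU), computed between the labels predicted on the synthesized image and the input label map. The sketch below computes both on toy label maps. The function name and the data are made up, and the segmentation network that would produce `pred` is not shown.

```python
def segmentation_scores(pred, target, num_classes):
    """Return (pixel accuracy, mean IoU) for two label maps of equal size."""
    correct = total = 0
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            if p == t:
                correct += 1
                intersection[t] += 1
                union[t] += 1
            else:
                union[p] += 1
                union[t] += 1
    # Average IoU only over classes that actually occur.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return correct / total, sum(ious) / len(ious)


target = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
pred = [[0, 0, 1, 1],
        [0, 1, 1, 1]]

pixel_acc, mean_iou = segmentation_scores(pred, target, num_classes=2)
print(pixel_acc, mean_iou)
```

The idea behind the metric is that a realistic synthesized image should be segmented back into roughly the same labels it was generated from.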

4.2 Human Perceptual Study

4.3 Interactive Object Editing

5 Discussion and Conclusion

[1] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. CVPR 2018.
[2]
[3]