GAN Improvement Series (1) | A Summary of the Latest ICCV 2021 Generative Adversarial Network (GAN) Papers

1、Dual Contrastive Loss and Attention for GANs

  • Generative Adversarial Networks (GANs) work very well for unconditional image generation on large-scale image datasets, but the generated images are still easy to distinguish from real ones, especially on datasets with high variance (e.g., bedrooms, churches).

  • This paper proposes a new dual contrastive loss and shows that, with this loss, the discriminator learns more general and distinguishable representations that incentivize better generation quality. Furthermore, attention is revisited and extensive experiments are performed on different attention blocks in the generator; attention turns out to remain an important module for successful image generation, even though it is absent from recent state-of-the-art models. Finally, different attention architectures in the discriminator are investigated and a reference attention mechanism is proposed. Combining these measures improves FID by at least 17.5% on several benchmark datasets, with even larger gains (up to 47.5% in FID) on synthetic scenes.
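A minimal PyTorch-style sketch of the batch-wise contrastive idea follows; the exact formulation is in the paper, and the function and tensor names here are illustrative rather than the authors' code:

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(real_logits, fake_logits):
    """Sketch: each real score is contrasted against the whole batch of fake
    scores (and vice versa, with signs flipped) via a softmax cross-entropy,
    so the discriminator must separate reals from fakes batch-wise rather
    than sample-by-sample."""
    def contrast(pos, negs):
        # logits row i = [pos_i, neg_1, ..., neg_M]; the "correct class" is index 0
        logits = torch.cat([pos.unsqueeze(1),
                            negs.unsqueeze(0).expand(pos.size(0), -1)], dim=1)
        targets = torch.zeros(pos.size(0), dtype=torch.long, device=pos.device)
        return F.cross_entropy(logits, targets)

    loss_real = contrast(real_logits, fake_logits)    # reals should outscore all fakes
    loss_fake = contrast(-fake_logits, -real_logits)  # fakes should score below all reals
    return loss_real + loss_fake
```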


2、Dual Projection Generative Adversarial Networks for Conditional Image Generation

  • Conditional Generative Adversarial Networks (cGANs) extend unconditional GANs to learn the joint data-label distribution from samples and are powerful generative models capable of generating high-fidelity images. The challenge in training them is how to properly inject class information into the generator and the discriminator.

  • This paper proposes a Dual Projection GAN (P2GAN) model that learns a balance between data matching and label matching; it also proposes an improved cGAN model with auxiliary classification that directly aligns the fake and real conditional distributions P(class | image) by minimizing their f-divergence. Experiments on multiple datasets, including CIFAR100, ImageNet and VGGFace2, demonstrate the effectiveness of the method.
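As a rough illustration of combining label matching (a projection head) with data matching (an auxiliary classification head) in one cGAN discriminator; P2GAN's actual architecture and the way it balances the two heads are described in the paper, and all names below are placeholders:

```python
import torch
import torch.nn as nn

class ProjectionPlusClassifierD(nn.Module):
    """Illustrative sketch only: a conditional discriminator with a
    projection term (label matching) and an auxiliary classifier that
    models P(class | image) (data matching)."""

    def __init__(self, in_dim, feat_dim, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())  # stand-in feature extractor
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional real/fake score
        self.embed = nn.Embedding(num_classes, feat_dim)  # class embedding for the projection term
        self.cls = nn.Linear(feat_dim, num_classes)       # auxiliary classifier head

    def forward(self, x, y):
        phi = self.backbone(x.flatten(1))
        adv = self.psi(phi).squeeze(1) + (self.embed(y) * phi).sum(dim=1)
        return adv, self.cls(phi)                         # adversarial score, class logits
```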

3、Focal Frequency Loss for Image Reconstruction and Synthesis

  • The continuous development of generative models has led to significant advances in image reconstruction and synthesis. Nonetheless, there may still be a gap between the real image and the generated image, especially in the frequency domain.

  • This study shows that narrowing the frequency-domain gap can further improve image reconstruction and synthesis quality, and proposes a new focal frequency loss that complements existing spatial losses (a minimal sketch follows this item).

  • Experiments show that it improves popular models (such as VAE, pix2pix, and SPADE) in terms of both perceptual quality and quantitative metrics, and it also shows great potential on StyleGAN2.
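A minimal sketch of such a frequency-space loss, assuming the focal-weighting idea described above (per-frequency errors re-weighted by their own magnitude); the authors' released implementation should be preferred for exact behavior:

```python
import torch

def focal_frequency_loss(fake, real, alpha=1.0):
    """Sketch: compare images in the frequency domain and up-weight the
    frequencies that are currently hardest to fit.

    fake, real: (N, C, H, W) image batches."""
    f_fake = torch.fft.fft2(fake, norm="ortho")
    f_real = torch.fft.fft2(real, norm="ortho")
    dist = (f_fake - f_real).abs() ** 2                    # per-frequency squared error
    weight = dist.sqrt() ** alpha                          # focal weight: harder frequencies count more
    weight = weight / weight.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
    return (weight.detach() * dist).mean()                 # weight is treated as a constant
```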


4、Gradient Normalization for Generative Adversarial Networks

  • This paper proposes a new normalization method, Gradient Normalization (GN), to address the gradient-instability problem of Generative Adversarial Networks (GANs). Unlike existing gradient penalties and spectral normalization, the GN method imposes a gradient-norm constraint only on the discriminator function. Extensive experiments on four datasets show that the method has advantages in terms of both Frechet Inception Distance and Inception Score.
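A minimal sketch of the gradient-normalization idea, assuming the form f_hat(x) = f(x) / (||grad_x f(x)|| + |f(x)|); shapes and the epsilon below are illustrative:

```python
import torch

def gradient_normalized_score(discriminator, x):
    """Sketch: rescale the raw discriminator score by its input-gradient
    norm so the normalized function has bounded gradients, without adding
    a penalty term to the loss."""
    x = x.requires_grad_(True)
    f = discriminator(x).flatten()                        # (N,) raw scores
    grad = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    grad_norm = grad.flatten(1).norm(2, dim=1)            # per-sample ||grad_x f(x)||
    return f / (grad_norm + f.abs() + 1e-8)               # normalized score used in the GAN loss
```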

5、F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain

  • Generative adversarial networks lack the ability to accurately replicate the high-frequency components of natural images. To alleviate this problem, this paper introduces two new training techniques, called Frequency Drop (F-Drop) and Frequency Matching (F-Match). Experiments demonstrate that the combination of F-Drop and F-Match improves the generative performance of GANs in both the frequency and spatial domains.
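A rough sketch of the F-Drop idea, low-pass filtering both real and generated images before the discriminator so that hard-to-fit high frequencies are ignored; the cutoff shape and value here are assumptions, and F-Match (not shown) additionally penalizes the spectrum difference between real and fake mini-batches:

```python
import torch

def f_drop(images, cutoff=0.25):
    """Sketch: zero out high-frequency components with a circular low-pass
    mask in the centered Fourier spectrum, then transform back.

    images: (N, C, H, W); cutoff: fraction of the spectrum radius to keep."""
    spec = torch.fft.fftshift(torch.fft.fft2(images, norm="ortho"), dim=(-2, -1))
    h, w = images.shape[-2:]
    yy = torch.arange(h, device=images.device).view(-1, 1).float()
    xx = torch.arange(w, device=images.device).view(1, -1).float()
    radius = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2).sqrt()
    mask = (radius <= cutoff * min(h, w) / 2).to(images.dtype)   # keep only low frequencies
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)), norm="ortho")
    return low.real
```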

6、Latent Transformations via NeuralODEs for GAN-based Image Editing

  • Much of the recent progress in high-fidelity semantic image editing relies on the disentangled latent space of generative models such as StyleGAN. Prior work has shown that face attributes can be edited controllably by moving latent codes along linear directions. For more complex attributes, this work models latent transformations as trainable neural ODE flows and demonstrates that such nonlinear latent-code manipulations are superior: for many datasets and attributes, simple linear shifts are not enough.
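A minimal sketch of editing a latent code with a learned ODE flow instead of a fixed linear shift; it uses the third-party torchdiffeq package, and the network size and dimensions are placeholders:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class LatentODE(nn.Module):
    """Sketch: a learned vector field dz/dt = f(z); integrating it moves a
    GAN latent code along a nonlinear trajectory for a chosen attribute."""
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, z):        # signature expected by odeint
        return self.field(z)

def edit_latent(z, ode, strength=1.0):
    t = torch.tensor([0.0, strength])
    return odeint(ode, z, t)[-1]    # latent code after flowing for `strength` time
```

A linear shift is the special case where the field is a constant direction; a trainable field lets the edit depend on where the latent code currently is.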


7、Detail Me More: Improving GAN’s photo-realism of complex scenes

  • Generative models can synthesize realistic images of individual objects. For human faces, for example, the algorithm learns to model the local shape and shading of the face, i.e., the variations in the eyebrows, eyes, nose, mouth, jawline, etc. This is possible because all human faces have two eyebrows, two eyes, a nose and a mouth in roughly the same places. However, modeling complex scenes is more challenging, because scene components and their locations vary from image to image. For example, a living room contains a varying number of objects drawn from many possible categories and positions; a lamp, say, may or may not be present, in any of an infinite number of possible locations.

  • This paper proposes to add an "agent" module to the Generative Adversarial Network (GAN) to solve this problem. The agent's task is to mediate the use of multiple discriminators over image regions. For example, if a lamp is detected or required in a particular area of the scene, the agent assigns a fine-grained lamp discriminator to that image patch, which prompts the generator to learn a shape and shading model for lamps. The resulting multi-discriminator, fine-grained optimization problem is able to synthesize complex scenes with nearly the same level of realism as single-object images. The generality of the proposed method is demonstrated across several GAN algorithms (BigGAN, ProGAN, StyleGAN, StyleGAN2), image resolutions (256² to 1024²) and datasets, and it is a significant improvement over current GAN algorithms.
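Purely as an illustration of the routing idea described above (the agent dispatching image patches to class-specific fine-grained discriminators); all names, and how patches and labels are obtained, are assumptions rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def agent_route_loss(patches, patch_labels, discriminators):
    """Sketch: accumulate patch-level adversarial losses, where each patch is
    judged by the fine-grained discriminator of the object class assigned to it.

    patches: list of (1, C, h, w) tensors; patch_labels: list of class names;
    discriminators: dict mapping class name -> patch discriminator."""
    total = torch.zeros(())
    for patch, label in zip(patches, patch_labels):
        d = discriminators.get(label)
        if d is None:                                   # no fine-grained discriminator for this class
            continue
        total = total + F.softplus(-d(patch)).mean()    # non-saturating "looks real" loss for this patch
    return total
```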


8、EigenGAN: Layer-Wise Eigen-Learning for GANs

  • Different layers of a GAN generator capture different semantics of the synthesized images, but few GAN models have explicit dimensions to control the semantic attributes represented in a specific layer.

  • This paper proposes EigenGAN, capable of unsupervised mining of interpretable and controllable dimensions from different generator layers. Code: https://github.com/LynnHo/EigenGAN-Tensorflow
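A minimal sketch of the layer-wise linear-subspace idea (each generator layer injects its own latent through a learned basis whose directions become interpretable controls); the official repository above contains the real implementation, and the shapes and regularizer here are simplified:

```python
import torch
import torch.nn as nn

class EigenLayer(nn.Module):
    """Sketch: inject a per-layer latent z_i into the features through a
    basis U with per-direction importances L and an offset mu."""
    def __init__(self, feat_dim, subspace_dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(feat_dim, subspace_dim))  # basis (kept near-orthonormal via a penalty)
        self.L = nn.Parameter(torch.ones(subspace_dim))             # importance of each basis direction
        self.mu = nn.Parameter(torch.zeros(feat_dim))               # subspace origin

    def forward(self, h, z_i):
        # h: (N, feat_dim) layer features, z_i: (N, subspace_dim) layer latent
        return h + (z_i * self.L) @ self.U.t() + self.mu

    def orthogonality_penalty(self):
        eye = torch.eye(self.U.size(1), device=self.U.device)
        return ((self.U.t() @ self.U - eye) ** 2).sum()
```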


9、Omni-GAN: On the Secrets of cGANs and Beyond

  • Conditional Generative Adversarial Networks (cGANs) are powerful tools for generating high-quality images, but most existing methods perform unsatisfactorily or risk mode collapse.

  • This paper introduces Omni-GAN, a variant of cGAN that addresses the problem of training a suitable discriminator. The key is to ensure that the discriminator is strongly supervised and moderately regularized to avoid collapse. It sets a new record on the ImageNet dataset, with Inception Scores of 262.85 and 343.22 for image sizes of 128×128 and 256×256, respectively, more than 100 points higher than the previous records.
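A rough sketch of the kind of multi-label, one-vs-rest loss a strongly supervised cGAN discriminator can use (each sample's positive labels, e.g. its class plus a realness label, are pushed above zero and all other labels below); the exact omni-loss and its regularization are specified in the paper:

```python
import torch

def multilabel_logsumexp_loss(scores, positive_mask):
    """Sketch: log-sum-exp multi-label loss over K discriminator labels.

    scores: (N, K) label scores; positive_mask: (N, K) bool, True at positives."""
    pos = positive_mask.float()
    neg = 1.0 - pos
    loss_pos = torch.log1p((torch.exp(-scores) * pos).sum(dim=1))  # positives should be large
    loss_neg = torch.log1p((torch.exp(scores) * neg).sum(dim=1))   # negatives should be small
    return (loss_pos + loss_neg).mean()
```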


10、Towards Discovery and Attribution of Open-world GAN Generated Images

  • With recent advances in Generative Adversarial Networks (GANs), identifying generated images has become important for media and visual forensics. Existing work is limited to closed-set scenarios and cannot generalize to GANs that were not seen during training. This paper proposes an iterative algorithm consisting of multiple components, including network training, out-of-distribution detection, clustering, merging, and refinement steps.

11、Unsupervised Image Generation with Infinite Generative Adversarial Networks

  • Image generation is heavily studied in computer vision, and one of the core challenges is unsupervised image generation. Generative Adversarial Networks (GANs) have achieved great success in this direction as an implicit method and are widely adopted.

  • GANs suffer from mode collapse, unstructured latent spaces, and the inability to compute likelihoods. This paper proposes a new unsupervised nonparametric method, called MIC-GANs (mixture of infinite conditional GANs), to tackle several of these GAN problems together, aiming to generate images with parsimonious prior knowledge. Code: github.com/yinghdb/MICGANs.

-------------END-------------
