paper: Mode Regularized Generative Adversarial Networks

Copyright notice: reposting is permitted. Source: https://blog.csdn.net/Z609834342/article/details/83345657

Main content: This paper starts from the weaknesses of GANs, analyzes their causes, and constrains the gradients so that training becomes more stable.

Key sentences:

Shortcomings of GANs: (1) they are regarded as highly unstable and prone to miss modes. GAN training is unstable and very sensitive to hyperparameters; although in theory G and D should be updated alternately, in practice, because training is unstable, one of them sometimes has to be updated several times before the other is updated once.

Analysis: We argue that these bad behaviors of GANs are due to the very particular functional shape of the trained discriminators in high dimensional spaces, which can easily make training stuck or push probability mass in the wrong direction, towards that of higher concentration than that of the data generating distribution.

(2) On the other hand, a common failure pattern observed while training GANs is the collapsing of large volumes of probability mass onto a few modes. This is mode collapse.

Phenomenon: Namely, although the generators produce meaningful samples, these samples are often from just a few modes (small regions of high probability under the data distribution). Behind this phenomenon is the missing modes problem, which is widely conceived as a major problem for training GANs: many modes of the data generating distribution are not at all represented in the generated samples, yielding a much lower entropy distribution, with less variety than the data generating distribution. The phenomenon described here is not exactly the same as what the blog post mentioned above says: here the point is that the GAN collapses onto just a few modes, i.e. many different images end up being generated from essentially one mode, so the model's expressive capacity is insufficient and the quality of the generated images is low. (A more detailed explanation is linked in the original post.)

Analysis: we argue that a general cause behind these problems is the lack of control on the discriminator during GAN training. We would like to encourage the manifold of the samples produced by the generator to move towards that of real data, using the discriminator as a metric. However, even if we train the discriminator to distinguish between these two manifolds, we have no control over the shape of the discriminator function in between these manifolds. In fact, the shape of the discriminator function in the data space can be very non-linear with bad plateaus and wrong maxima and this can therefore hurt the training of GANs. The paper argues that the cause of these problems is the lack of control over the discriminator, which partly agrees with the account in the mode-collapse blog post above: one explanation given there is that once D makes a wrong judgment on an unrealistic output of G, the error is amplified little by little over the iterations, and eventually both D and G break down. It also notes that simply using the discriminator as a metric between the two manifolds does not work well on its own.

Method: To remedy this problem, we propose a novel regularizer for the GAN training target. The basic idea is simple yet powerful: in addition to the gradient information provided by the discriminator, we want the generator to take advantage of other similarity metrics with much more predictable behavior, such as the L2 norm. Differentiating these similarity metrics will provide us with more stable gradients to train our generator. Combining this idea with an approach meant to penalize the missing modes, we propose a family of additional regularizers for the GAN objective. We then design a set of metrics to evaluate the generated samples in terms of both the diversity of modes and the distribution fairness of the probability mass.
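
As a sketch of what such a regularized objective can look like (λ1 is a weighting hyperparameter and d(·,·) is a fixed, differentiable similarity metric such as the pixel-wise L2 distance; this is reconstructed from the description above rather than quoted verbatim from the paper), the generator loss becomes

$$T_G = -\mathbb{E}_{z}\big[\log D(G(z))\big] + \lambda_1\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\,d\big(x,\, G(E(x))\big)\,\big],$$

where E is an encoder (introduced in the notes below) that maps a real sample back into the noise space, so the second term keeps supplying the generator with stable reconstruction gradients even where the discriminator gradient is flat or misleading.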

Result: Regularizers usually bring a trade-off between model variance and bias. Our results have shown that, when correctly applied, our regularizers can dramatically reduce model variance, stabilize the training, and fix the missing mode problem all at once, with positive or at the least no negative effects on the generated samples.

Related work: 1. Conditional generative adversarial nets enlarges GAN's representation capacity by introducing an extra vector to allow the generator to produce samples conditioned on other beneficial information.

2. Another line of work aimed at improving GANs is through feature learning, including features from the latent space and image space. (In recent years there has been quite a lot of work combining GANs with autoencoders in this way.)

3. Autoencoding beyond pixels using a learned similarity metric combines a variational autoencoder objective with a GAN and utilizes the learned features from the discriminator in the GANs for better image similarity metrics. It is shown that the learned distance from the discriminator is of great help for the sample visual fidelity.

4. In addition to feature distances, Generating images with perceptual similarity metrics based on deep networks found that the counterpart loss in image space further improves GAN's training stability.

MODE REGULARIZERS FOR GANS: 1. Training the discriminator D can be viewed as training an evaluation metric on the sample space. (A GAN can thus be seen as learning an evaluation metric; with an optimal discriminator the original objective essentially reduces to the Jensen-Shannon divergence.) Then the generator G has to take advantage of the local gradient ∇ log D(G) provided by the discriminator to improve itself, namely to move towards the data manifold. (This alternating, iterative learning is itself a drawback of the original GAN: it does not optimize one unified objective in a single procedure, whereas a VAE does seem to achieve that.)
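
As a minimal sketch of that gradient path (generic non-saturating GAN training in PyTorch; G, D, and opt_G are hypothetical, pre-built generator, discriminator, and optimizer objects, not code from the paper), the generator's only learning signal comes through ∇ log D(G(z)):

```python
import torch

def generator_step(G, D, opt_G, batch_size=64, nz=100):
    # Sample noise and push it through the (hypothetical) generator G.
    z = torch.randn(batch_size, nz)
    fake = G(z)
    # Non-saturating generator loss: minimize -log D(G(z)),
    # so the only learning signal is the gradient of log D at the fake samples.
    loss_G = -torch.log(D(fake) + 1e-8).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_G.item()
```

If D saturates to 0 on the fake samples, log D(G(z)) flattens out and this gradient vanishes, which is exactly the failure mode discussed in point 2 below.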

2. when the data manifold and the generation manifold are disjoint (which is true in almost all practical situations), it is equivalent to training a characteristic function to be very close to 1 on the data manifold, and 0 on the generation manifold. In order to pass good gradient information to the generator, it is important that the trained discriminator produces stable and smooth gradients. However, since the discriminator objective does not directly depend on the behavior of the discriminator in other parts of the space, training can easily fail if the shape of the discriminator function is not as expected. As an example, Denton et al. (2015) noted a common failure pattern for training GANs which is the vanishing gradient problem, in which the discriminator D perfectly classifies real and fake examples, such that around the fake examples, D is nearly zero. In such cases, the generator will receive no gradient to improve itself. If some modes account for only a small fraction of the training data, they are easily missed during training, so the GAN only learns to represent the large modes; the generated samples then concentrate on a few modes and lack diversity. An even worse problem is that once this missing-modes problem appears, it tends to aggravate the instability of GAN training. To address this, the paper proposes the idea of mode regularization to gain more control over the GAN training process. Concretely, instead of having the generator G map a noise vector z directly into the sample space, we first map a real sample into the latent space, i.e. z = Encoder(X), and then generate G(Encoder(X)), as sketched below. The benefit is that this reconstruction process provides extra training signal, so the fake data (generated samples) produced by the generator is no longer so easily picked out by the discriminator D; both D and G then keep receiving useful loss/gradient, and training proceeds more stably. The above paraphrases someone else's summary; my own understanding is that the encoder learns to map an x that should belong to a mode M to a code whose reconstruction lies closer to M, thereby reducing missing modes.
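
A minimal PyTorch-style sketch of this encoder-based regularization (E, G, D and opt are hypothetical modules and optimizer, with opt assumed to hold both the encoder's and the generator's parameters; the weights lam1/lam2 are placeholders, and the combination of terms is my reading of the regularizer family rather than the paper's verbatim objective):

```python
import torch
import torch.nn.functional as F

def regularized_generator_step(E, G, D, opt, x, nz=100, lam1=0.02, lam2=0.02):
    """One joint update of encoder E and generator G with the mode regularizer."""
    z = torch.randn(x.size(0), nz, device=x.device)
    fake = G(z)
    recon = G(E(x))  # reconstruction of real samples through the encoder

    # Usual (non-saturating) GAN signal on pure noise samples ...
    gan_loss = -torch.log(D(fake) + 1e-8).mean()
    # ... plus a geometric regularizer d(x, G(E(x))) that keeps reconstructions near x,
    # and a mode regularizer log D(G(E(x))) that pushes reconstructions to the real side of D.
    recon_loss = F.mse_loss(recon, x)
    mode_loss = -torch.log(D(recon) + 1e-8).mean()

    loss = gan_loss + lam1 * recon_loss + lam2 * mode_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because every real x now contributes a gradient through its own reconstruction, samples sitting in rarely visited modes still pull the generator towards them, which is the intuition behind reducing missing modes.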

3.For a typical GAN model, since all modes have similar D values, there is no reason why the generator cannot collapse to just a few major modes. In other words, since the discriminator’s output is nearly 0 and 1 on fake and real data respectively, the generator is not penalized for missing modes.

4.The difference is clear: the optimization target for the GAN generator is a learned discriminator. While in supervised models, the optimization targets are distance functions with nice geometric properties. The latter usually provides much easier training gradients than the former, especially at the early stages of training.

5. The encoder itself is trained by minimizing the same reconstruction error.

6. The idea of adding an encoder is equivalent to first training a point to point mapping G(E(x)) between the two manifolds and then trying to minimize the expected distance between the points on these two manifolds. Because Encoder(X) preserves the correspondence between X and its mapped image, the generator is guaranteed to cover the sample modes across the whole sample space, which in theory guarantees that the missing-modes problem is reduced.

7.The missing mode problem is caused by the conjunction of two facts: (1) the areas near missing modes are rarely visited by the generator, by definition, thus providing very few examples to improve the generator around those areas, and (2) both missing modes and nonmissing modes tend to correspond to a high value of D, because the generator is not perfect so that the discriminator can take strong decisions locally and obtain a high value of D even near non-missing modes.

8. Building on this, the authors further propose the manifold-diffusion GAN (MDGAN), which turns the idea of using reconstruction as a regularizer into a two-step training procedure. That is, to give the reconstruction a cleaner objective of its own, the reconstruction loss is pulled out into a separate training step: the first step, the manifold step, trains G(Enc(X)) against X to reduce the difference between the two; the second step, the diffusion step, then pulls G(Enc(X)) and G(z) towards each other. Intuitively, training first gets the "shape" of the two distributions right, and then pulls the two distributions closer together; a sketch of the two steps is given below.
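
A rough PyTorch-style sketch of that two-step loop (E, G, D1, D2 are hypothetical encoder/generator/discriminator modules, opt_m and opt_d are optimizers for the manifold and diffusion steps, lam is a placeholder weight, and the discriminator updates are omitted; this reconstructs the two-step idea rather than reproducing the paper's exact MDGAN losses):

```python
import torch
import torch.nn.functional as F

def mdgan_generator_steps(E, G, D1, D2, opt_m, opt_d, x, nz=100, lam=1.0):
    # Manifold step: make G(E(x)) match x, with D1 trained (elsewhere) to
    # separate real x from reconstructions G(E(x)).
    recon = G(E(x))
    manifold_loss = lam * F.mse_loss(recon, x) - torch.log(D1(recon) + 1e-8).mean()
    opt_m.zero_grad()
    manifold_loss.backward()
    opt_m.step()

    # Diffusion step: pull samples G(z) towards the reconstruction manifold,
    # with D2 trained (elsewhere) to separate G(E(x)) (its "real" side) from G(z).
    z = torch.randn(x.size(0), nz, device=x.device)
    diffusion_loss = -torch.log(D2(G(z)) + 1e-8).mean()
    opt_d.zero_grad()
    diffusion_loss.backward()
    opt_d.step()
    return manifold_loss.item(), diffusion_loss.item()
```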

9. In the experiments, another contribution of the paper is a new evaluation metric called the MODE score. Starting from the Inception score previously proposed by the OpenAI team, the authors found an undesirable property of the Inception score: even when GAN training has completely collapsed (producing pure noise, images that are not perceptually meaningful at all), the Inception score can still be high (whereas a high score is supposed to mean the images look very much like real samples).
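
A rough sketch of how such a score can be computed from a pretrained classifier's softmax outputs (this follows my reading of the MODE score as exp(E_x[KL(p(y|x) || p*(y))] - KL(p(y) || p*(y))), with p*(y) the label marginal of the training data and p(y) that of the generated samples; treat both the formula and the code as a paraphrase rather than a verbatim reproduction of the paper):

```python
import numpy as np

def mode_score(probs_gen, probs_train, eps=1e-12):
    # probs_gen:   (N, K) classifier softmax outputs on generated samples.
    # probs_train: (M, K) classifier softmax outputs on real training samples.
    p_y = probs_gen.mean(axis=0)        # label marginal of the generated samples
    p_star = probs_train.mean(axis=0)   # label marginal of the training data

    # E_x[ KL(p(y|x) || p*(y)) ]: large when each generated sample is confidently classified.
    kl_per_sample = np.sum(probs_gen * (np.log(probs_gen + eps) - np.log(p_star + eps)), axis=1)
    # KL(p(y) || p*(y)): penalizes a generated label distribution that drops modes.
    kl_marginal = np.sum(p_y * (np.log(p_y + eps) - np.log(p_star + eps)))

    return float(np.exp(kl_per_sample.mean() - kl_marginal))
```

The second KL term is what distinguishes this from the Inception score: if the generator drops modes, p(y) drifts away from p*(y) and the score is pulled down even when individual samples are confidently classified.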
