CycleGAN, DiscoGAN, DualGAN

Quoted from: https://zhuanlan.zhihu.com/p/26332365

Task

The task here is image-to-image translation. If you have read pix2pix, a CVPR paper from last year, this task will be familiar. Even if you do not know pix2pix, you may have seen the web application that went viral a while ago, which turns a line drawing of a cat into a photo; it uses the pix2pix algorithm. However, pix2pix is trained on paired data: for the line-drawing-to-cat task, the training data must come in one-to-one pairs, a line drawing together with the corresponding real cat photo.

In many cases, however, we do not have such perfect training pairs. For example, if you want to turn horses into zebras, there is no zebra photo that corresponds exactly to a given horse photo. But there are plenty of horse photos and plenty of zebra photos. These papers therefore aim to learn the transformation from unpaired training data.

(When we discussed these papers in our reading group, someone argued that the algorithm is not very solid from a vision standpoint and will not change real-world applications. That does not matter to me; I am not trying to change the world.)

There was, however, a paper on dual learning for machine translation last year. If you think about translation between languages, that task is arguably more meaningful than translating between pictures. In machine translation, paired bilingual data is very limited, but monolingual data is abundant; that paper discusses how to use monolingual corpora to help translation. The starting point of the DualGAN paper comes directly from this, and the central innovation shared by all three papers (cycle consistency) is essentially the dual-learning idea. (Although the CycleGAN paper does not seem to know about, and does not cite, dual learning.)

CycleGAN's motivation is more abstract. The first paragraph of their introduction discusses Monet: given a Monet painting, a person can imagine what the original scene Monet painted must have looked like, so a good AI should be able to do the same.

Model (essentially identical across the three papers):

An ordinary GAN has one generator and one discriminator. Here there are two generators and two discriminators. One generator converts images from domain X to domain Y (denoted G), while the other does the opposite (denoted F). The two discriminators D_X and D_Y try to distinguish real from fake pictures in their respective domains. (Fake here means pictures produced by transforming real photos from the other domain.)

Cycle consistency (the name comes from CycleGAN; the other two papers use different names for essentially the same thing) is what makes this transformation work. Intuitively, if you convert from X to Y and then convert back from Y to X, the final result should be close to the input. The distance between the final output and the input is added as an extra penalty term (CycleGAN uses the L1 distance).

In mathematical form, ||F(G(x)) - x||_1 and ||G(F(y)) - y||_1 need to be small enough.
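A minimal NumPy sketch of this penalty, treating the generators G and F as placeholder array-to-array functions (hypothetical stand-ins for the trained networks):

```python
import numpy as np

def l1_cycle_loss(x, y, G, F):
    """Cycle-consistency penalty: ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1,
    averaged over elements. G maps X -> Y, F maps Y -> X."""
    forward = np.abs(F(G(x)) - x).mean()   # X -> Y -> X should return to x
    backward = np.abs(G(F(y)) - y).mean()  # Y -> X -> Y should return to y
    return forward + backward

# Toy check: with perfectly inverse "generators" the penalty is zero.
G = lambda a: a + 1.0
F = lambda a: a - 1.0
x = np.zeros((2, 4))
y = np.ones((2, 4))
print(l1_cycle_loss(x, y, G, F))  # 0.0
```

In training, this scalar is added (with a weight) to the adversarial losses of both discriminators.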

This penalty term prevents mode collapse. Without the cycle-consistency term, the network would still output realistic pictures, but no matter what the input is, the output could always be the same. With cycle consistency, a constant output immediately makes the cycle-consistency loss large. So the term requires that the transformed picture is not only realistic but also retains the information of the original picture.

The figure below illustrates this.

Network details:

CycleGAN:

The generator here is the same as in Perceptual Losses for Real-Time Style Transfer and Super-Resolution. They use instance normalization. The discriminator is the same as in pix2pix (PatchGAN on 70x70 patches). To stabilize GAN training, they use least-squares GAN and a replay buffer. Unlike pix2pix, their model has no randomness at all (no random input z, no dropout), so the generator here is more like a deterministic style-transfer model than a conditional GAN. They use L1 distance for the cycle consistency.
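To make two of these ingredients concrete, here is a NumPy sketch of instance normalization (per-sample, per-channel normalization over spatial dimensions) and the least-squares GAN losses; this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization: each (sample, channel) plane is normalized
    over its own spatial dimensions. x has shape (N, C, H, W)."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1, fake to 0."""
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push fake scores to 1."""
    return ((d_fake - 1.0) ** 2).mean()
```

Compared with the usual cross-entropy GAN loss, the least-squares form gives non-saturating gradients even for samples the discriminator classifies confidently, which is why it helps training stability.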

DualGAN:

Their generator and discriminator are both the same as in pix2pix (no random input z, but dropout provides randomness). They train with WGAN. The cycle consistency again uses L1.
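For reference, a minimal sketch of the WGAN objective (original formulation with weight clipping); this illustrates the loss shape only, not DualGAN's actual code:

```python
import numpy as np

def wgan_critic_loss(c_real, c_fake):
    """WGAN critic: maximize E[c(real)] - E[c(fake)], so minimize its negative.
    Scores are unbounded (no sigmoid)."""
    return -(c_real.mean() - c_fake.mean())

def wgan_g_loss(c_fake):
    """WGAN generator: maximize the critic's score on fakes."""
    return -c_fake.mean()

def clip_weights(w, c=0.01):
    """Weight clipping from the original WGAN, a crude way to keep the
    critic approximately Lipschitz."""
    return np.clip(w, -c, c)
```

The critic loss approximates the Wasserstein distance between the real and generated distributions, which tends to correlate better with sample quality than a cross-entropy discriminator loss.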

DiscoGAN:

Their generator is built from conv, deconv, and leaky ReLU layers, and the discriminator is conv + leaky ReLU. They use L2 for the cycle consistency.
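A small sketch of the two pieces that differ from the other papers, the leaky ReLU nonlinearity and the squared-L2 reconstruction term (the slope 0.2 is a common default, not necessarily the paper's value):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: identity for positive inputs, small slope for negative ones."""
    return np.where(x > 0, x, alpha * x)

def l2_cycle_loss(x, G, F):
    """DiscoGAN-style reconstruction term: squared L2 distance between the
    input and its round-trip X -> Y -> X. G and F are placeholder generators."""
    return ((F(G(x)) - x) ** 2).mean()
```

Relative to L1, the squared-L2 penalty punishes large reconstruction errors more heavily and tolerates many small ones, which tends to give slightly blurrier reconstructions.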

Experiments

CycleGAN:

One main experiment is bidirectional translation between photos and their semantic segmentations on the Cityscapes dataset. Their evaluation method is exactly the same as in pix2pix. CycleGAN is compared against CoGAN, BiGAN, and pix2pix (treated as an upper bound). CoGAN was also designed for this kind of task: the idea is to use two weight-sharing generators to produce pictures in the two domains from the same z, with each picture having to pass the discriminator of its own domain. BiGAN was originally meant to find the inverse function E of G (taking a picture as input and outputting z); if you treat z as an image, BiGAN can be used for this task as well.

The results show that CycleGAN beats all the baselines. There is of course still a gap to pix2pix, but pix2pix is a fully supervised method after all.

They also studied the contribution of each component of CycleGAN: only the adversarial loss without cycle consistency; only cycle consistency without the adversarial loss; cycle consistency in only one direction. The results are shown in the figure below (in one sentence: every part matters).

They then tried some fun applications that were not evaluated quantitatively, since there is no ground truth to evaluate against. These include edges <-> shoes, horse <-> zebra, orange <-> apple, winter <-> summer scenes, and paintings <-> photos. The results all look very cool.

DualGAN:

They ran PHOTO-SKETCH, DAY-NIGHT, LABEL-FACADES, and AERIAL-MAPS experiments. In some of them they even got higher realness scores (collected on AMT) than pix2pix. In my view, this shows that realness is a poor evaluation metric: in essence you only need to output a sufficiently realistic picture, which need not have anything to do with the input.

DiscoGAN:

I really like the toy experiment in this paper. They synthesized GMM data points for two domains and trained a DiscoGAN on this artificial data. The two baselines are a plain GAN and a GAN with forward consistency (i.e., consistency in one direction only). Both baselines suffer from mode collapse.
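A sketch of how such two-domain GMM toy data can be generated; the mode layout and counts here are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

def sample_gmm(means, n_per_mode, scale=0.1, seed=0):
    """Sample 2-D points from an equal-weight Gaussian mixture with the
    given mode centers and isotropic standard deviation `scale`."""
    rng = np.random.default_rng(seed)
    pts = [m + scale * rng.standard_normal((n_per_mode, 2))
           for m in np.asarray(means, dtype=float)]
    return np.concatenate(pts)

# Two toy "domains" with different mode layouts (assumed for illustration):
# domain A has modes on a square, domain B has modes on a line.
domain_a = sample_gmm([(0, 0), (0, 1), (1, 0), (1, 1)], n_per_mode=50)
domain_b = sample_gmm([(0, 0), (1, 0), (2, 0), (3, 0)], n_per_mode=50, seed=1)
```

The point of the toy setup is that a good unpaired translator should map each mode of domain A onto a distinct mode of domain B, whereas a mode-collapsed model maps everything onto one mode.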

They also have a car-to-car toy experiment. Each domain contains pictures of 3D cars at different azimuth angles, with some images every 15 degrees. Training a DiscoGAN on these two domains recovers a correspondence between the angles. Compared with the baselines, DiscoGAN obtains a much more correlated angle relationship.


They also tried FACE to FACE, FACE CONVERSION (gender, hair color, etc.), CHAIR TO CAR, CAR TO FACE, EDGES TO PHOTOS, HANDBAG TO SHOES, and SHOES TO HANDBAG. These are qualitative experiments, and the results are not that impressive.

Discussion

An interesting phenomenon: wrong mappings. Because the setting is totally unsupervised, the model sometimes finds the wrong mapping. For example, in CycleGAN's photo-to-label experiments, buildings are often labeled as trees. The same can be seen in Figure 5 of DiscoGAN (the last one), where the learned change is a flip rather than an exact correspondence of angles.

Moreover, a task like turning cats into dogs is difficult, since it involves geometric transformation. Actually, I wonder whether such a task even makes sense: it is hard because there is no standard; even for a person, turning a cat into a dog is a very difficult thing. Even if the pose can be kept the same, how to transform between species is hard to define.

Another thing not discussed in these papers, which I find quite interesting: the current cycle is one full loop. What would happen if we considered two loops, or a loop and a half?


Origin: www.cnblogs.com/lyp1010/p/11759825.html