Detailed explanation of CycleGAN network

Note: I have not seen anyone introduce the generator and discriminator structures of CycleGAN in detail, so I add some of that here. There are also differences between the loss described in the paper and the loss in the model code; I supplement these based on the source code, so please pay attention to that part.

Introduction

Paper download address: https://arxiv.org/abs/1703.10593

Preface

CycleGAN is a GAN network that performs image style transfer. Pix2Pix existed long before it and also performs image style transfer, but Pix2Pix has a major limitation: the images of the two styles must correspond one-to-one (paired data). In reality, it is difficult to collect images of the same content in two different styles, and it is often impractical to re-shoot or repaint them. CycleGAN achieves the same goal and converts between two types of images without requiring any correspondence. It is very powerful and practical!

In other words, the images are generated from unpaired data: you have some works by a famous painter and some real photos whose style you want to change, and there is no intersection between the two sets. The key to the Pix2Pix method mentioned in the previous article (Augmenting Human Imagination with AI) is that training samples with the same content must exist in both domains. The innovation of CycleGAN is that it can achieve this kind of transfer between the source domain and the target domain without establishing a one-to-one mapping between training data.
To achieve this, there are two important points. The first is the pair of discriminators. As shown in Figure (a) below, for the two distributions X and Y, the generators G and F are the mappings from X to Y and from Y to X respectively, and the two discriminators D_X and D_Y judge the converted images. The second point is the cycle-consistency loss, which uses other images in the data set to test the generators and prevents G and F from overfitting. For example, if you want to convert a puppy photo into the Van Gogh style and there were no cycle-consistency loss, the generator might output an actual Van Gogh painting to fool D_Y while ignoring the input puppy.
[Figure: the mappings G: X→Y and F: Y→X, the discriminators D_X and D_Y, and the forward and backward cycle-consistency losses]

Generator and discriminator structure

The following figure is the structure diagram of the generator that I drew from the corresponding code (a minimal code sketch is given after the figure). The structure is relatively simple and splits into an encoder, a middle transformation stage, and a decoder. The encoder first applies ReflectionPad2d, which mirrors the image across its edges to pad it and enlarge the feature map, and then downsamples with convolution + Instance Normalization + ReLU blocks (the linked code uses plain ReLU in the generator; LeakyReLU appears in the discriminator). The stage linking the two halves is a stack of residual blocks, 9 repeated modules by default, which transform the features. The decoder uses transposed convolution, Instance Normalization, and ReLU to restore the size of the image; finally ReflectionPad2d enlarges the feature map once more and a convolution maps it back to the original number of image channels, which helps preserve information at object edges.
[Figure: generator structure diagram]
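For readers who prefer code to diagrams, here is a minimal PyTorch sketch of such a generator. It follows my reading of the second repository linked below; the class names, channel counts, and other details are my own choices and may not match the original code exactly.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Residual block used in the middle of the generator (9 copies by default)."""

    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection


class Generator(nn.Module):
    def __init__(self, in_channels=3, out_channels=3, n_residual=9):
        super().__init__()
        # initial block: mirror-pad the border, then conv + IN + ReLU
        layers = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(in_channels, 64, 7),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
        ]
        # encoder: two stride-2 convolutions shrink the feature map
        ch = 64
        for _ in range(2):
            layers += [
                nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                nn.InstanceNorm2d(ch * 2),
                nn.ReLU(inplace=True),
            ]
            ch *= 2
        # middle: residual blocks (9 by default)
        layers += [ResidualBlock(ch) for _ in range(n_residual)]
        # decoder: two transposed convolutions restore the original size
        for _ in range(2):
            layers += [
                nn.ConvTranspose2d(ch, ch // 2, 3, stride=2, padding=1, output_padding=1),
                nn.InstanceNorm2d(ch // 2),
                nn.ReLU(inplace=True),
            ]
            ch //= 2
        # output block: mirror-pad again and map back to image channels
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(ch, out_channels, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


# quick shape check: a 256x256 RGB image comes out at the same resolution
# g = Generator(); print(g(torch.randn(1, 3, 256, 256)).shape)  # -> [1, 3, 256, 256]
```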
The original discriminator, shown in the figure below, uses the PatchGAN structure. Its key point is that the output is an N×N matrix rather than a single scalar, and the loss is computed against a target matrix of the same shape. Each element of the matrix corresponds to one patch (one receptive field) of the input image, so realism is judged patch by patch instead of with a single global score.
[Figure: PatchGAN discriminator structure]

Note: the picture below is a simplified version of the PatchGAN discriminator. Its structure is very simple: a plain chain of convolutions, after which the output is pooled and flattened into a (b, 1) tensor. This is the discriminator used in the second GitHub repository linked below; a code sketch follows the figure.
[Figure: simplified discriminator structure]
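Below is a rough PyTorch sketch of this simplified discriminator, again based on my reading of the re-implementation linked below rather than the authors' exact code; layer names and channel counts are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """PatchGAN-style discriminator: a plain chain of strided convolutions.

    The last convolution produces a 1-channel patch map; average pooling then
    flattens it to shape (batch, 1), as in the simplified version above.
    """

    def __init__(self, in_channels=3):
        super().__init__()

        def block(c_in, c_out, normalize=True):
            layers = [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, normalize=False),  # no IN on the first layer
            *block(64, 128),
            *block(128, 256),
            *block(256, 512),
            nn.Conv2d(512, 1, 4, padding=1),  # 1-channel N x N patch map
        )

    def forward(self, x):
        patch = self.model(x)  # (b, 1, N, N) patch predictions
        # global average pooling over the patch map -> (b, 1)
        return F.avg_pool2d(patch, patch.size()[2:]).view(x.size(0), -1)


# d = Discriminator(); print(d(torch.randn(1, 3, 256, 256)).shape)  # -> [1, 1]
```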

Code download

Github link: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
This is the repository of the original authors of the paper; it contains both the CycleGAN and pix2pix models, implemented in PyTorch.
Configure the environment according to the authors' requirements and run the training code. Is the effect good? That mainly depends on how complex the conversion between the two styles is: if the conversion is too complex the results are poor, and if it is simple the results are better.

The repository above may seem relatively complicated, so here is a re-implementation on GitHub that is easier to understand:
https://github.com/aitorzip/PyTorch-CycleGAN
The network structure diagrams above were drawn by me based on this code. If you spot any problems, please point them out and I will correct them promptly.

Model loss

The loss functions of the generator and discriminator follow the standard GAN setup: the discriminator D tries its best to detect the fake images produced by the generator G, while the generator tries its best to produce images that fool the discriminator.

Adversarial loss:

It consists of two parts:
$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[(D_Y(y) - \mathbf{1}_{n \times n})^2\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[D_Y(G(x))^2\big]$$

$D_Y$ outputs a matrix, so the comparison here is between matrices, and MSE loss is used to compute the loss.

as well as the symmetric term for the mapping F: Y→X and its discriminator $D_X$:

$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\big[(D_X(x) - \mathbf{1}_{n \times n})^2\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[D_X(F(y))^2\big]$$
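In code, this matrix comparison with MSE can be sketched roughly as follows; the helper names such as `netD`, `d_loss`, and `criterion_GAN` are my own and not taken from the repositories. The target is simply an all-ones (or all-zeros) tensor of the same shape as the discriminator output.

```python
import torch
import torch.nn as nn

criterion_GAN = nn.MSELoss()  # least-squares GAN loss instead of BCE


def d_loss(netD, real, fake):
    """Discriminator side: real patches should score 1, generated patches 0."""
    pred_real = netD(real)
    pred_fake = netD(fake.detach())  # do not backprop into the generator here
    target_real = torch.ones_like(pred_real)   # the 1_{n x n} matrix of ones
    target_fake = torch.zeros_like(pred_fake)
    return criterion_GAN(pred_real, target_real) + criterion_GAN(pred_fake, target_fake)


def g_adv_loss(netD, fake):
    """Generator side: it wants the discriminator to score its fakes as 1."""
    pred_fake = netD(fake)
    return criterion_GAN(pred_fake, torch.ones_like(pred_fake))
```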

Cycle Consistency Loss

The authors say: theoretically, adversarial training can learn mappings G and F whose outputs are distributed identically to the target domains Y and X, respectively. However, with a large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, so the adversarial loss alone cannot guarantee that an individual input is mapped to the desired output. An additional loss is needed to ensure that G and F not only satisfy their respective discriminators but also behave well on every image. In other words, G and F could cooperate to cheat: give G a puppy photo and it quietly turns it into a self-portrait of Van Gogh, while F turns that self-portrait back into the input. The cycle-consistency loss stops this opportunistic behavior: it takes other real photos x and checks whether F(G(x)) returns to the original photo, and takes other Van Gogh paintings y and checks whether G(F(y)) returns to the original painting. This forces G and F to behave consistently over the whole of the X and Y distributions.
$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$
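A small sketch of this term in PyTorch; `netG_A2B` and `netG_B2A` stand for G and F, and the names are mine:

```python
import torch.nn as nn

criterion_cycle = nn.L1Loss()


def cycle_loss(netG_A2B, netG_B2A, real_A, real_B):
    """F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y."""
    recovered_A = netG_B2A(netG_A2B(real_A))  # x -> G(x) -> F(G(x))
    recovered_B = netG_A2B(netG_B2A(real_B))  # y -> F(y) -> G(F(y))
    return criterion_cycle(recovered_A, real_A) + criterion_cycle(recovered_B, real_B)
```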
Overall:

Therefore, the overall objective is the following formula; training with it is much like training two autoencoders.
$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda\, L_{cyc}(G, F)$$
Pay attention to two details:

  1. In the original GAN formulation, the loss function used for the adversarial loss is BCELoss (binary cross-entropy).
    $$L_{BCE}(p, t) = -\big[\, t \log(p) + (1 - t)\log(1 - p) \,\big]$$
    When training the adversarial loss, however, the authors did not use BCELoss (the binary cross-entropy loss) but MSE (mean squared error); they explain that this least-squares loss trains more stably and gives better results.

  2. The identity loss is not written in the paper, but it appears in the source code. This loss mainly trains the network to preserve images that are already in the target domain: written as AtoB(real_B), a real image from domain B is fed into the generator that maps A to B, and the difference between output and input is measured; the smaller the better. This shows that the generator really understands the structure of B. In the same way, BtoA(real_A) also exists. A short code sketch follows this list.
    $$L_{identity} = \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(y) - y \rVert_1\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(x) - x \rVert_1\big]$$
    To be precise,
    $$Loss = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + L_{cyc} + L_{identity}$$
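A minimal sketch of the identity term, assuming `netG_A2B` maps A to B and `netG_B2A` maps B to A (the names are mine):

```python
import torch.nn as nn

criterion_identity = nn.L1Loss()


def identity_loss(netG_A2B, netG_B2A, real_A, real_B):
    """A generator fed an image already in its target domain should return it unchanged."""
    return (criterion_identity(netG_A2B(real_B), real_B) +
            criterion_identity(netG_B2A(real_A), real_A))
```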

After reading the source code, the loss implementation can be summarized as follows:
$$L_{GAN} = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}(x)}\big[(D_Y(G(x)) - 1)^2\big] + \frac{1}{2}\,\mathbb{E}_{y \sim p_{data}(y)}\big[(D_X(F(y)) - 1)^2\big]$$
$$L_{cycle} = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$
$$L_{identity} = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(x) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(y) - y \rVert_1\big]$$
$$L = L_{GAN} + L_{cycle} + L_{identity}$$
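Putting the pieces together, a single simplified generator update along these lines might look as follows. It is only a sketch that assumes the `Generator`, `Discriminator`, and loss helpers sketched earlier in this post; the reference implementations additionally weight the terms (commonly around 10 for the cycle term and 5 for the identity term) and update the two discriminators in separate steps.

```python
import itertools

import torch

# assumes Generator, Discriminator, g_adv_loss, cycle_loss and identity_loss
# from the sketches above are defined in the same file
netG_A2B, netG_B2A = Generator(), Generator()
netD_A, netD_B = Discriminator(), Discriminator()
optimizer_G = torch.optim.Adam(
    itertools.chain(netG_A2B.parameters(), netG_B2A.parameters()),
    lr=2e-4, betas=(0.5, 0.999))

real_A = torch.randn(1, 3, 256, 256)  # stand-ins for one batch of real images
real_B = torch.randn(1, 3, 256, 256)

fake_B = netG_A2B(real_A)  # G(x)
fake_A = netG_B2A(real_B)  # F(y)

loss_GAN = g_adv_loss(netD_B, fake_B) + g_adv_loss(netD_A, fake_A)
loss_cyc = cycle_loss(netG_A2B, netG_B2A, real_A, real_B)
loss_idt = identity_loss(netG_A2B, netG_B2A, real_A, real_B)

loss_G = loss_GAN + loss_cyc + loss_idt  # L = L_GAN + L_cycle + L_identity
optimizer_G.zero_grad()
loss_G.backward()
optimizer_G.step()
```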

Summary

CycleGAN has many rich applications. I think it can be combined with your own field; in particular, if the data set can be expanded it can achieve strong results. In addition, the CycleGAN network itself can still be adjusted: it has many shortcomings, and improving the model is a good direction to explore!

Origin: blog.csdn.net/frighting_ing/article/details/123573395