[GAN] 2. Detailed explanation of the original GAN paper

Preface

In the previous article, [GAN] 1. Using Keras to implement DCGAN to generate handwritten digit images, we used Keras to implement a simple DCGAN and generated handwritten digit images. The results of that program gave us a first taste of the power of GANs. From here on we will introduce the various GAN models step by step, starting with the most basic GAN.


1. Introduction to GAN

GAN is short for Generative Adversarial Network, usually translated as "generative adversarial network" or "adversarial generative network". The concept of the GAN was proposed by Ian Goodfellow in 2014 and quickly became a very hot research topic; at present there are thousands of GAN variants. Yann LeCun, one of the pioneers of deep learning and a recipient of the Turing Award (the "Nobel Prize" of the computer field) announced in 2019, has said that GANs and their variations are "the most interesting idea in the last ten years in machine learning."

The link to the original GAN paper is: Generative Adversarial Nets.

First, let's summarize the original GAN in one sentence. The original GAN consists of two parts that form an organic whole: the generator $G$ and the discriminator $D$. The purpose of the generator is to map random input Gaussian noise to an image (a "fake image"), while the discriminator estimates the probability that an input image comes from the real data rather than from the generator, that is, it judges whether the input image is a fake.

The training of a GAN is also very different from that of a CNN. A CNN defines a specific loss function and then uses gradient descent (or one of its improved variants) to optimize the parameters, using a local optimum to approximate the global optimum as closely as possible. The training of a GAN, however, is a dynamic process: a game played between the generator $G$ and the discriminator $D$. In layman's terms, the purpose of a GAN is to create something out of nothing and pass the fake off as the real, i.e. the "fake images" generated by the generator $G$ should fool the discriminator $D$. The optimal state is reached when, for a fake image generated by $G$, the discriminator $D$ outputs 0.5, meaning it can no longer tell whether the input is a real image or a fake one.

Next we explain the relevant terms of the GAN. The first is the generator $G$: the generator $G$ captures the data distribution from the input Gaussian noise and produces "fake images". Next is the discriminator $D$: the discriminator $D$ estimates the probability that an input sample comes from the training set rather than from the generator. Training the generator $G$ means maximizing the probability that the discriminator $D$ makes a mistake. The entire framework of the original GAN is this dynamic game between the generator $G$ and the discriminator $D$.
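To make the two roles concrete, here is a minimal Keras sketch of a generator and a discriminator in the spirit of the previous article. The layer sizes, the 100-dimensional noise vector and the 28×28 output shape are illustrative assumptions chosen to match MNIST-style images, not the exact architecture used in any particular paper.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LATENT_DIM = 100  # assumed length of the input Gaussian noise vector z

def build_generator():
    # Maps a noise vector z to a 28x28 "fake image" with pixel values in [-1, 1].
    return keras.Sequential([
        keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(28 * 28, activation="tanh"),
        layers.Reshape((28, 28)),
    ], name="generator")

def build_discriminator():
    # Maps an image to D(x), the probability that the image is a real sample.
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ], name="discriminator")

G = build_generator()
D = build_discriminator()

z = np.random.normal(size=(16, LATENT_DIM)).astype("float32")  # a batch of Gaussian noise
fake_images = G(z)            # G(z): shape (16, 28, 28)
fake_scores = D(fake_images)  # D(G(z)): probabilities in (0, 1)
```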


2. GAN training

Next we introduce the training of the GAN. First, we give the objective function (loss function) of the original GAN, which reads as follows:
$$\min_{G}\max_{D}V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_{z}(z)}[\log (1-D(G(z)))]\tag{1}$$

Here $G$ denotes the generator, $D$ denotes the discriminator, $x$ denotes real data, $p_{data}$ denotes the probability density distribution of the real data, $z$ denotes the random input (random Gaussian noise), and $p_{z}$ denotes the prior distribution of that noise.

As can be seen from the formula above, from the discriminator $D$'s point of view, $D$ wants to separate the real samples $x$ from the fake samples $G(z)$ as well as possible, so $D(x)$ should be as large as possible and $D(G(z))$ as small as possible, i.e. $V(D,G)$ as a whole should be as large as possible. From the generator $G$'s point of view, $G$ wants the fake data $G(z)$ it produces to fool the discriminator $D$, i.e. it wants $D(G(z))$ to be as large as possible, so that $V(D,G)$ as a whole is as small as possible. The two modules of the GAN are trained against each other and finally reach the global optimum.
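As a concrete illustration of formula (1), the following sketch evaluates the value function $V(D,G)$ for one batch, reusing the hypothetical `G` and `D` from the sketch above: the discriminator performs gradient ascent on this quantity (equivalently, descent on its negative), while the generator performs gradient descent on the term that depends on it. The small constant `eps` is only an assumption added to keep the logarithms finite.

```python
import tensorflow as tf

eps = 1e-7  # numerical safety for log()

def value_function(D, G, real_images, z):
    # V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    d_real = D(real_images)   # D(x) for a batch of real images
    d_fake = D(G(z))          # D(G(z)) for a batch of noise vectors
    return (tf.reduce_mean(tf.math.log(d_real + eps)) +
            tf.reduce_mean(tf.math.log(1.0 - d_fake + eps)))

def discriminator_loss(D, G, real_images, z):
    # The discriminator maximizes V, i.e. minimizes -V.
    return -value_function(D, G, real_images, z)

def generator_loss(D, G, z):
    # The generator minimizes V; only the second term depends on G.
    return tf.reduce_mean(tf.math.log(1.0 - D(G(z)) + eps))
```

The paper also notes that minimizing $\log(1-D(G(z)))$ can saturate early in training and suggests maximizing $\log D(G(z))$ instead; the sketch keeps the minimax form so that it matches formula (1) exactly.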

The original paper also gives a schematic diagram of the GAN training process, shown below. In the figure, the lower horizontal line represents the domain of the noise $z$, which is mapped by the generator onto the domain of $x$; the blue dashed line represents the output of the discriminator $D$; the black dotted line represents the real data distribution $p_{data}$; and the green solid line represents the distribution $p_g$ of the generator $G$'s fake data. As can be seen from the figure, during training the probability density distribution of the generator $G$ slowly approaches the probability density distribution of the real data, and the discriminator's output gradually flattens towards a constant value. When the situation in panel (d) is reached, $D(G(z))=0.5$, that is, it is impossible to distinguish whether the input image is a real image or a fake forged by the generator.
[Figure: schematic of the GAN training process from the original paper, panels (a)-(d)]
Next, we give the algorithm for GAN training, shown in the figure below. As can be seen from the training algorithm, in each iteration the discriminator is first trained for $k$ steps by gradient ascent on the objective above, and then the generator is trained for one step by gradient descent.
[Figure: the minibatch training algorithm for GANs from the original paper]
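The following is a minimal sketch of that alternating loop in TensorFlow/Keras, reusing the hypothetical `G`, `D`, `discriminator_loss` and `generator_loss` from the sketches above. The batch size, the Adam optimizers and the learning rates are illustrative assumptions; the original paper used momentum-based SGD and $k=1$ in its experiments.

```python
k = 1            # discriminator updates per generator update
batch_size = 64
d_optimizer = keras.optimizers.Adam(1e-4)
g_optimizer = keras.optimizers.Adam(1e-4)

def sample_noise(n):
    return tf.random.normal(shape=(n, LATENT_DIM))

def train_step(real_batch):
    # 1) k steps of gradient ascent on V(D, G) for the discriminator
    #    (implemented as descent on -V).
    for _ in range(k):
        z = sample_noise(batch_size)
        with tf.GradientTape() as tape:
            d_loss = discriminator_loss(D, G, real_batch, z)
        grads = tape.gradient(d_loss, D.trainable_variables)
        d_optimizer.apply_gradients(zip(grads, D.trainable_variables))

    # 2) one step of gradient descent on V(D, G) for the generator.
    z = sample_noise(batch_size)
    with tf.GradientTape() as tape:
        g_loss = generator_loss(D, G, z)
    grads = tape.gradient(g_loss, G.trainable_variables)
    g_optimizer.apply_gradients(zip(grads, G.trainable_variables))
    return d_loss, g_loss
```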
In the paper, the author also gives mathematical proofs of the main conclusions about GANs. Detailed derivations are already available on CSDN, so the proofs are not reproduced here; if you are interested, please see: GAN paper reading — the original GAN (basic concepts and theoretical derivation). The main conclusions are summarized as follows:

  1. When the generator's probability density distribution $p_g$ equals the real data distribution $p_{data}$, the objective function of the GAN reaches its global optimum.
  2. The optimal discriminator $D$ has the expression $D_{G}^{*}(x)=\frac{p_{data}(x)}{p_{data}(x)+p_{g}(x)}$; when the GAN reaches its optimum, $p_g = p_{data}$, and therefore $D_{G}^{*}(x)=0.5$ (a short derivation sketch is given after this list).
  3. Combining points 1 and 2: although in actual GAN training we cannot in the end make $p_g = p_{data}$ hold exactly, we must try to approach this result as closely as possible, so that the generated "fake images" can pass for the real thing.
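As a quick sanity check on points 1 and 2, here is a compressed sketch of the argument, following the structure of the proof in the paper. For a fixed generator $G$, the value function can be rewritten as an integral over $x$:

$$V(D,G)=\int_{x}p_{data}(x)\log D(x)+p_{g}(x)\log (1-D(x))\,dx$$

For fixed values $a=p_{data}(x)$ and $b=p_{g}(x)$, the integrand $a\log y+b\log (1-y)$ attains its maximum on $(0,1)$ at $y=\frac{a}{a+b}$, which is exactly the optimal discriminator $D_{G}^{*}(x)$ of point 2. Substituting $D_{G}^{*}$ back into $V(D,G)$, the generator's objective becomes $-\log 4+2\cdot \mathrm{JSD}(p_{data}\,\|\,p_{g})$, where $\mathrm{JSD}$ is the Jensen-Shannon divergence; since the divergence is non-negative and vanishes only when the two distributions coincide, the global optimum is reached exactly when $p_{g}=p_{data}$, at which point $D_{G}^{*}(x)=0.5$ for every $x$.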

3. Experimental results

Below we give the experimental results of the original GAN. First, the paper reports log-likelihood estimates on the MNIST and TFD data sets; the results are shown below.
[Table: log-likelihood estimates on MNIST and TFD from the original paper]
Next come the visualized results: the GAN is trained on the MNIST, TFD and CIFAR-10 data sets, and samples produced during training are shown in the figure below. Panel (a) shows the results on MNIST, panel (b) the results on TFD, panel (c) the results of a GAN using fully connected networks on CIFAR-10, and panel (d) the results of a GAN using a convolutional discriminator and a deconvolutional generator on CIFAR-10.
[Figure: generated samples on MNIST, TFD and CIFAR-10 from the original paper]


Postscript

With this, the second article in the GAN series, a detailed explanation of the original GAN paper, comes to an end. In the next blog post we will introduce DCGAN in detail.

Origin: blog.csdn.net/qq_30091945/article/details/101079255