A Plain-Language Understanding of Generative Adversarial Networks (GAN)

2.1 The basic structure of GAN

The main structure of a GAN consists of a generator G (Generator) and a discriminator D (Discriminator).

The player in the earlier example corresponds to the generator: we want him to perform well on the court. At the start he is a beginner, so he needs a coach to guide his training and tell him how he is doing until he truly meets the standard for playing. That coach corresponds to the discriminator.

Let's take another example, handwritten digits, to further explore the structure of a GAN.

We now have a large dataset of handwritten digits, and we hope to use a GAN to generate fake handwritten-digit images. The setup consists of the following two parts:

  1. Define a model as the generator (the blue Generator in Figure 3): it takes a vector as input and outputs an image with the pixel dimensions of a handwritten digit.
  2. Define a classifier as the discriminator (the red Discriminator in Figure 3): it takes a handwritten-digit image as input and outputs a label indicating whether the image is real or fake (that is, whether it comes from the dataset or from the generator).
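As a minimal sketch of these two parts (the sizes are my assumptions, not from the text: 28×28 = 784-pixel MNIST-style images and a 100-dimensional input vector), both models can be written as tiny fully connected networks in NumPy:

```python
import numpy as np

# Assumed sizes (not specified in the text): 100-dim noise, 28x28 = 784-pixel images.
NOISE_DIM, IMG_DIM, HIDDEN = 100, 784, 128

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights plus a zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.02, (n_in, n_out)), np.zeros(n_out)

# Generator: noise vector -> fake image (pixels squashed to (0, 1) by a sigmoid).
G_w1, G_b1 = init_layer(NOISE_DIM, HIDDEN)
G_w2, G_b2 = init_layer(HIDDEN, IMG_DIM)

def generator(z):
    h = np.maximum(0, z @ G_w1 + G_b1)            # ReLU hidden layer
    return 1 / (1 + np.exp(-(h @ G_w2 + G_b2)))   # pixel intensities in (0, 1)

# Discriminator: image -> probability that the image is real.
D_w1, D_b1 = init_layer(IMG_DIM, HIDDEN)
D_w2, D_b2 = init_layer(HIDDEN, 1)

def discriminator(x):
    h = np.maximum(0, x @ D_w1 + D_b1)
    return 1 / (1 + np.exp(-(h @ D_w2 + D_b2)))

z = rng.normal(size=(1, NOISE_DIM))   # one random input vector
fake = generator(z)                   # shape (1, 784): one fake "digit"
p_real = discriminator(fake)          # shape (1, 1): probability of being real
```

This only fixes the shapes of the two models; training (how the weights are updated) is covered in the next section.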

 

2.2 GAN training method

We have now defined a generator (Generator) that produces handwritten digits and a discriminator (Discriminator) that judges whether a handwritten digit is real, and we have a dataset of real handwritten digits. So how do we train them?

2.2.1 About the generator

For the generator, the input is an n-dimensional vector and the output is an image of the required pixel size. So first we need to decide where the input vector comes from.

Tips: The generator here can be any model that outputs images, from the simplest fully connected neural network to a deconvolutional (transposed-convolution) network.

The input vector can be thought of as carrying information about the output, such as which digit is written or how messy the handwriting is. Since we place no requirements on the specific content of the output digit, only that it resemble a real handwritten digit as closely as possible (enough to fool the discriminator), we can simply use a randomly generated vector as input. This random input is usually drawn from a common distribution such as a uniform or Gaussian distribution.
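Sampling such an input vector is a one-liner with either distribution (a sketch; the dimension 100 is my assumption):

```python
import numpy as np

rng = np.random.default_rng(42)
n_dim = 100  # assumed input dimension

z_gauss = rng.normal(loc=0.0, scale=1.0, size=n_dim)   # Gaussian (normal) noise
z_unif = rng.uniform(low=-1.0, high=1.0, size=n_dim)   # uniform noise
```

Either vector can be fed to the generator; the choice of distribution mainly affects what region of inputs the generator learns to map from.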

Tips: If we later need to control specific properties of the output, such as which digit is generated, we can analyze the outputs produced by different input vectors and identify which dimensions control the digit and other attributes. These roles are usually not prescribed before training.

2.2.2 About the discriminator

The discriminator is just an ordinary classifier: its input is an image and its output is a label indicating whether the image is real.

Tips: As with the generator, the discriminator can be any classification model, such as a fully connected network or a network containing convolutional layers.

2.2.3 How to train

With the generator and discriminator described above, we now explain how to train them.

The basic process is as follows:

  1. Fix the generator and train the discriminator k times: sample a batch of real images and a batch of generated images, and update the discriminator to tell them apart.
  2. Fix the discriminator and train the generator once: update the generator so that its samples are more likely to be judged real by the discriminator.
  3. Repeat until the discriminator can no longer tell generated samples from real ones.

Tips: The reason we train the discriminator k times before training the generator once is that we first need a reasonably good discriminator: only then can it capture the difference between real samples and generated samples, which in turn allows a more accurate update of the generator. For a more intuitive picture, refer to the following figure:
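The alternating schedule described in the tip above can be sketched as a bare training loop. This is structure only: `update_discriminator` and `update_generator` are hypothetical stand-ins for one gradient step each, and the batch samplers are dummies.

```python
import random

K = 3           # train D this many times per G step (k is a hyperparameter)
NUM_ITERS = 5   # outer training iterations (toy value)

log = []  # records the order of updates, just to show the schedule

def sample_real_batch():
    return [random.random() for _ in range(4)]     # stand-in for real digit images

def sample_noise_batch():
    return [random.gauss(0, 1) for _ in range(4)]  # stand-in for noise vectors z

def update_discriminator(real_batch, noise_batch):
    log.append("D")  # a real implementation would take one gradient step on D here

def update_generator(noise_batch):
    log.append("G")  # a real implementation would take one gradient step on G here

for _ in range(NUM_ITERS):
    for _ in range(K):                        # step 1: train D for k steps
        update_discriminator(sample_real_batch(), sample_noise_batch())
    update_generator(sample_noise_batch())    # step 2: train G once
```

After running, `log` shows k "D" updates followed by one "G" update, repeated, which is exactly the alternation the tip describes.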


Our goal is to make the generated sample distribution (green solid line) fit the real sample distribution (black dashed line), so that we can produce convincing fake samples.
In the initial state (a), the distribution produced by the generator differs greatly from the real distribution, and the discriminator's outputs are unstable, so we first train the discriminator to distinguish the samples better.
After several discriminator updates we reach state (b), where the discriminator separates the two kinds of samples clearly and reliably. We then train the generator.
Training the generator brings us to state (c), where the generator's distribution is closer to the real distribution than before.
After many repeated training iterations we hope to reach state (d): the generated distribution fits the real distribution, and the discriminator can no longer tell generated samples from real ones (its output probability is 0.5). In other words, we can now generate very realistic samples, and the goal is achieved.
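The 0.5 in state (d) is not arbitrary. As shown in the original GAN paper, for a fixed generator the discriminator that maximizes the GAN objective is

```latex
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

so when the generated distribution matches the real one, $p_g = p_{\text{data}}$, this gives $D^*(x) = \tfrac{1}{2}$ everywhere: the best possible discriminator can do no better than a coin flip.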

3. Training related theoretical foundation

The general training process was explained above in plain language. Starting from cross entropy, the following walks through the theory behind the loss function step by step, in particular the min–max formula from the paper, shown in Figure 5:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator here is a classifier used to distinguish real samples from fake ones, so we often use cross entropy to measure the similarity of distributions. The cross-entropy formula is shown in Figure 6:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$


For the current model, the discriminator faces a binary classification problem, so the basic cross entropy can be expanded more concretely, as shown in Figure 7:

$$L(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$$

Extending the formula above to N samples and averaging over them gives the corresponding formula:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
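The averaged N-sample formula can be checked numerically with a toy example (the labels and discriminator outputs below are made up for illustration):

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

# Toy data: label 1 = real sample, label 0 = generated sample.
y_true = [1, 1, 0, 0]
y_pred = [0.9, 0.8, 0.2, 0.1]  # discriminator outputs (made-up numbers)

loss = binary_cross_entropy(y_true, y_pred)  # ≈ 0.164
```

Confident predictions on the correct side of 0.5 give a small loss; pushing any prediction toward the wrong label makes the loss grow without bound.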

OK, so far this is still basic binary classification; the parts specific to GAN are built on top of it below.

4. Summary

Using a discriminator, instead of fitting directly with a pixel-level loss function, makes it easier to capture holistic, high-level information about the image. For example, two images may differ by only a few pixels, but if those pixels are in different places, the perceived difference between the images may be very large.

For example, both groups of generated samples in Figure 10 correspond to the digit 2. The upper sample differs from the target by only one pixel, but that pixel has a large effect on the overall shape; the lower sample differs by six pixels (the pink parts mark the errors), yet overall the digit is barely affected. A direct pixel-level loss would report a six-pixel gap for the lower sample, larger than the one-pixel gap of the upper one, which is the opposite of how the images actually look. A discriminator handles this situation better because it is not restricted to differences in specific pixels.
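The point can be made concrete with a toy calculation (the binary "images" below are made up, flattened to 1-D lists for simplicity): a pixelwise loss only counts how many pixels differ, not whether those pixels matter to the overall shape.

```python
# Made-up binary "images": a target digit stroke and two generated samples.
target = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

sample_a = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 1 pixel off, in a structurally important spot
sample_b = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]   # 6 pixels off, in (hypothetically) harmless spots

def pixel_l1(x, y):
    """Pixelwise L1 error: sum of absolute per-pixel differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

err_a = pixel_l1(target, sample_a)  # 1
err_b = pixel_l1(target, sample_b)  # 6
```

The pixelwise loss ranks `sample_a` as far better (error 1 vs. 6), even though, as the text argues, a single misplaced pixel can damage the digit's overall appearance more than six unimportant ones; a discriminator judging the whole image is not bound by this per-pixel counting.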

In short, GAN is a very interesting idea, and there are now many applications built on it, such as using GANs to generate human portraits or to generate images from text descriptions.

Source: https://zhuanlan.zhihu.com/p/33752313
 


Origin: blog.csdn.net/weixin_43135178/article/details/111987676