GAN learning summary

1. What is GAN?

The two players in the GAN game are a generative model and a discriminative model.

  1. The generative model G captures the distribution of the sample data and uses noise z drawn from some distribution (uniform, Gaussian, etc.) to generate samples resembling the real training data; the goal is for the generated samples to be as close to the real ones as possible.
  2. The discriminative model D is a binary classifier that estimates the probability that a sample comes from the training data (rather than from the generator). If a sample comes from the real training data, D should output a high probability; otherwise it should output a low probability.
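The two roles can be sketched in a few lines. This is a minimal illustration with a made-up affine G and a logistic-regression D (toy shapes and weights, not the networks from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W, b):
    """Toy G: an affine map from noise z to a 'sample' (weights are illustrative)."""
    return W @ z + b

def discriminator(x, w, c):
    """Toy D: logistic regression; outputs the probability that x is real."""
    return 1.0 / (1.0 + np.exp(-(w @ x + c)))

z = rng.normal(size=2)                    # noise z from a Gaussian distribution
W, b = rng.normal(size=(3, 2)), np.zeros(3)
w, c = rng.normal(size=3), 0.0

fake = generator(z, W, b)                 # G(z): a generated 3-d "sample"
p_real = discriminator(fake, w, c)        # D's estimate that the sample is real
print(0.0 < p_real < 1.0)                 # prints True: D always outputs a probability
```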

​ As shown in the figure below for two-dimensional character generation: the upper part is the Generator. Given a vector, the Generator produces an image, and different vectors yield different images. The question is whether a generated image is close to a real one, and that is what the Discriminator decides: the D network gives real images a high score and unreal images a low score. The ultimate goal is for images generated by the G network to also receive a high score from the D network.

Each dimension of the input vector represents some features

2. How is a GAN trained?

Take the two-dimensional image generation above as an example, with G (Generator) and D (Discriminator):

  • G receives a vector Z (random noise) and generates an image from it, denoted G(Z);
  • D judges whether an image is "real". Its input is an image X, and D(X) is the probability that X is real: an output of 1 means a 100% real image, and an output of 0 means the image is certainly not real.

During training, one network is held fixed while the other's weights are updated, and the two roles alternate. In this process both sides optimize their own networks as hard as they can, forming a competitive adversarial game, until the two reach a dynamic equilibrium. Ideally, the final result is that the images generated by G are so close to real images that the D network can hardly tell them apart, at which point D(G(Z)) = 0.5.
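The D(G(Z)) = 0.5 equilibrium can be checked numerically: the original paper derives the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)), so once G matches the data distribution, D* is 0.5 everywhere. A sketch with toy Gaussian densities (chosen only for illustration):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out to keep the example dependency-free."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-3, 3, 7)
p_data = normal_pdf(x, 0.0, 1.0)    # density of the real data
p_g = normal_pdf(x, 0.0, 1.0)       # generator density after G has caught up
d_star = p_data / (p_data + p_g)    # optimal discriminator from the paper
print(np.allclose(d_star, 0.5))     # prints True
```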

Formula description. The objective from the original paper is

min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]

  1. x denotes a real image, z the input noise to the G network, and G(z) the image generated by G.
  2. D(x) is the probability that D assigns to the input being real; D(G(z)) is the probability that D judges an image generated by G to be real.
  3. Purpose of G: G wants the images it generates to be judged as real, i.e., it wants D(G(z)) to be as large as possible, which makes V(D, G) smaller. This is why the formula is prefixed with min_G.
  4. Purpose of D: the stronger D is, the larger D(x) should be and the smaller D(G(z)) should be, which makes V(D, G) larger. This is why D's part of the formula is max_D.
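Points 3 and 4 can be checked directly on the value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]; the probabilities below are made-up toy numbers:

```python
import numpy as np

def value_fn(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] for given discriminator outputs."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A strong D: near 1 on real images, near 0 on generated ones -> V is large.
v_strong = value_fn(np.array([0.95, 0.9]), np.array([0.05, 0.1]))
# A fooled D: D(G(z)) is large, which is what G wants -> V drops.
v_fooled = value_fn(np.array([0.95, 0.9]), np.array([0.80, 0.9]))
print(v_strong > v_fooled)   # prints True: D maximizes V, G minimizes it
```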

The paper trains the networks D and G with stochastic gradient methods:

  1. First train D, using gradient ascent to make the value function as large as possible.
  2. Then train G, using gradient descent to make it as small as possible.
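The two alternating steps can be sketched on a toy 1-D GAN: real data drawn from N(4, 0.5), a generator that just shifts its noise by a learned offset theta, and a logistic discriminator, all with hand-derived gradients. Every number here is an illustrative assumption; for simplicity G ascends log D(G(z)), the non-saturating variant the paper suggests in practice:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

theta = 0.0        # generator parameter: G(z) = z + theta
a, b = 1.0, 0.0    # discriminator parameters: D(x) = sigmoid(a*x + b)
lr = 0.05

for step in range(2000):
    x = rng.normal(4.0, 0.5, size=32)   # real samples
    z = rng.normal(0.0, 0.5, size=32)   # input noise
    g = z + theta                       # fake samples G(z)

    # 1) fix G, update D by gradient ASCENT on E[log D(x)] + E[log(1 - D(G(z)))]
    dx, dg = sigmoid(a * x + b), sigmoid(a * g + b)
    a += lr * (np.mean((1 - dx) * x) - np.mean(dg * g))
    b += lr * (np.mean(1 - dx) - np.mean(dg))

    # 2) fix D, update G by gradient ascent on E[log D(G(z))]
    dg = sigmoid(a * (z + theta) + b)
    theta += lr * np.mean((1 - dg) * a)

# theta should have moved from 0 toward the real mean (4), where D is fooled
print(theta > 1.0)
```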

3. Auto-encoder, VAE, and GAN

auto-encoder

​ The structure is as follows: train an encoder that converts the input into a code, then a decoder that converts the code back into an image, and minimize the MSE (mean squared error) between the reconstructed image and the input. After training this model, take out the second half (the NN Decoder) and feed it a random code to generate an image.
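A minimal linear version of this train-then-reuse-the-decoder pipeline, with toy 4-dimensional "images", hand-derived gradients, and purely illustrative sizes (not the post's network):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(100, 4))            # toy "images": 100 vectors of 4 pixels
We = rng.normal(scale=0.1, size=(2, 4))  # encoder: 4-d input -> 2-d code
Wd = rng.normal(scale=0.1, size=(4, 2))  # decoder: 2-d code -> 4-d reconstruction
lr = 0.02

for _ in range(1000):
    code = X @ We.T                      # encode
    recon = code @ Wd.T                  # decode
    err = recon - X                      # d(MSE)/d(recon), up to a constant factor
    grad_Wd = err.T @ code / len(X)
    grad_We = (err @ Wd).T @ X / len(X)
    Wd -= lr * grad_Wd                   # gradient descent on the reconstruction MSE
    We -= lr * grad_We

mse = np.mean((X @ We.T @ Wd.T - X) ** 2)
# after training, the decoder alone turns a random code into a new sample
sample = Wd @ rng.normal(size=2)
print(mse < np.mean(X ** 2))             # prints True: reconstruction beats all-zeros
```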

Taking the latter half of the auto-encoder and feeding it random vectors gives exactly the Generator described above. But how should the input vector be chosen? There is no standard answer.

​ To address this problem, the VAE was proposed, as follows:


Both generative models above actually share a serious drawback. Take the VAE (Variational Auto-encoder): it wants the generated image to be as similar to the input as possible, but how does the model measure that similarity? It computes a loss, usually the MSE, i.e., the mean squared error over individual pixels. Does a small loss really mean the images are similar?

​ As shown in the figure below, the images in the top row differ from the target by only one pixel and obtain the smallest MSE, yet the result is not what we want.
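The effect can be reproduced with tiny made-up "images" (not the figure from the post): a single stray pixel gives a smaller MSE than shifting the whole stroke one column over, even though the shifted version looks far more like the target to a human:

```python
import numpy as np

target = np.zeros((4, 4)); target[:, 1] = 1.0   # a vertical stroke in column 1
one_off = target.copy();   one_off[0, 3] = 1.0  # same stroke plus one stray pixel
shifted = np.zeros((4, 4)); shifted[:, 2] = 1.0 # the whole stroke shifted right

mse = lambda a, b: np.mean((a - b) ** 2)
print(mse(target, one_off), mse(target, shifted))  # 0.0625 vs 0.5
print(mse(target, one_off) < mse(target, shifted)) # prints True
```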
The reason these two generative networks cannot produce satisfactory images is that the network only attends to local information and does not take global information into account; considering global information would require a much deeper network structure.

Strengths and weaknesses of the GAN's two components:

Generator: can generate the individual parts of an image, but cannot account for the relationships between the different parts;

Discriminator: considers the image as a whole, but cannot generate images on its own.
 

Original: https://blog.csdn.net/happyday_d/article/details/85406134

Origin blog.csdn.net/weixin_43135178/article/details/114818139