[Long read] This article teaches you all about Generative Adversarial Networks (GAN)

1 Basic concept of GAN

1.1 Introduction to GAN

GAN stands for Generative Adversarial Network. It consists of two parts: the generator network (Generator), which is responsible for generating simulated data, and the discriminator network (Discriminator), which is responsible for judging whether the input data is real or generated. The generator must constantly improve the data it generates so that the discriminator cannot tell it apart from real data, and the discriminator must also improve itself to make its judgment more accurate. The relationship between them is competitive, or adversarial.

In the original GAN paper, the authors compare the generator to a criminal who prints counterfeit banknotes and the discriminator to a policeman. The criminal works hard to make the banknotes look real, and the police keep improving their ability to spot counterfeit bills. The two compete with each other, and over time both become stronger. By analogy, in an image generation task the generator keeps producing fake images that are as realistic as possible, while the discriminator judges whether an image is real or generated; the two are continuously optimized through this game. In the end, the images produced by the generator should make it impossible for the discriminator to distinguish real from fake.

1.2 Basic architecture diagram of GAN

The generator is responsible for generating content from random vectors. The content can be pictures, text, or music, depending on what you want to create. The discriminator is responsible for judging whether the content it receives is real; it usually outputs a probability that represents how likely the content is to be real. Real data and generated data have the same shape.

"Adversarial" refers to the alternating training process of GAN. Taking image generation with the original GAN as an example: noise sampled from a Gaussian distribution is passed through the generator to produce fake pictures. Fake and real pictures are then fed to the discriminator, which learns to tell them apart by giving real pictures a high score and fake pictures a low score. Once the discriminator can judge the existing data well, the generator is trained to obtain a high score from the discriminator, that is, to produce better fake pictures until the discriminator is fooled. This process is repeated until the discriminator's predicted probability for any picture is close to 0.5, meaning it can no longer tell whether a picture is real or fake, at which point training can stop.

The ultimate goal of the adversarial process is to model the distribution of the dataset as faithfully as possible.

Our ultimate goal in training a GAN is to obtain a good enough generator, one that produces sufficiently realistic fake content.

 1.3 Brief introduction of GAN principle

Both the generator and the discriminator are neural networks, and they compete against each other during the training phase. The training steps are repeated, and after each iteration the generator and the discriminator get better at their respective jobs.

Generator 

The generator in a GAN is a neural network that, given a set of random values, produces a realistic-looking image through a series of nonlinear computations. Its input is a random vector Z sampled from a multivariate Gaussian (normal) distribution, and its output is a fake image Xfake with the same size as the original image Xreal. The following is a flowchart of a generator that takes a random vector as input and generates a fake digit image.

The role of the generator is to deceive the discriminator: after training, it should generate realistic, high-quality images.

Discriminator

The discriminator in GAN is based on the concept of discriminative modeling, which tries to classify different classes in a dataset with specific labels. Therefore, in essence, it is similar to a supervised classification problem. Furthermore, the discriminator's ability to classify observations is not limited to images, but also includes video, text, and many other domains (multimodal). The following is a flow chart for the discriminator to classify the images generated by the generator to determine whether they are true or false.

The role of the discriminator is to solve a binary classification problem, learning to distinguish between real and fake images: predicting whether observations were made by the generator (fake) or from the original data distribution (real), and in the process, it learns a set of parameters or weights. As training progresses, the weights are constantly updated.

2 GAN sample generation process

A GAN cannot perform its task right away; it needs to go through a training process. I call its states before and after training the "original GAN model" and the "mature GAN model". The original GAN model becomes a mature GAN model through training, and the mature model is the one we use in practical applications. Training specifically means training the generator network (Generator) and the discriminator network (Discriminator).

The generator G is a network that generates pictures. It receives random noise z and generates a picture from it, denoted G(z). The discriminator D decides whether a picture is "real". Its input x is a picture, which may be either real or generated (x=G(z) for generated pictures). Its output D(x) is the probability that x is a real picture: an output of 1 means the picture is certainly real, and an output of 0 means the picture is certainly fake.

2.1 Training process

The game between the generator and the discriminator: when the discriminator penalizes the generator, the discriminator gains and the generator loses; when the generator improves so that the discriminator's penalty becomes small, the generator gains and the discriminator loses.

The specific process: the generator generates fake data, and then both the fake data and the real data are fed to the discriminator, which must judge which samples are real and which are fake. The discriminator's first attempt will have a large error, and we optimize the discriminator according to that error. Once the discriminator has improved, the generator's data can hardly fool it, so we optimize the generator in turn. Once the generator has improved, we go back to training the discriminator, then the generator again, and so on, until a Nash equilibrium is reached.

Only the discriminative model D participates in the first stage. A sample x from the training set is used as the input of D, which outputs a value between 0 and 1. The larger the value, the more likely the sample x is real data. In this stage, we want D's output to be as close to 1 as possible.
In the second stage, both the discriminative model D and the generative model G are involved. First, noise z is fed into G, which approximates the probability distribution of the real data set and generates fake samples; the fake samples are then fed into D. This time, D should output a value as close to 0 as possible. In this process, the discriminative model D is therefore equivalent to a supervised binary classifier whose inputs are labeled either 1 or 0.
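The alternating procedure described above can be sketched in PyTorch as one discriminator update followed by one generator update. This is only a minimal sketch, not the article's training script: the tiny networks, the sizes `latent_dim` and `data_dim`, the learning rate, and the random tensor standing in for a batch of real data are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch_size = 8, 16, 32  # hypothetical sizes

# Tiny stand-in networks (assumptions, not the article's architecture)
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator update: push D(x) toward 1 and D(G(z)) toward 0
    z = torch.randn(n, latent_dim)
    fake = G(z).detach()  # detach so this step does not update G
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update: push D(G(z)) toward 1
    z = torch.randn(n, latent_dim)
    g_loss = bce(D(G(z)), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()

# Random stand-in for a batch of real data scaled to [-1, 1]
real_batch = torch.rand(batch_size, data_dim) * 2 - 1
loss_D, loss_G = train_step(real_batch)
```

In a real run, `train_step` would be called once per mini-batch over many epochs, and the two losses would be expected to fluctuate rather than fall steadily.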

2.2 Objective function of GAN

Generative models capture the distribution of the data and are trained in a way that tries to maximize the probability that the discriminator will be wrong. On the other hand, the discriminator is based on a model that estimates the probability that the samples it gets are received from the training data rather than from the generator. GANs are formulated as a minimax game, where the discriminator tries to maximize its reward V(D, G), while the generator tries to minimize the discriminator's reward, or in other words, maximize its loss.

GAN defines a prior Pz(z) on the input noise z, which is used to learn the distribution Pg of the generative model G over the data x. G(z) maps the input noise z to data (for example, it generates a picture). D(x) represents the probability that x comes from the real data distribution Pdata rather than from Pg. Accordingly, the objective function is defined in minimax form (see http://t.csdn.cn/29Btw for details):

\min_{G} \max_{D} V(D,G) = E_{x \sim P_{data}}[\log D(x)] + E_{z \sim P_{z}}[\log(1 - D(G(z)))]

The min-max in the above formula means that when D is updated the expression is maximized, and when G is updated it is minimized. The detailed explanation is as follows:

It should be noted that the generator does not minimize the discriminator's objective function itself, but the maximum value of that objective function. This maximum corresponds, up to a constant, to the JS divergence between the real data distribution and the generated data distribution. The JS divergence measures the similarity of two distributions: the closer they are, the smaller the JS divergence. That is, the goal of the discriminator is to minimize the cross-entropy loss, and the goal of the generator is to minimize the JS divergence between the generated data distribution and the real data distribution.
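The link between the maximized objective and the JS divergence can be checked numerically on small discrete distributions. The sketch below is an illustration under assumptions: the example distributions are made up, and it uses the standard closed form of the optimal discriminator, Pdata(x) / (Pdata(x) + Pg(x)), to evaluate the objective at its maximum over D.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence (natural log); it lies between 0 and log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_data, p_g):
    """V(D*, G) with D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    v = 0.0
    for pd, pg in zip(p_data, p_g):
        d = pd / (pd + pg) if pd + pg > 0 else 0.5
        if pd > 0:
            v += pd * math.log(d)
        if pg > 0:
            v += pg * math.log(1 - d)
    return v

# Made-up example distributions over four outcomes
p_data = [0.5, 0.5, 0.0, 0.0]
same = [0.5, 0.5, 0.0, 0.0]          # generator matches the data exactly
partial = [0.25, 0.25, 0.25, 0.25]   # partial overlap
disjoint = [0.0, 0.0, 0.5, 0.5]      # no overlap at all

for name, p_g in [("same", same), ("partial", partial), ("disjoint", disjoint)]:
    print(name, js(p_data, p_g), value_at_optimal_d(p_data, p_g))
```

For each pair, the printed objective equals -2 log 2 + 2 JS, so minimizing the maximized objective over G amounts to minimizing the JS divergence. The disjoint case also shows the divergence saturating at log 2.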

(1) When updating the parameters of the discriminative model D:

  • For a sample x from the real distribution Pdata, we want D(x) to be as close to 1 as possible, that is, the larger log D(x) the better;
  • For data G(z) generated from noise z, we want D(G(z)) to be as close to 0 as possible (that is, D can tell real data from fake data), so the larger log(1−D(G(z))) the better. Hence the max over D.

(2) When updating the parameters of the generative model G:

  We want G(z) to be as close to the real data as possible, that is, Pg=Pdata. Therefore, we want D(G(z)) to be as close to 1 as possible, that is, the smaller log(1−D(G(z))) the better. Hence the min over G. Note that log D(x) does not depend on G, so its derivative with respect to G's parameters is 0.

For a fixed G, the best case for D is:

D^{*}(x) = P_{data}(x) / (P_{data}(x) + P_{g}(x))

2.3 Differences between the two distributions

For the generator network G, the input z ∼ N(0, I) means that z follows a standard normal distribution. With trained parameters θ, the image produced by the generator network is G(z, θ).

The discriminator network can be viewed as solving a binary classification problem. One class is the output of the generator network, xgenerative=G(z,θ); the other is the real data xreal, where xreal∼Dreal means that xreal follows the real data distribution. Data x (either xgenerative or xreal) is fed into the discriminator network, whose outputs are:

D(xreal,ϕ)
D(xgenerative,ϕ)=D(G(z,θ),ϕ)

It can be described mathematically with the following formula.

  Loss function of the generator network: L_{G}=H(1,D(G(z)))

In the above formula, G is the generator network, D is the discriminator network, H is the cross entropy, and z is the random input. D(G(z)) is the probability the discriminator assigns to the generated data being real: 1 means the data is certainly real, and 0 means it is certainly fake. H(1,D(G(z))) is the distance between this judgment and 1. For the generator network to do well, the discriminator must classify the generated data as real (that is, the smaller the distance between D(G(z)) and 1, the better).

  Loss function of the discriminator network: L_{D}=H(1,D(x))+H(0,D(G(z)))
In the above formula, x is the real data. H(1,D(x)) is the distance between the judgment on real data and 1, and H(0,D(G(z))) is the distance between the judgment on generated data and 0. For the discriminator network to do well, it must see real data as real and generated data as fake (that is, both distances should be small).

This can be treated as a binary classification problem, whose loss function is the cross-entropy loss. In binary classification there are only positive samples (label=1) and negative samples (label=0), and the two class probabilities sum to 1. For an input x, the model output is p(x) and y is the true label. The discriminator is trained with the Binary Cross-Entropy (BCE) loss, so the loss for a single sample is:
LOSS = -[y * log(p(x)) + (1-y) * log(1-p(x))]

To compute the average loss over N samples, add up the N losses and divide by N:
LOSS = -(1/N) * Σ_{i=1}^{N} [y_i * log(p(x_i)) + (1-y_i) * log(1-p(x_i))]

  Optimization principle: given the loss functions of the generator and discriminator networks, the parameters of each can be optimized with error backpropagation (BP) and an optimization method such as gradient descent. This continuously improves both networks; in their mature state, the generator and the discriminator have each learned a reasonable mapping function.
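The cross-entropy losses L_D and L_G described above can be written out directly. The following is a small sketch using only the standard library; the discriminator outputs in `d_real` and `d_fake` are made-up numbers for illustration, and the `eps` clamp is an added safeguard against log(0), not part of the formulas.

```python
import math

def bce(y, p, eps=1e-12):
    """Binary cross-entropy H(y, p) for one sample; eps guards against log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_bce(labels, probs):
    """Average BCE over N samples."""
    return sum(bce(y, p) for y, p in zip(labels, probs)) / len(labels)

# Hypothetical discriminator outputs, chosen only for illustration
d_real = [0.9, 0.8, 0.95]   # D(x) on real samples
d_fake = [0.2, 0.1, 0.3]    # D(G(z)) on generated samples

# L_D = H(1, D(x)) + H(0, D(G(z)))
loss_D = mean_bce([1, 1, 1], d_real) + mean_bce([0, 0, 0], d_fake)
# L_G = H(1, D(G(z)))
loss_G = mean_bce([1, 1, 1], d_fake)
```

With these numbers the discriminator is doing well (small `loss_D`) and the generator badly (large `loss_G`), which matches the intuition that the two losses pull in opposite directions.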
 

2.4 The loss function of GAN is difficult to decrease

The goals of the generator and the discriminator are opposite: the two networks work against each other, and when one gains the other loses. It is therefore impossible for both losses to decrease together to a converged state.

  • If the generator's loss drops rapidly, the discriminator is probably too weak, so the generator can easily "fool" it.
  • If the discriminator's loss drops rapidly, the discriminator is very strong. A strong discriminator means the images produced by the generator are not realistic enough and are easy to tell apart, which is why its loss falls so quickly.

In other words, for both the discriminator and the generator, the value of the loss does not indicate the quality of the generator. For a well-behaved GAN, the loss usually keeps fluctuating.

It seems that the only way to judge whether the model has converged is to look at the quality of the generated images. In fact, WGAN, discussed later, proposes a new loss that lets us judge convergence more directly.

3 GAN network architecture

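Generator network code:

import numpy as np
import torch.nn as nn

# `opt.latent_dim` (size of the noise vector z) and `img_shape`
# (channels, height, width) are not defined in the article; they are
# assumed to be set elsewhere in the script, e.g. from command-line options.

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        # One fully connected block: Linear -> (optional BatchNorm) -> LeakyReLU
        def block(in_feat, out_feat, normalize=True):
            layers = [nn.Linear(in_feat, out_feat)]
            if normalize:
                # Note: the second positional argument of BatchNorm1d is eps, so this sets eps=0.8
                layers.append(nn.BatchNorm1d(out_feat, 0.8))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        # Widen step by step from the noise vector to a flattened image
        self.model = nn.Sequential(
            *block(opt.latent_dim, 128, normalize=False),
            *block(128, 256),
            *block(256, 512),
            *block(512, 1024),
            nn.Linear(1024, int(np.prod(img_shape))),
            nn.Tanh()  # outputs in (-1, 1)
        )

    def forward(self, z):
        img = self.model(z)
        # Reshape the flat output to (batch, channels, height, width)
        img = img.view(img.shape[0], *img_shape)
        return img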

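Discriminator network code:

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        # Fully connected classifier on the flattened image.
        # `img_shape`, `np` and `nn` are the same as in the generator code.
        self.model = nn.Sequential(
            nn.Linear(int(np.prod(img_shape)), 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),  # raw score; the listing has no sigmoid here
        )

    # The article's listing stops after __init__; this forward pass is an assumed completion.
    def forward(self, img):
        img_flat = img.view(img.size(0), -1)  # flatten to (batch, C*H*W)
        return self.model(img_flat)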

4 Defects of GAN

4.1 What is the difference between a generative model and a discriminative model?

Machine learning models can be divided into two categories, generative models and discriminative models, according to how they model the data. Suppose we want to train a model to classify cats and dogs. A discriminative model only needs to learn the difference between the two, for example that cats are usually smaller than dogs. A generative model instead learns what cats look like and what dogs look like, and then distinguishes them based on those learned appearances. Specifically:

  • Generative model: learns the joint probability distribution P(X,Y) from the data, and then obtains the conditional distribution P(Y|X) from P(Y|X)=P(X,Y)/P(X) as the prediction model. This approach expresses the generative relationship between a given input X and an output Y.

  • Discriminative model: directly learns the decision function Y=f(X) or the conditional probability distribution P(Y|X) from the data as the prediction model. Discriminative methods are concerned with what output Y should be predicted for a given input X.
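The generative route from the joint distribution to a prediction, P(Y|X)=P(X,Y)/P(X), can be illustrated with a toy table. The numbers and the feature values "small" and "large" below are invented for the cat-and-dog example; this is a sketch of the arithmetic, not a trained model.

```python
# Hypothetical joint distribution P(X, Y); the numbers are made up for illustration
joint = {
    ("small", "cat"): 0.35, ("small", "dog"): 0.15,
    ("large", "cat"): 0.05, ("large", "dog"): 0.45,
}

def posterior(x):
    """P(Y | X = x) = P(X = x, Y) / P(X = x), derived from the joint distribution."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

post_small = posterior("small")
post_large = posterior("large")
```

A discriminative model would skip the joint table and fit P(Y|X) (or a decision function) directly from labeled examples.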

4.2 Loss of GAN is hard to drop

Even for a well-trained GAN, the loss does not decrease steadily. To judge whether a GAN is well trained, we can only inspect by eye whether the quality of the generated pictures is good. WGAN, discussed later, proposes a new loss design that better solves the problem of judging convergence.

As the GAN loss shows, the goals of the generator and the discriminator are opposite: the two networks work against each other, and when one gains the other loses. It is impossible for both losses to decrease together to a converged state.


4.3 Solving mode collapsing

Training collapse in a GAN refers to the situation where either the generator or the discriminator overwhelms the other during training. When the discriminator is optimal, the loss of the original GAN is equivalent to minimizing the JS divergence between the generated distribution and the real distribution. Because a randomly initialized generated distribution rarely has non-negligible overlap with the real distribution, and because the JS divergence changes abruptly (it is constant when the distributions do not overlap), the generator faces vanishing gradients. On the other hand, if the discriminator is not trained close to the optimum, the generator's optimization target loses its meaning. We therefore need to balance the two carefully: the discriminator must be trained neither too well nor too poorly. Otherwise training collapses and the desired result is not obtained.

Method 1: Improving the objective function

To avoid the mode hopping caused by optimizing the max-min problem, UnrolledGAN modifies the generator loss. Specifically, when UnrolledGAN updates the generator, the loss it uses is not the discriminator's current loss but the loss after the discriminator has been iterated k more times. Note that these k look-ahead iterations do not actually change the discriminator's parameters; they are only used to compute the loss for updating the generator. This lets the generator take into account how the discriminator will change over the next k steps, avoiding the mode collapse caused by switching between modes. This must be distinguished from iterating the generator k times and then the discriminator once. DRAGAN introduces the no-regret algorithm from game theory and modifies the loss accordingly to address mode collapse.

Method 2: Improving the network structure

Multi-agent diverse GAN (MAD-GAN) uses multiple generators and one discriminator to ensure the diversity of generated samples. Compared with an ordinary GAN, it has several more generators, and a regularization term is added to the loss. The regularization term uses cosine distance to penalize similarity between the samples produced by the different generators.

MRGAN adds a discriminator that penalizes mode collapse in the generated samples. This discriminator is mainly used to judge whether the generated samples are diverse, that is, whether mode collapse has occurred.

Method 3: Mini-batch Discrimination

Mini-batch discrimination adds a mini-batch layer in the middle of the discriminator that computes sample statistics based on the L1 distance. This statistic measures how close each sample in a batch is to the other samples in the batch. The discriminator can use this information to identify batches that lack diversity, which in turn pushes the generator to produce diverse samples.
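A mini-batch layer of this kind can be sketched in PyTorch as follows. This is a simplified illustration, not the article's code: the class name, the sizes, and the 0.1 initialization scale are assumptions, and the closeness measure used here is exp(-L1 distance) summed over the other samples in the batch.

```python
import torch
import torch.nn as nn

class MinibatchDiscrimination(nn.Module):
    def __init__(self, in_features, out_features, kernel_dim):
        super().__init__()
        self.out_features = out_features
        self.kernel_dim = kernel_dim
        # Learnable projection, stored flattened as (in_features, out_features * kernel_dim)
        self.T = nn.Parameter(torch.randn(in_features, out_features * kernel_dim) * 0.1)

    def forward(self, x):
        # x: (batch, in_features) -> M: (batch, out_features, kernel_dim)
        M = (x @ self.T).view(-1, self.out_features, self.kernel_dim)
        # Pairwise L1 distances between samples: (batch, batch, out_features)
        l1 = (M.unsqueeze(0) - M.unsqueeze(1)).abs().sum(dim=3)
        # Closeness to the other samples; subtract 1 to drop the self-term exp(0)
        closeness = torch.exp(-l1).sum(dim=0) - 1
        # Append the batch statistics to the original features
        return torch.cat([x, closeness], dim=1)

torch.manual_seed(0)
layer = MinibatchDiscrimination(in_features=16, out_features=5, kernel_dim=3)
diverse = torch.randn(8, 16)                  # eight different samples
collapsed = torch.randn(1, 16).repeat(8, 1)   # eight identical samples
stat_diverse = layer(diverse)[:, 16:].mean().item()
stat_collapsed = layer(collapsed)[:, 16:].mean().item()
```

For the collapsed batch every pairwise distance is zero, so the statistic reaches its maximum (batch size minus one), while a diverse batch gives a smaller value. The later discriminator layers can use that difference as a signal.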

4.4 Avoid GAN training crashes

  • Normalize input images to (-1, 1); use tanh as the activation of the generator's last layer
  • For the generator loss, use max log D instead of min log(1-D), because the original generator loss suffers from vanishing gradients; in practice this can be done by flipping the labels when training the generator (real=fake, fake=real)
  • Don't sample the noise from a uniform distribution; sample from a Gaussian distribution
  • A mini-batch should contain only real samples or only generated samples, not a mix; if Batch Norm is not an option, use Instance Norm
  • Avoid sparse gradients, that is, use ReLU and MaxPool sparingly. LeakyReLU can replace ReLU; downsampling can use Average Pooling or Convolution + stride; upsampling can use PixelShuffle or ConvTranspose2d + stride
  • Smooth the labels or add noise to them. Label smoothing: replace the positive label with a random number in 0.7-1.2 and the negative label with a random number in 0-0.3. Label noise: when training the discriminator, randomly flip the labels of some samples
  • If possible, use DCGAN or a hybrid model: KL+GAN, VAE+GAN
  • Use LSGAN or WGAN-GP
  • Use Adam for the generator and SGD for the discriminator
  • Detect failures early. For example, a discriminator loss of 0 means training has failed; a steadily decreasing generator loss means the discriminator is not working
  • Don't try to fix mode collapse during training by comparing the generator and discriminator losses to decide which network to train, for example: while loss_D > loss_G: train D; while loss_G > loss_D: train G
  • If labels are available, use the label information in training
  • Add some noise to the input of the discriminator, and some artificial noise to each layer of G
  • Train the discriminator more often, especially when noise is added
  • For the generator, use dropout in both training and testing
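The label smoothing and label noise tips from the list above can be sketched as two small helpers. The function names and the 5% flip probability are assumptions for illustration; the smoothing ranges are the ones quoted in the list.

```python
import torch

torch.manual_seed(0)

def smooth_labels(n, real):
    """Soft targets: uniform in [0.7, 1.2] for real samples, [0.0, 0.3] for fake ones."""
    low, high = (0.7, 1.2) if real else (0.0, 0.3)
    return torch.empty(n, 1).uniform_(low, high)

def flip_labels(labels, flip_prob=0.05):
    """Randomly swap a small fraction of targets (y -> 1 - y) for discriminator training."""
    flip = torch.rand_like(labels) < flip_prob
    return torch.where(flip, 1.0 - labels, labels)

real_targets = smooth_labels(64, real=True)    # use instead of torch.ones(64, 1)
fake_targets = smooth_labels(64, real=False)   # use instead of torch.zeros(64, 1)
noisy_targets = flip_labels(torch.ones(64, 1), flip_prob=0.05)
```

Note that targets above 1 cannot be used with `nn.BCELoss`, which expects targets in [0, 1]. With that loss the upper bound would need to be capped at 1.0, or a least-squares loss used instead.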

5 Improvements to GANs

5.1 CGAN

  One of the main disadvantages of adversarial networks is that the training process is unstable. To improve the stability of training, Conditional Generative Adversarial Nets (CGAN) address, to a certain extent, the uncertainty of the results generated by GAN. If the original GAN is trained on the MNIST dataset, the image it generates is completely uncontrolled: whether it produces a 1, a 2, or some other digit cannot be chosen. To make the generated digit controllable, we could split the dataset by digit (0~9) and train 10 separate models, but this is troublesome and unrealistic. Splitting the dataset is not only cumbersome; more importantly, each category then has few samples, which is likely to cause underfitting when training a GAN. This is why CGAN was proposed. Let's first look at the network structure of CGAN:

  As the network structure diagram shows, the input of the generator is not only the random noise sample z but also the label of the image to be generated. For MNIST generation, for example, the label is a one-hot vector in which the dimension set to 1 indicates which digit to generate. Likewise, the input of the discriminator also includes the label of the sample. This allows the discriminator and the generator to learn the relationship between samples and labels. The loss is designed in basically the same way as in the original GAN, except that the generator and the discriminator now work with conditional distributions. In an implementation, it is only necessary to concatenate the random noise sample z with the input condition y.
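The concatenation of noise and condition described above looks roughly like this in PyTorch. The sizes (100-dimensional noise, 10 classes, 28x28 images) are assumptions matching a typical MNIST setup, and the image batch is a random stand-in.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, n_classes, batch_size = 100, 10, 4   # hypothetical sizes for MNIST-style digits

z = torch.randn(batch_size, latent_dim)            # random noise
digits = torch.tensor([3, 1, 4, 1])                # digits we want to generate
y = F.one_hot(digits, num_classes=n_classes).float()

# Generator input: noise concatenated with the one-hot condition
g_input = torch.cat([z, y], dim=1)

# Discriminator input: (flattened) image concatenated with the same label
img = torch.randn(batch_size, 28 * 28)             # random stand-in for an image batch
d_input = torch.cat([img, y], dim=1)
```

The first layers of G and D then simply take 110 and 794 input features instead of 100 and 784.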

That is, by turning the unsupervised GAN into a semi-supervised or supervised model, a bit of guidance is added to GAN training. The optimized objective function is:

\min_{G} \max_{D} V(D,G) = E_{x \sim P_{data}}[\log D(x|y)] + E_{z \sim P_{z}}[\log(1 - D(G(z|y)))]

  CGAN introduces a conditional variable y in the modeling of both the generative model G and the discriminative model D, where y can be a label or other data forms, and y and the original input of GAN are combined into a vector as the input of CGAN. This simple and straightforward improvement proved to be very effective and was widely used in subsequent related work. A schematic diagram of the CGAN model is shown below:

5.2 DCGAN

The GANs discussed so far are all based on simple fully connected networks. For vision problems, however, the original DNN-based GAN runs into many problems. Suppose the random noise fed to the GAN is 100-dimensional and the output image is 256x256. Then 100-dimensional information must be mapped to 65536 dimensions. If this is implemented with a plain DNN, the model will have a very large number of parameters and will be very hard to train (a lot of information has to be added when mapping from low to high dimension). This is why the deep convolutional GAN (DCGAN) appeared.


Generator network: convolutional neural network + deconvolutional neural network (the former extracts image features, and the latter regenerates images, that is, fake data, from the input features).
Discriminator network: convolutional neural network + fully connected layers (a traditional neural network) (the former extracts image features, and the latter distinguishes real from fake).

Specifically, DCGAN uses CNNs to implement the traditional GAN generator and discriminator, and uses some tricks:

  • Replace pooling layers with convolutions: strided convolutions in the discriminator and fractional-strided convolutions in the generator.
  • Use batchnorm in both the generator and the discriminator.
  • Remove fully connected hidden layers; global pooling increases the stability of the model but hurts convergence speed.
  • Use ReLU in all layers of the generator except the output layer, which uses tanh.
  • Use LeakyReLU in all layers of the discriminator.

Combining adversarial networks with convolutional neural networks for image generation, the structure of the DCGAN model is as follows:

  The basic architecture of DCGAN uses several layers of "deconvolution" (Deconvolution). A traditional CNN compresses the image so that it becomes smaller and smaller, while deconvolution makes the small initial input (noise) larger and larger (although deconvolution is not the inverse operation of convolution). For example, in the DCGAN structure, the 100-dimensional noise at the input layer is turned into a 64x64x3 image at the output layer: a large dimension is generated from a small one. As an illustration of deconvolution, a 2x2 input passed through a 3x3 convolution kernel produces a 4x4 feature map.
  The name comes from the fact that this operation appears in the backpropagation of convolution. The kernel matrix used in backpropagation is the transpose of the one used in forward propagation, so the operation is also called transposed convolution. We have simply moved the backpropagation operation into the forward pass, which gives the so-called deconvolution. However, transposed convolution only restores the size of the signal, not its values, so it is not a true inverse operation.
Another improvement of DCGAN concerns pooling in the generative model. A traditional CNN uses pooling layers (max-pooling or mean-pooling) to compress the size of the data. In deconvolution the data should grow larger and larger, and max-pooling is not reversible, so the DCGAN paper does not use an inverse of pooling; it simply sets the stride of the deconvolution to 2 or greater so that the size grows as desired. In addition, DCGAN uses batch normalization in both G and D, which makes training more stable and controllable.
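The size arithmetic of transposed convolution described above can be checked in PyTorch with `nn.ConvTranspose2d`. The first layer reproduces the 2x2 to 4x4 example with a 3x3 kernel. The second uses kernel 4, stride 2, padding 1, which is a common choice for doubling the resolution and is assumed here for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 2, 2)                          # one 2x2 single-channel input

# 3x3 kernel, stride 1, no padding: output size (2 - 1) * 1 + 3 = 4
up_basic = nn.ConvTranspose2d(1, 1, kernel_size=3)
y = up_basic(x)

# Stride-2 layer: output size (8 - 1) * 2 - 2 * 1 + 4 = 16, i.e. the resolution doubles
up_stride2 = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1)
h = up_stride2(torch.randn(1, 1, 8, 8))

print(tuple(y.shape), tuple(h.shape))
```

Stacking a few such stride-2 layers is how a small feature map grows to a full-size image without any un-pooling step.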

  Another paper applies GANs to text-to-image synthesis (Text to Image), so that specific images can be generated from the content described by an input text. In this case, in addition to random noise, the generative model receives natural language information as input. The discriminative model therefore has to decide not only whether a sample is real but also whether it matches the input sentence. The network structure is shown in the figure below:

5.3 WGAN

In a generative adversarial network, when the discriminator network is optimal, the optimization goal of the generator network is to minimize the JS divergence between the real distribution pr(x) and the model distribution pθ(x). When the two distributions are identical, the JS divergence is 0 and the corresponding loss of the optimal generator network is −2log2. One problem with using the JS divergence to train GANs is that when the two distributions do not overlap, the JS divergence between them is always the constant log2. For the generator network, the gradient of the objective function with respect to its parameters is then zero.


WGAN adds the Wasserstein distance to GAN to solve the problem that convergence is hard to judge during GAN training. The Wasserstein distance measures the distance between two distributions. Its advantage over the KL divergence and the JS divergence is that even if the two distributions do not overlap, or overlap very little, it still reflects how far apart they are. Its mathematical formula is as follows:

W(p_{r}, p_{\theta}) = \inf_{\gamma \in \Pi(p_{r}, p_{\theta})} E_{(x,y) \sim \gamma}[\|x - y\|]

where \Pi(p_{r}, p_{\theta}) is the set of joint distributions whose marginals are p_{r} and p_{\theta}.

The formula may look confusing. In terms of code, the changes are actually as follows:

  • Remove the sigmoid from the last layer of the discriminator
  • Do not take the log in the losses of the generator and the discriminator
  • After each update of the discriminator's parameters, clip them so that their absolute values do not exceed a fixed constant c
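The three changes listed above can be sketched as one WGAN update in PyTorch. This is a minimal sketch under assumptions: the tiny networks, the sizes, the RMSprop optimizer and learning rate, and the clipping constant c = 0.01 are illustrative choices, and the real batch is a random stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, clip_value = 8, 16, 0.01   # hypothetical sizes; c = 0.01 is an assumed value

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# Discriminator ("critic") without a sigmoid on the last layer
critic = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_C = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)

real = torch.randn(32, data_dim)   # random stand-in for a batch of real data
z = torch.randn(32, latent_dim)

# Critic update: no log, just the difference of mean scores
loss_C = critic(G(z).detach()).mean() - critic(real).mean()
opt_C.zero_grad()
loss_C.backward()
opt_C.step()

# Clip every critic parameter to [-c, c] after the update
for p in critic.parameters():
    p.data.clamp_(-clip_value, clip_value)

# Generator update: raise the critic's score on generated samples
loss_G = -critic(G(z)).mean()
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```

The negated critic loss, `-loss_C`, is the quantity usually monitored as an estimate of the Wasserstein distance during training.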

In practice, WGAN turned out not to be so easy to use, mainly because it clips the weights. Weight clipping pushes the discriminator network toward a binary network, which reduces model capacity. The authors therefore proposed using a gradient penalty instead of weight clipping (WGAN-GP). If normalization is desired, Layer Normalization can be chosen.
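A gradient penalty term of the kind mentioned above can be sketched as follows. This is an illustration under assumptions: the critic network, the sizes, the random stand-in batches, and the penalty weight `lambda_gp = 10.0` are hypothetical choices, and the penalty is evaluated on random interpolations between real and generated samples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data_dim = 16                                     # hypothetical size
critic = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

def gradient_penalty(critic, real, fake):
    """Penalize deviation of the critic's gradient norm from 1 on interpolated samples."""
    alpha = torch.rand(real.size(0), 1)           # one mixing weight per sample
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads, = torch.autograd.grad(
        outputs=scores, inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,                        # keep the graph so the penalty is differentiable
    )
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(32, data_dim)                  # random stand-ins for real and generated batches
fake = torch.randn(32, data_dim)
lambda_gp = 10.0                                  # penalty weight (assumed value)

gp = gradient_penalty(critic, real, fake)
loss_C = critic(fake).mean() - critic(real).mean() + lambda_gp * gp
loss_C.backward()
```

With this term in the critic loss, the explicit parameter clipping step is no longer needed.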

5.4 LSGAN

LSGAN (Least Squares GAN) mainly addresses the instability of the standard GAN and the low quality of its generated images. The authors replace the cross-entropy loss of the original GAN with a least-squares loss.

The implementation is very simple: remove the sigmoid from the last layer and use the squared error when computing the loss.
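In code, the change amounts to swapping the cross-entropy for a squared error on the raw discriminator outputs. The sketch below assumes 0/1 target coding (fake toward 0, real toward 1) and uses made-up discriminator outputs for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw discriminator outputs (no sigmoid), chosen only for illustration
d_real = torch.tensor([[0.9], [1.1], [0.7]])    # D(x)
d_fake = torch.tensor([[0.2], [-0.1], [0.4]])   # D(G(z))

# Discriminator: real outputs toward 1, fake outputs toward 0
loss_D = 0.5 * (F.mse_loss(d_real, torch.ones_like(d_real)) +
                F.mse_loss(d_fake, torch.zeros_like(d_fake)))

# Generator: fake outputs toward 1
loss_G = 0.5 * F.mse_loss(d_fake, torch.ones_like(d_fake))
```

Unlike the sigmoid cross-entropy, the squared error keeps growing for samples that are classified correctly but lie far from the target value, so those samples still produce a gradient.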

5.5 seqGAN

seqGAN combines reinforcement learning with the GAN framework to generate text.

  In the text generation task, seqGAN differs from an ordinary GAN in the following ways:

  • The generator does not take the argmax.
  • Each time a word is generated, Monte Carlo sampling is performed from the current word sequence to complete the sentence. The completed sentence is then sent to the discriminator to compute the reward.
  • The model is optimized with policy gradient according to the obtained reward.

Deep learning - summary and comparison of current mainstream GAN principles:

http://t.csdn.cn/odHM7

6 Application of GAN

If your training data is insufficient, no problem. GANs can augment your dataset by taking known data and generating synthetic images.
Generate images from descriptions (text-to-image synthesis).
Increase the resolution of your video to capture finer details (from low to high resolution).
In audio, GANs can also be used to synthesize high-fidelity audio or perform speech translation.
 

6.1 Image generation

  The most common use of GAN is image generation, for example in super-resolution and semantic segmentation tasks.

6.2 Data Augmentation

  GAN-generated images can be used for data augmentation.

6.3 Image translation


Origin blog.csdn.net/qq_40379132/article/details/131363180