A Review on Generative Adversarial Networks: Algorithms, Theory and Applications

Paper address: https://arxiv.org/pdf/2001.06937.pdf

In recent years, generative adversarial networks (GANs) have been a hot research topic. Since 2014, GANs have been studied extensively and a large number of algorithms have been proposed. However, few comprehensive studies explain the connections among the different GAN variants and how they have evolved. In this article, we review a variety of GAN methods from the perspectives of algorithms, theory, and applications. First, we introduce in detail the research motivation, mathematical representation, and architecture of most GAN algorithms. In addition, GANs have been combined with other machine learning algorithms for specific applications, such as semi-supervised learning, transfer learning, and reinforcement learning; this article compares the similarities and differences of these GAN methods. Second, we study the theoretical issues related to GANs. Third, we describe typical applications of GANs in image processing and computer vision, natural language processing, music, speech and audio, medicine, and data science. Finally, we point out some open research problems for GANs.
 

Algorithms

In this section, we first introduce the original GAN. We then introduce its representative variants, training and evaluation methods, and task-driven GANs.

Generative Adversarial Network

When both models are neural networks, the GAN architecture is straightforward to implement. To learn the generator's distribution p_g over the data x, we first define a prior distribution p_z(z) on the input noise variable z [3]. GAN then represents the mapping from noise space to data space as G(z, θ_g), where G is a differentiable function represented by a neural network with parameters θ_g. A second neural network, D(x, θ_d), has parameters θ_d and outputs a single scalar. D(x) represents the probability that x comes from the real data rather than from the generator G. We train the discriminator D to maximize the probability of assigning the correct label to both the training data and the fake samples produced by G. At the same time, we train G to minimize log(1-D(G(z))).

  • Objective function

GAN can use a variety of different objective functions.

  • The original minimax game

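The objective function of GAN [3] is

    min_G max_D V(D, G) = E_(x~p_data(x))[log D(x)] + E_(z~p_z(z))[log(1 - D(G(z)))]    (1)

where log D(x) is, up to sign, the cross entropy between [1, 0]^T and [D(x), 1-D(x)]^T. Similarly, log(1-D(G(z))) is, up to sign, the cross entropy between [0, 1]^T and [D(G(z)), 1-D(G(z))]^T. For a fixed G, the optimal discriminator D is given in [3]:

    D*_G(x) = p_data(x) / (p_data(x) + p_g(x))    (2)

The minimax game in (1) can be reformulated as:

    C(G) = max_D V(D, G)
         = E_(x~p_data)[log D*_G(x)] + E_(z~p_z)[log(1 - D*_G(G(z)))]
         = E_(x~p_data)[log D*_G(x)] + E_(x~p_g)[log(1 - D*_G(x))]
         = E_(x~p_data)[log(p_data(x) / ((p_data(x) + p_g(x)) / 2))] + E_(x~p_g)[log(p_g(x) / ((p_data(x) + p_g(x)) / 2))] - 2 log 2    (3)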

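The KL divergence and JS divergence between two probability distributions p(x) and q(x) are defined as follows:

    KL(p || q) = ∫ p(x) log(p(x) / q(x)) dx    (4)
    JS(p || q) = (1/2) KL(p || (p + q)/2) + (1/2) KL(q || (p + q)/2)    (5)

Therefore, (3) is equivalent to

    C(G) = KL(p_data || (p_data + p_g)/2) + KL(p_g || (p_data + p_g)/2) - 2 log 2
         = 2 JS(p_data || p_g) - 2 log 2    (6)

Therefore, the objective function of GAN is related to both the KL divergence and the JS divergence.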
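The relation between the minimax objective and the JS divergence can be checked numerically. The following is a minimal sketch, assuming discrete distributions so that the expectations become finite sums; the two distributions and all function names are made up for illustration. It evaluates the value function at the optimal discriminator p_data / (p_data + p_g) and compares the result with 2 JS(p_data || p_g) - 2 log 2.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """JS(p || q): average KL divergence to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value(d, p_data, p_g):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    return sum(pd * math.log(di) + pg * math.log(1 - di)
               for di, pd, pg in zip(d, p_data, p_g))

# Hypothetical distributions over three outcomes (illustrative values only).
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]

# Optimal discriminator for a fixed generator: D*(x) = p_data / (p_data + p_g).
d_star = [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

c_g = value(d_star, p_data, p_g)
print(c_g, 2 * js(p_data, p_g) - 2 * math.log(2))  # the two numbers should agree
```

Moving the discriminator away from d_star in either direction lowers the value, which is what "optimal for a fixed G" means here. When p_g equals p_data, the JS term vanishes and the value reduces to -2 log 2.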

  • Non-saturating game

In practice, formula (1) may not provide a gradient large enough for G to learn well. G generally performs poorly early in training, when the generated samples differ markedly from the training data, so D can reject them with high confidence. In this case, log(1-D(G(z))) saturates. Instead of training G to minimize log(1-D(G(z))), we can train it to maximize log(D(G(z))). The loss of the generator becomes

    J^(G)(G) = E_(z~p_z(z))[-log(D(G(z)))] = E_(x~p_g)[-log(D(x))]    (7)

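This new objective function leads D and G to the same fixed point during training but provides much larger gradients early in learning. The non-saturating game is heuristic rather than theory-driven. It has problems of its own, however, such as unstable numerical gradients when training G. Under the optimal D*_G, we have

    KL(p_g || p_data) = E_(x~p_g)[log(p_g(x) / p_data(x))]
                      = E_(x~p_g)[log((1 - D*_G(x)) / D*_G(x))]
                      = E_(x~p_g)[log(1 - D*_G(x))] - E_(x~p_g)[log(D*_G(x))]    (8)

Therefore E_(x~p_g)[-log(D*_G(x))] is equivalent to

    E_(x~p_g)[-log(D*_G(x))] = KL(p_g || p_data) - E_(x~p_g)[log(1 - D*_G(x))]    (9)

According to (3) and (6), we have

    E_(x~p_data)[log(D*_G(x))] + E_(x~p_g)[log(1 - D*_G(x))] = 2 JS(p_data || p_g) - 2 log 2    (10)

Therefore E_(x~p_g)[log(1 - D*_G(x))] is equivalent to

    E_(x~p_g)[log(1 - D*_G(x))] = 2 JS(p_data || p_g) - 2 log 2 - E_(x~p_data)[log(D*_G(x))]    (11)

Substituting Eq. (11) into Eq. (9), we get

    E_(x~p_g)[-log(D*_G(x))] = KL(p_g || p_data) - 2 JS(p_data || p_g) + E_(x~p_data)[log(D*_G(x))] + 2 log 2    (12)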

It can be seen from equation (12) that optimizing the alternative G loss in the non-saturating game is contradictory: the first term aims to make the divergence between the generated distribution and the real distribution as small as possible, while the second term, because of its negative sign, aims to make the divergence between the two distributions as large as possible. This yields unstable numerical gradients for training G. In addition, the KL divergence is asymmetric, as the following two cases show:

    If p_data(x) → 0 and p_g(x) → 1, then KL(p_g || p_data) → +∞.
    If p_data(x) → 1 and p_g(x) → 0, then KL(p_g || p_data) → 0.

The penalties for these two kinds of generator error are completely different. The first kind of error is that G produces implausible samples (the generated samples are inaccurate), and the penalty is very large. The second kind is that G fails to produce real samples (the generated samples lack diversity), and the penalty is small. As a result, G tends to generate repetitive but safe samples rather than risk generating diverse but unsafe ones, which leads to mode collapse.
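Both points of this subsection can be checked on a small example. The sketch below assumes discrete distributions (made-up values) and an optimal discriminator D* = p_data / (p_data + p_g). It first confirms that the non-saturating generator loss E_(x~p_g)[-log D*(x)] equals KL(p_g || p_data) - 2 JS(p_data || p_g) + E_(x~p_data)[log D*(x)] + 2 log 2. It then evaluates the pointwise KL term p_g(x) log(p_g(x) / p_data(x)) for the two kinds of error. The helper names are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """JS(p || q): average KL divergence to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical distributions over three outcomes (illustrative values only).
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]
d_star = [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

# Left side: non-saturating generator loss under the optimal discriminator.
lhs = sum(pg * -math.log(d) for pg, d in zip(p_g, d_star))

# Right side: KL(p_g || p_data) - 2 JS(p_data || p_g) + E_{p_data}[log D*] + 2 log 2.
rhs = (kl(p_g, p_data) - 2 * js(p_data, p_g)
       + sum(pd * math.log(d) for pd, d in zip(p_data, d_star))
       + 2 * math.log(2))

def kl_term(pg, pd):
    """Contribution of a single point x to KL(p_g || p_data)."""
    return pg * math.log(pg / pd)

implausible_sample = kl_term(0.99, 1e-9)  # p_g high where p_data is almost zero
missed_mode = kl_term(1e-9, 0.99)         # p_g almost zero where p_data is high
print(lhs, rhs, implausible_sample, missed_mode)
```

The first kind of error contributes a large positive term, while a missed mode contributes almost nothing, which matches the asymmetric penalties described above.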

  • Maximum likelihood game

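In GAN, there are many ways to approximate formula (1). Assuming that the discriminator is optimal, we want to minimize

    J^(G)(G) = E_(z~p_z(z))[-exp(σ^(-1)(D(G(z))))] = E_(z~p_z(z))[-D(G(z)) / (1 - D(G(z)))]

where σ is the logistic sigmoid function.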

There are other possible ways to approximate maximum likelihood within the GAN framework [17]. Figure 1 compares the original zero-sum game, the non-saturating game, and the maximum likelihood game.

Three observations can be obtained from Figure 1.

First, when a sample is likely to come from the generator, that is, at the left end of the graph, both the maximum likelihood game and the original minimax game suffer from vanishing gradients, while the heuristic non-saturating game does not.

Second, the maximum likelihood game has a further problem: almost all of the gradient comes from the right end of the curve, which means that only a small fraction of the samples in each minibatch dominates the gradient computation. This suggests that variance reduction methods may be an important research direction for improving GANs based on the maximum likelihood game.

Third, the heuristic non-saturating game has lower sample variance, which may explain why it is more successful in practice.
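These observations can be reproduced with a few lines of arithmetic. The sketch below assumes that the discriminator output is D(G(z)) = sigmoid(a) for a logit a, and differentiates each generator loss with respect to a; this parametrization and the function names are choices made for illustration, not taken from the paper. Under this assumption the gradients are -sigmoid(a) for the minimax loss log(1 - D), sigmoid(a) - 1 for the non-saturating loss -log D, and -exp(a) for the maximum likelihood loss -D / (1 - D).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def generator_loss_gradients(a):
    """Gradient of each generator loss w.r.t. the discriminator logit a."""
    d = sigmoid(a)  # D(G(z)) under the assumed parametrization
    return {
        "minimax": -d,                       # d/da log(1 - sigmoid(a))
        "non_saturating": d - 1.0,           # d/da -log(sigmoid(a))
        "maximum_likelihood": -math.exp(a),  # d/da -exp(a)
    }

# Left end (a << 0): D is confident the sample is fake.
# Right end (a >> 0): D is fooled into believing the sample is real.
for a in (-6.0, 0.0, 4.0):
    grads = generator_loss_gradients(a)
    print(a, {name: round(g, 4) for name, g in grads.items()})
```

At the left end only the non-saturating loss keeps a gradient close to 1 in magnitude, and at the right end the maximum likelihood gradient grows exponentially, so a few samples there dominate a minibatch.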

Kahng et al. [124] proposed GAN Lab, an interactive visualization tool that lets non-experts learn about GAN and experiment with it. Bau et al. [125] proposed an analysis framework for visualizing and understanding GAN.

Representative GAN variants

There are many papers related to GAN [126]-[131], such as CSGAN [132] and LOGAN [133]. In this section, we will introduce some representative GAN variants.

  1. InfoGAN
  2. Conditional GANs (cGANs)
  3. CycleGAN
  4. f-GAN
  5. Integral Probability Metrics (IPMs)
  6. Loss-Sensitive GAN (LS-GAN)

There is a website called "The GAN Zoo" ( https://github.com/hindupuravinash/the-gan-zoo ), which lists many variants of GAN. For more detailed information, please visit this website.

GAN training

Although a unique solution exists in theory, GAN training is difficult and often unstable for many reasons [29], [32], [179]. One difficulty is that the optimal weights of a GAN correspond to a saddle point of the loss function rather than a minimum.
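A toy example helps show why a saddle point is harder to reach than a minimum. The sketch below is not a GAN: it assumes the bilinear game f(x, y) = x * y, where x plays the minimizing role and y the maximizing role, and applies simultaneous gradient descent-ascent. The game and the function name are assumptions chosen for illustration. The only equilibrium is the saddle point (0, 0), yet each simultaneous step multiplies the distance to it by sqrt(1 + lr^2), so the iterates spiral outward instead of converging.

```python
import math

def simultaneous_gda(x, y, lr, steps):
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y."""
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        # Both players update from the same old point (simultaneous updates).
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y

start = (1.0, 1.0)
x, y = simultaneous_gda(*start, lr=0.1, steps=100)
print(math.hypot(*start), math.hypot(x, y))  # distance to the saddle before and after
```

This growth is one simple picture of the instability that training heuristics for the two players, such as using separate learning rates, try to control.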

There are many papers on GAN training. Yadav et al. [180] used prediction methods to make GAN training more stable. Reference [181] proposes the two time-scale update rule (TTUR), which uses separate learning rates for the discriminator and the generator to ensure that the model can converge to a stable local Nash equilibrium. Arjovsky [179] carried out a theoretical study aimed at fully understanding the training dynamics of GAN: it analyzes why GAN is hard to train, studies and rigorously proves the saturation and instability of the loss function during training, proposes a practical and theoretically grounded direction for solving such problems, and introduces new tools to study them. Liang et al. [182] regard GAN training as a continual learning problem [183].

One way to improve GAN training is to assess the empirical "symptoms" that may occur during training. These symptoms include: the generator collapses and produces only extremely similar samples for different inputs [29]; the discriminator loss quickly converges to zero [179], providing no gradient updates for the generator; and it is difficult to make the generator and discriminator converge together [32].

We will introduce GAN training from three perspectives:

  1. Objective function
  2. Training tricks
  3. Architecture

GAN evaluation metrics

In this section, we describe some evaluation metrics for GAN [215], [216]:

  1. Inception Score (IS)
  2. Mode Score (MS)
  3. Fréchet Inception Distance (FID)
  4. Multi-scale structural similarity (MS-SSIM)

Choosing a good evaluation metric for GAN remains a difficult problem [225]. Xu et al. [219] presented an empirical study of GAN evaluation metrics. Kurach et al. [224] conducted a large-scale study of regularization and normalization in GAN. There are other comparative studies of GAN, such as [226]. Reference [227] proposes several measures as meta-measures to guide researchers in selecting quantitative evaluation metrics. An appropriate evaluation metric should distinguish real samples from generated fake samples, identify mode dropping and mode collapse, and detect overfitting. We hope that better ways to evaluate the quality of GAN models will emerge in the future.
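To make one of these metrics concrete, here is a minimal sketch of the Inception Score in its usual form, IS = exp(mean over samples of KL(p(y|x) || p(y))), where p(y) is the average of the per-sample class distributions. In practice p(y|x) comes from a pretrained Inception classifier applied to generated images; in this sketch the probability rows are hypothetical inputs supplied by hand, and the function name is an assumption.

```python
import math

def inception_score(probs):
    """Inception Score from per-sample class probabilities p(y|x).

    probs: list of rows, each row a probability distribution over classes.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all samples.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Mean KL(p(y|x) || p(y)); zero-probability entries contribute nothing.
    mean_kl = sum(
        sum(p * math.log(p / marginal[j]) for j, p in enumerate(row) if p > 0)
        for row in probs
    ) / n
    return math.exp(mean_kl)

# Hypothetical classifier outputs for two generated samples and two classes.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # confident and diverse
print(inception_score([[1.0, 0.0], [1.0, 0.0]]))  # confident but collapsed
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))  # uninformative predictions
```

The score is high only when each sample is classified confidently and the classes are covered evenly, which is why a collapsed generator scores no better than an uninformative one in this toy setting.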

Task-driven GAN

This article focuses on the GAN model. At present, there is a large amount of literature on closely related fields involving specific tasks.

  1. Semi-supervised learning
  2. Transfer learning
  3. Reinforcement learning
  4. Multimodal learning

GAN has been used for feature learning, such as feature selection [277], hashing [278]-[285], and metric learning [286]. MisGAN [287] can learn from incomplete data with GAN. Evolutionary GAN was proposed in [288]. Ponce et al. [289] combined GAN with a genetic algorithm to evolve images for visual neurons. GAN has also been used for other machine learning tasks [290], such as active learning [291], [292], online learning [293], ensemble learning [294], zero-shot learning [295], [296], and multi-task learning [297].

Theory

Maximum Likelihood Estimation (MLE)

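Not all generative models use MLE. Some do not use MLE by default but can be modified to do so (GANs fall into this category). It is easy to show that minimizing the KL divergence (KLD) between p_data(x) and p_g(x) is equivalent to maximizing the log-likelihood as the number of samples m increases:

    θ* = argmin_θ KLD(p_data || p_g)
       = argmin_θ -∫ p_data(x) log(p_g(x) / p_data(x)) dx
       = argmin_θ [∫ p_data(x) log p_data(x) dx - ∫ p_data(x) log p_g(x) dx]
       = argmax_θ ∫ p_data(x) log p_g(x) dx
       = argmax_θ lim_(m→∞) (1/m) Σ_(i=1..m) log p_g(x_i)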

In order to ensure the consistency of the symbols, the model probability distribution p_θ(x) is replaced with p_g(x). For more information on MLE and other statistical estimators, see Chapter 5 of [298].
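The equivalence between minimizing the KL divergence and maximizing the average log-likelihood can be illustrated with a one-parameter model. The sketch below assumes a Bernoulli data distribution with success probability 0.3, a Bernoulli model with parameter θ searched over a grid, and a fixed random seed; all of these are choices made for illustration. With a large sample, the grid point that maximizes the average log-likelihood should land on or next to the grid point that minimizes KL(p_data || p_g).

```python
import math
import random

def kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def avg_log_likelihood(samples, q):
    """(1/m) * sum_i log p_g(x_i) for a Bernoulli(q) model."""
    ones = sum(samples)
    m = len(samples)
    return (ones * math.log(q) + (m - ones) * math.log(1 - q)) / m

random.seed(0)
p_data = 0.3                             # assumed true parameter
grid = [i / 100 for i in range(1, 100)]  # candidate values of theta
samples = [1 if random.random() < p_data else 0 for _ in range(100_000)]

theta_kl = min(grid, key=lambda q: kl_bernoulli(p_data, q))
theta_mle = max(grid, key=lambda q: avg_log_likelihood(samples, q))
print(theta_kl, theta_mle)
```

The KL minimizer uses the true distribution, which is unknown in practice, while the likelihood maximizer needs only samples; that is why the sample-based form is the one used for estimation.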

Mode collapse

GANs are difficult to train, and [26], [29] observed that they often suffer from mode collapse [299], [300], in which the generator learns to generate samples from only a few modes of the data distribution and misses many other modes (even though samples from the missing modes occur throughout the training data). In the worst case, the generator produces only a single sample (complete collapse) [179], [301].

In this section, we first introduce two views of GAN mode collapse: the divergence view and the algorithm view. We then introduce methods that address mode collapse by proposing new objective functions or new architectures, that is, objective function-based methods and architecture-based methods.

Other theoretical issues

Other theoretical issues include:

1. Has GAN really learned the distribution?

2. Divergence/distance

3. Inverse mapping

4. Mathematical point of view (such as optimization)

5. Memorization

Applications

As mentioned earlier, GAN is a powerful generative model that can generate realistic samples from a random vector z. We need neither to know the explicit true data distribution nor to make any other mathematical assumptions. These advantages allow GAN to be widely used in many fields, such as image processing and computer vision, and sequential data.

Image processing and computer vision

The most successful applications of GAN are in image processing and computer vision, such as image super-resolution, image generation and manipulation, and video processing.

  1. Super resolution
  2. Image synthesis and manipulation
  3. Texture synthesis
  4. Object detection
  5. Video applications

Sequence data

GAN has also achieved some success on sequential data such as natural language, music, speech, audio [376], [377], and time series [378]–[381].

Open research questions

There are still many open research issues in the GAN field.

Using GAN for discrete data: GAN relies on the generated samples being fully differentiable with respect to the generator parameters. Therefore, GAN cannot directly generate discrete data, such as hash codes and one-hot vectors. Solving this problem is very important because it could unlock the potential of GAN in natural language processing and hashing. Goodfellow suggested three ways to address it [103]: use Gumbel-softmax [448], [449] or the concrete distribution [450]; use the REINFORCE algorithm [451]; or train the generator to sample continuous values that can be converted into discrete ones (for example, by directly sampling the embedding vector of a word).
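The first of these options can be sketched in a few lines. A Gumbel-softmax sample adds Gumbel(0, 1) noise g_i = -log(-log(u_i)), with u_i uniform on (0, 1), to the log-probabilities and applies a softmax with temperature τ; as τ decreases, the output approaches a one-hot vector while remaining differentiable in the logits. The code below is a minimal pure-Python illustration, not a training-ready layer: the function name, the clamping constant, and the example probabilities are assumptions.

```python
import math
import random

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from the categorical given by logits."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so that both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class probabilities; logits are their logarithms.
logits = [math.log(p) for p in (0.2, 0.3, 0.5)]

# The same seed gives the same noise, so only the temperature differs.
soft = gumbel_softmax_sample(logits, temperature=5.0, rng=random.Random(0))
sharp = gumbel_softmax_sample(logits, temperature=0.1, rng=random.Random(0))
print(soft)
print(sharp)
```

With identical noise, both samples select the same class; the low-temperature sample simply concentrates more mass on it. A generator can emit such vectors in place of hard one-hot vectors so that gradients still reach its parameters.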

There are other approaches in this research direction. Song et al. [278] used a continuous function to approximate the sign function for hash codes. Gulrajani et al. [19] modeled discrete data with a continuous generator. Hjelm et al. [452] introduced an algorithm for training GAN with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for the generated samples, thereby providing a policy gradient for training the generator. Other related work can be found in [453], [454]. More work is needed in this interesting area.

New divergences: Researchers have proposed a series of new integral probability metrics (IPMs) for training GAN, such as Fisher GAN [455], [456], mean and covariance feature matching GAN (McGan) [457], and Sobolev GAN [458]. Are there other interesting classes of divergences? This deserves further research.

Uncertainty estimation: Generally speaking, the more data we have, the smaller the estimation uncertainty. GAN does not give the distribution that generated the training samples; it aims only to generate new samples from the same distribution as the training samples. Therefore, GAN has neither a likelihood nor a well-defined posterior distribution. There have been preliminary attempts in this direction, such as Bayesian GAN [459]. Although we can use GAN to generate data, how can we measure the uncertainty of a trained generator? This is another interesting question for future research.

Theory: Regarding generalization, Zhang et al. [460] derived generalization bounds between the true distribution and the learned distribution under different evaluation metrics. For the neural distance, the generalization bound in [460] shows that generalization is guaranteed as long as the discriminator set is small enough, regardless of the size of the generator or hypothesis set. Arora et al. [306] proposed a novel test that uses the "birthday paradox" of discrete probability to estimate the support size, and showed that GAN suffers from mode collapse even when the images have high visual quality. More in-depth theoretical analysis is worthwhile. How do we test generalization empirically? A useful theory should make it possible to choose the model class, capacity, and architecture. This is an interesting question that deserves in-depth study in future work.
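The arithmetic behind the birthday-paradox test is easy to reproduce. If samples are drawn uniformly from a support of N items, a batch of roughly sqrt(N) samples already contains a duplicate with about even odds; conversely, seeing duplicates in batches of size s suggests a support of roughly s^2 items. The sketch below assumes a uniform distribution and exact duplicates, which is a simplification; it computes the exact collision probability and the smallest batch size that reaches a given probability. The function names are hypothetical.

```python
def collision_probability(sample_size, support_size):
    """P(at least one duplicate) among uniform draws from support_size items."""
    p_all_distinct = 1.0
    for i in range(sample_size):
        # The (i+1)-th draw must avoid the i items already seen.
        p_all_distinct *= max(0.0, 1.0 - i / support_size)
    return 1.0 - p_all_distinct

def smallest_batch_with_collision(support_size, threshold=0.5):
    """Smallest batch size whose collision probability reaches the threshold."""
    s = 1
    while collision_probability(s, support_size) < threshold:
        s += 1
    return s

print(collision_probability(23, 365))         # the classic birthday setting
print(smallest_batch_with_collision(10_000))  # on the order of sqrt(10_000)
```

Read in reverse, a generator whose batches of a few hundred images regularly contain duplicates cannot have a support much larger than tens of thousands of images, however good each image looks.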

Others: There are many other important research issues in the GAN field, such as evaluation methods (see Section 3.4 of the paper for details) and mode collapse (see Section 4.2 of the paper for details).

Origin blog.csdn.net/a493823882/article/details/106949345