The most comprehensive GAN review of 2020: Algorithms, Theory, and Applications

Since Ian Goodfellow proposed the GAN model in 2014, generative adversarial networks have rapidly become one of the hottest generative models. Today, new GAN-based algorithms keep emerging, theoretical analysis of open problems such as mode collapse and convergence continues to deepen, and GAN applications have spread widely into computer vision, natural language processing, medical artificial intelligence, and other fields. This article is based on a detailed GAN review authored by Jieping Ye and other researchers; it introduces recent research progress on GAN models and points out future directions for this field.

Paper address: https://arxiv.org/pdf/2001.06937.pdf

In recent years, generative adversarial networks (GANs) have been a hot research topic. Since 2014, GANs have been studied extensively and a large number of algorithms have been proposed. However, few comprehensive studies explain the connections between different GAN variants or how they have evolved. In this paper, we attempt to review the various GAN methods from the perspectives of algorithms, theory, and applications. First, we study in detail the motivation, mathematical formulation, and architecture of most GAN algorithms. In addition, GANs have been combined with other machine learning algorithms for specific applications, such as semi-supervised learning, reinforcement learning, and transfer learning. This paper compares the similarities and differences of these GAN methods. Second, we study theoretical issues related to GANs. Third, we describe typical applications of GANs in image processing and computer vision, natural language processing, music, speech and audio, the medical field, and scientific data. Finally, we point out some open research problems for the future of GANs.

Algorithms

In this section, we first introduce the original GAN. We then describe representative variants, GAN training and evaluation, and task-driven GANs.

Generative adversarial networks

When both models are neural networks, the GAN architecture is very intuitive to implement. To learn the generator's distribution p_g over the data x, a prior distribution p_z(z) is first defined on the input noise variable z [3]. The generator then maps noise to data space as G(z; θ_g), where G is a differentiable function represented by a neural network with parameters θ_g. In addition to G, another neural network D(x; θ_d) with parameters θ_d outputs a scalar D(x), which represents the probability that x came from the real data rather than from the generator G. We train the discriminator D to maximize the probability of assigning the correct label to both training data and fake samples produced by G. At the same time, we train G to minimize log(1 - D(G(z))).
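
To make the roles of G, D, p_z and the two training objectives concrete, here is a minimal PyTorch-style sketch of this training loop. The toy 1-D Gaussian data, network sizes, learning rates and step count are illustrative assumptions, not settings from the paper.

```python
import torch
import torch.nn as nn

# Toy data: real samples x ~ N(3, 1) in 1-D; noise prior p_z = N(0, 1) (illustrative assumptions).
def sample_real(n):
    return torch.randn(n, 1) + 3.0

def sample_noise(n):
    return torch.randn(n, 1)

# G(z; theta_g): differentiable map from noise space to data space.
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
# D(x; theta_d): scalar probability that x came from the data rather than from G.
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCELoss()
batch = 64

for step in range(2000):
    # Train D: maximize log D(x) + log(1 - D(G(z))), i.e. label real as 1 and fake as 0.
    x_real = sample_real(batch)
    x_fake = G(sample_noise(batch)).detach()  # detach so D's update does not touch G
    d_loss = bce(D(x_real), torch.ones(batch, 1)) + bce(D(x_fake), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train G: minimize log(1 - D(G(z))), the original minimax generator loss.
    d_fake = D(G(sample_noise(batch)))
    g_loss = torch.log(1.0 - d_fake + 1e-8).mean()  # small epsilon for numerical stability
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

In practice the non-saturating generator loss described below is usually preferred over minimizing log(1 - D(G(z))); the loop above follows the original minimax formulation only to mirror the description in this section.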

  • Objective functions

GANs can use a variety of different objective functions.

  • The original minimax game

The objective function of the original GAN [3] is as follows.
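
In the notation introduced above, this is the familiar two-player minimax game, written here in its standard form from the original GAN paper; the text below refers to it as equation (1):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \tag{1}$$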

Here, D(x) corresponds to the cross-entropy between [1, 0]^T and [D(x), 1 - D(x)]^T. Likewise, log(1 - D(G(z))) corresponds to the cross-entropy between [0, 1]^T and [D(G(z)), 1 - D(G(z))]^T. For a fixed G, [3] gives the optimal discriminator D*_G.
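
Written out in its well-known closed form:

$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$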

The minimax game in equation (1) can then be reformulated.
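
Plugging the optimal discriminator back into the value function gives the standard reformulation (this appears to be what the text below calls equation (3)):

$$C(G) = \max_D V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D^*_G(x)\big] + \mathbb{E}_{x \sim p_g}\big[\log\big(1 - D^*_G(x)\big)\big]$$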

The KL divergence and JS divergence between two probability distributions p(x) and q(x) are defined as follows.
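
These have the standard definitions:

$$\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q(x)}\right], \qquad \mathrm{JSD}(p \,\|\, q) = \frac{1}{2}\,\mathrm{KL}\!\left(p \,\Big\|\, \frac{p + q}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(q \,\Big\|\, \frac{p + q}{2}\right)$$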

Accordingly, equation (3) is equivalent to the following.
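
In its standard form, the reformulated objective reduces to

$$C(G) = -\log 4 + 2 \cdot \mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big),$$

whose global minimum C(G) = -log 4 is attained exactly when p_g = p_data.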

Therefore, the GAN objective function is related to both the JS divergence and the KL divergence.

  • Non-saturating game

In practice, equation (1) may not provide gradients large enough for G to learn well. Early in training, G performs poorly and its samples differ clearly from the training data, so D can reject the generated samples with high confidence. In this case, log(1 - D(G(z))) saturates. Instead of training G to minimize log(1 - D(G(z))), we can train it to maximize log(D(G(z))). The generator loss then becomes
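
Written out, this is the standard non-saturating generator loss:

$$J^{(G)} = \mathbb{E}_{z \sim p_z}\big[-\log D(G(z))\big] = \mathbb{E}_{x \sim p_g}\big[-\log D(x)\big]$$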

This new objective function yields the same fixed point for the dynamics of D and G, but provides much larger gradients early in learning. The non-saturating game is heuristic rather than theory-driven. However, it has problems of its own, such as numerically unstable gradients for training G. Under the optimal discriminator D*_G, the quantity E_{x ~ p_g}[-log D*_G(x)] can be rewritten; combining this with equations (3) and (6) gives a corresponding expression for E_{x ~ p_g}[log(1 - D*_G(x))], and substituting equation (11) into equation (9) yields equation (12).
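
Carried out in full, this is the well-known identity from Arjovsky and Bottou, reconstructed here in its standard form (it should correspond to the review's equation (12) up to constants):

$$\mathbb{E}_{x \sim p_g}\big[-\log D^*_G(x)\big] = \mathrm{KL}\big(p_g \,\|\, p_{\text{data}}\big) - 2 \cdot \mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big) + 2\log 2 + \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D^*_G(x)\big]$$

With D held fixed, the last two terms contribute no gradient to the generator.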

As can be seen from equation (12), the two terms of this alternative generator loss pull in opposite directions: the first objective tries to make the difference between the generated distribution and the real distribution as small as possible, while, because of the negative sign, the second objective tries to make the difference between these two distributions as large as possible. This makes the gradients for training G unstable. In addition, the KL divergence is an asymmetric measure, which can be seen in the following two limiting cases.
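
For the reverse KL term KL(p_g || p_data) = E_{x ~ p_g}[log(p_g(x)/p_data(x))], the two limiting cases are (standard form):

$$p_{\text{data}}(x) \to 0,\; p_g(x) > 0 \;\Rightarrow\; p_g(x)\log\frac{p_g(x)}{p_{\text{data}}(x)} \to +\infty, \qquad p_{\text{data}}(x) > 0,\; p_g(x) \to 0 \;\Rightarrow\; p_g(x)\log\frac{p_g(x)}{p_{\text{data}}(x)} \to 0$$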

The penalties G receives for its two types of error are therefore completely different. The first error is that G produces implausible, fake-looking samples, which receives a huge penalty. The second error is that G fails to produce some real samples, which receives only a very small penalty. The first error concerns the accuracy of the generated samples, while the second concerns their diversity. Following this principle, G tends to generate safe, repetitive samples rather than risk generating diverse but unsafe ones, which leads to the mode collapse problem.

  • Maximum Likelihood Game

In GANs, there are many ways to approximate equation (1). Assuming the discriminator is optimal, we want to minimize the following generator cost.
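
One commonly cited form of this maximum-likelihood generator cost, following Goodfellow's 2016 GAN tutorial (the review's exact expression may differ by a constant factor), is

$$J^{(G)} = -\mathbb{E}_{z \sim p_z}\big[\exp\big(\sigma^{-1}(D(G(z)))\big)\big],$$

where σ is the logistic sigmoid. When D is optimal, exp(σ^{-1}(D(x))) = p_data(x)/p_g(x), so the expected gradient of this cost with respect to the generator parameters matches that of maximum likelihood.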

There are other possible ways to approximate maximum likelihood within the GAN framework [17]. Figure 1 compares the original zero-sum (minimax) game, the non-saturating game, and the maximum likelihood game.

Three observations can be drawn from the figure.

First, when a sample is likely to have come from the generator, i.e., at the left end of the figure, both the original minimax game and the maximum likelihood game suffer from vanishing gradients, whereas the heuristic non-saturating game does not have this problem.

Second, the maximum likelihood game has the problem that almost all of the gradient comes from the right end of the curve, which means that in each minibatch only a very small fraction of the samples dominates the gradient computation. This suggests that variance-reduction techniques for the samples may be an important direction for improving the performance of GANs based on the maximum likelihood game.

Third, the heuristic non-saturating game has lower sample variance, which may be one reason it is more successful in practical applications.

M. Kahng et al. [124] proposed GAN Lab, an interactive visualization tool for non-experts to learn and experiment with GANs. Bau et al. [125] presented an analytic framework for visualizing and understanding GANs.

Representative GAN variants

There are many papers related to GANs [126]-[131], e.g., CSGAN [132] and LOGAN [133]. In this section, we describe some representative GAN variants.

  1. InfoGAN

  2. Conditional GANs (cGANs)

  3. CycleGAN

  4. f-GAN

  5. Integral Probability Metrics (IPMs)

  6. Loss Sensitive GAN (LS-GAN)

There is a website called "The GAN Zoo" (https://github.com/hindupuravinash/the-gan-zoo) that lists many GAN variants. Please refer to it for more information.

GAN training

Despite the existence of a unique solution in theory, GAN training is difficult and often unstable for a variety of reasons [29], [32], [179]. One difficulty comes from the fact that the optimal weights of a GAN correspond to a saddle point of the loss function rather than a minimum.

Many papers study GAN training. Yadav et al. [180] stabilized GAN training with a prediction method. By using separate learning rates for the discriminator and the generator, [181] proposed a two time-scale update rule (TTUR) to ensure that the model can converge to a stable local Nash equilibrium. Arjovsky [179] carried out a theoretical analysis to fully understand GAN training dynamics: it explores why GANs are hard to train, rigorously studies and proves the saturation and instability problems of the loss function that arise when training GANs, proposes a practical, theoretically grounded direction for addressing such problems, and introduces new tools to study them. Liang et al. [182] regard GAN training as a continual learning problem [183].
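
As a concrete illustration of the TTUR idea from [181], the only change to a standard training loop is giving the discriminator and the generator separate step sizes. The stand-in networks and the 4:1 learning-rate ratio below are illustrative assumptions, not the settings recommended in [181].

```python
import torch
import torch.nn as nn

# Tiny stand-in networks (illustrative only).
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

# Two time-scale update rule (TTUR): the discriminator is updated on a faster
# time scale (larger learning rate) than the generator.
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.9))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.9))
```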

One way to improve GAN training is to assess the empirical "symptoms" that may arise during training. These symptoms include: the generator collapsing so that it produces very similar samples for different inputs [29]; the discriminator loss converging rapidly to zero [179], providing no gradient for generator updates; and difficulties in making the generator and discriminator converge [32].

We will introduce GAN training from three angles:

  1. The objective function

  2. Training Tips

  3. Architecture

GAN evaluation

In this section, we describe some evaluation metrics for GANs [215], [216]:

  1. Inception Score (IS)

  2. Mode Score (MS)

  3. Fréchet Inception Distance (FID)

  4. Multi-scale structural similarity (MS-SSIM)
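
For reference, two of the most widely used metrics above, IS and FID, have the following standard definitions (the usual textbook forms, not reproduced from the review):

$$\mathrm{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\big[\mathrm{KL}\big(p(y \mid x) \,\|\, p(y)\big)\big]\Big), \qquad \mathrm{FID} = \big\|\mu_r - \mu_g\big\|_2^2 + \mathrm{Tr}\Big(\Sigma_r + \Sigma_g - 2\big(\Sigma_r \Sigma_g\big)^{1/2}\Big)$$

Here p(y|x) is the Inception network's class posterior for a generated sample x, p(y) is its marginal over generated samples, and (μ_r, Σ_r), (μ_g, Σ_g) are the mean and covariance of Inception features for real and generated samples, respectively.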

How to choose a good evaluation metric for GANs is still an open problem [225]. Xu et al. [219] presented an empirical study of GAN evaluation. Karol Kurach [224] carried out a large-scale study of regularization and normalization in GANs. There are other comparative studies of GANs, such as [226]. Reference [227] proposed several measures as meta-metrics to guide researchers in choosing quantitative evaluation metrics. A proper evaluation metric should separate real samples from generated fake samples, verify mode drop or mode collapse, and detect overfitting. We hope better ways of evaluating the quality of GAN models will emerge in the future.

Task-driven GAN

This article focuses on GAN models themselves. Meanwhile, there is already a large body of literature in closely related, task-specific fields.

  1. Semi-supervised learning

  2. Transfer learning

  3. Reinforcement Learning

  4. Multi-modal learning

GANs have been used for feature learning, for example feature selection [277], hashing [278]-[285], and metric learning [286]. MisGAN [287] can learn from incomplete data using a GAN. [288] proposed the Evolutionary GAN. Ponce et al. [289] combined GANs with genetic algorithms to evolve images for visual neurons. GANs have also been used for other machine learning tasks [290], such as active learning [291], [292], online learning [293], ensemble learning [294], zero-shot learning [295], [296], and multi-task learning [297].

Theory

Maximum likelihood estimation (MLE)

Not all generative models use MLE. Some generative models do not use MLE by default but can be modified to use it (GANs fall into this category). It can easily be shown that minimizing the KL divergence between p_data(x) and p_g(x) is equivalent to maximizing the log-likelihood as the number of samples m increases:
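
In symbols (standard form), with samples x^{(i)} drawn from p_data:

$$\theta^* = \arg\max_\theta \frac{1}{m}\sum_{i=1}^{m} \log p_g\big(x^{(i)}; \theta\big) \;\xrightarrow{\;m \to \infty\;}\; \arg\min_\theta \mathrm{KL}\big(p_{\text{data}}(x) \,\|\, p_g(x; \theta)\big)$$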

For consistency of notation, the model's probability distribution p_θ(x) is replaced by p_g(x). For more information on MLE and other statistical estimators, see Chapter 5 of [298].

Mode collapse

GANs are hard to train, and it has been observed [26], [29] that they often suffer from mode collapse [299], [300], in which the generator learns to generate samples from only a few modes of the data distribution while missing many other modes, even though samples from the missing modes exist throughout the training data. In the worst case, the generator produces only a single sample (complete collapse) [179], [301].

In this section, we first introduce two perspectives on GAN mode collapse: the divergence perspective and the algorithmic perspective. Then, we describe methods that address mode collapse by proposing new objective functions or new architectures, including objective-function-based methods and architecture-based methods.

Other theoretical issues

Other theoretical issues include:

1. Do GANs really learn the distribution?

2. Divergence/distance

3. Inverse mapping

4. Mathematical perspectives (e.g., optimization)

5. Memorization

Applications

As described above, GANs are powerful generative models that can produce realistic-looking samples from a random vector z. We neither need to know the true data distribution explicitly nor make any additional mathematical assumptions. These advantages allow GANs to be widely applied in many fields, such as image processing and computer vision, and sequential data.

Image processing and computer vision

The most successful applications of GANs are in image processing and computer vision, for example image super-resolution, image synthesis and manipulation, and video processing.

  1. Super-resolution

  2. Image synthesis and manipulation

  3. Texture synthesis

  4. Object detection

  5. Video applications

Sequence data

GANs have also achieved some success on sequential data, such as natural language, music, speech, audio [376], [377], and time series [378]-[381].

Open research questions

There are still many open research problems in the GAN field.

GANs for discrete data: GANs rely on the generated samples being fully differentiable with respect to the generator parameters. Therefore, GANs cannot directly generate discrete data such as hashes and one-hot vectors. Solving this problem is very important because it would unlock the potential of GANs for natural language processing and hashing. Goodfellow proposed three ways to address this problem [103]: using the Gumbel-softmax [448], [449] or the concrete distribution [450]; using the REINFORCE algorithm [451]; and training the generator to sample continuous values that can be converted into discrete ones (e.g., sampling word embedding vectors directly).
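
As an illustration of the first workaround, here is a minimal sketch of Gumbel-softmax sampling, which yields an approximately one-hot sample that remains differentiable with respect to the generator's logits. The temperature value and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Differentiable, approximately one-hot sample from categorical logits.

    logits: (batch, num_categories) unnormalized scores from the generator.
    tau:    temperature; smaller values give samples closer to one-hot but
            with higher-variance gradients (0.5 is an illustrative choice).
    """
    # Gumbel(0, 1) noise: g = -log(-log(U)), with U ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # The softmax relaxation is differentiable with respect to the logits, so
    # gradients from the discriminator can flow back into the generator.
    return F.softmax((logits + g) / tau, dim=-1)

# Example: a generator head producing logits over a vocabulary of 10 tokens.
logits = torch.randn(4, 10, requires_grad=True)
soft_one_hot = gumbel_softmax_sample(logits)  # shape (4, 10), rows sum to 1
soft_one_hot.sum().backward()                 # gradients reach `logits`
```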

There are other research efforts in this direction. Song et al. [278] used a continuous function to approximate the values of the sign function for hashing. Gulrajani et al. [19] modeled discrete data with a continuous generator. Hjelm et al. [452] introduced an algorithm for training GANs with discrete data that uses the difference measure estimated by the discriminator to compute importance weights for generated samples, thereby providing a policy gradient for training the generator. Other related work can be found in [453], [454]. More work is needed in this interesting area.

New divergences: Researchers have proposed new families of integral probability metrics (IPMs) for training GANs, such as Fisher GAN [455], [456], mean and covariance feature matching GAN (McGan) [457], and Sobolev GAN [458]. Are there other interesting classes of divergences? This deserves further study.

Estimation uncertainty: Generally speaking, the more data we have, the smaller the estimation uncertainty. GANs do not give the distribution from which the training samples were generated; rather, a GAN aims to generate new samples that follow the same distribution as the training samples. Therefore, a GAN has neither a likelihood nor a well-defined posterior distribution. There have been initial attempts in this direction, such as the Bayesian GAN [459]. Although we can use GANs to generate data, how can we measure the uncertainty of a trained generator? This is another interesting question worthy of future research.

Theory: Regarding generalization, Zhang et al. [460] developed generalization bounds between the true distribution and the learned distribution under different evaluation metrics. When evaluated with a neural distance, the generalization bound in [460] shows that generalization is guaranteed as long as the discriminator set is small enough, regardless of the size of the generator's hypothesis set. Arora et al. [306] proposed a novel test that uses the "birthday paradox" for discrete probability distributions to estimate the support size, and showed that GANs can suffer from mode collapse even when the generated images have high visual quality. Deeper theoretical analysis is well worth pursuing. How do we test generalization empirically? A useful theory should be able to guide the choice of model class, capacity, and architecture. This is an interesting question worthy of future research.


Source: blog.csdn.net/qq_42370150/article/details/104756594