Triple Generative Adversarial Nets

Paper link: https://arxiv.org/pdf/1703.02291.pdf (v2: https://arxiv.org/pdf/1703.02291v2.pdf)

1. What problem does this paper solve?

        Generative adversarial networks (GANs) have shown promise in image generation and semi-supervised learning (SSL).

        However, existing GANs for SSL have two problems:

(1) The generator and the discriminator (i.e., the classifier) may not be optimal at the same time;

(2) The generator cannot control the semantics of the generated samples.

        Both problems stem from the two-player formulation, in which a single discriminator must play two incompatible roles, identifying fake samples and predicting labels, and estimates only the data distribution without considering labels. To address these problems, this paper proposes the Triple Generative Adversarial Network (Triple-GAN), which consists of three players: a generator, a discriminator, and a classifier. The generator and the classifier characterize the conditional distributions between images and labels, while the discriminator focuses solely on identifying fake image-label pairs.

       The paper designs compatible utilities to ensure that the distributions characterized by the classifier and the generator converge to the data distribution. Results on several datasets show that Triple-GAN, as a unified model, can simultaneously (1) achieve state-of-the-art classification results among deep generative models, and (2) disentangle the class and style of the input and perform smooth latent-space interpolation in the data space conditioned on class.

2. Introduction

       Deep generative models (DGMs) can capture the underlying distribution of data and synthesize new samples. Recently, significant progress has been made in generating realistic images with generative adversarial networks (GANs). A GAN is formulated as a two-player game: the generator G takes random noise z as input and produces a sample G(z) in the data space, while the discriminator D determines whether a sample comes from the real data distribution p(x) or from the generator. Both G and D are parameterized as deep neural networks, and training solves the minimax problem:

        min_G max_D U(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],

        where p_z(z) is a simple distribution (e.g., uniform or normal) and U(·) denotes the utility. Given a generator with induced distribution p_g, in the nonparametric setting the optimal discriminator is D(x) = p(x) / (p_g(x) + p(x)), and the game reaches its equilibrium if and only if p_g(x) = p(x), which is the desired outcome in image generation.
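As a quick numerical check of the optimal-discriminator formula above, the following minimal sketch evaluates D*(x) = p(x)/(p(x) + p_g(x)) for two one-dimensional Gaussians; the specific densities and grid are illustrative choices, not from the paper:

```python
import numpy as np

# Two illustrative 1-D densities: real data p(x) and generator p_g(x).
def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5.0, 5.0, 1001)
p = gauss(x, 0.0, 1.0)        # real data distribution
pg = gauss(x, 1.0, 1.0)       # generator distribution (shifted, so imperfect)

# Optimal discriminator: D*(x) = p(x) / (p(x) + p_g(x)).
d_star = p / (p + pg)

# When p_g == p, the optimal discriminator is 1/2 everywhere,
# i.e., it cannot do better than random guessing at equilibrium.
d_equal = p / (p + p)
print(np.allclose(d_equal, 0.5))  # True
```

Note that even at equilibrium D* assigns non-zero probability to a sample being fake, which is exactly the source of the role conflict discussed below.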

       GANs and other DGMs have also proved effective in semi-supervised learning (SSL) while retaining their generative ability. Under the same two-player framework, CatGAN generalizes GAN with a categorical discriminator network and an objective that minimizes the predicted conditional entropy on real data while maximizing it on generated samples. Odena and Salimans et al. augment the classification discriminator with an extra class corresponding to the fake data produced by the generator. Existing SSL GANs suffer from two main problems: (1) the generator and the discriminator (i.e., the classifier) may not be optimal at the same time; (2) the generator cannot control the semantics of the generated samples.

       Regarding the first problem, Salimans et al. proposed two alternative training objectives, each of which works well for either classification or image generation in SSL, but not for both. The feature-matching objective performs well in classification but cannot generate indistinguishable samples (see, e.g., Section 5.2), while the minibatch-discrimination objective can generate realistic images but cannot predict labels accurately. Related work offers no in-depth analysis of this phenomenon. Here, the authors argue that it arises intrinsically from the two-player formulation, in which a single discriminator must play two incompatible roles: identifying fake samples and predicting labels. Specifically, suppose G is optimal, i.e., p(x) = p_g(x), and consider a sample x ∼ p_g(x). On the one hand, as a discriminator, the optimal D should identify x as fake with non-zero probability (see [7] for the proof). On the other hand, since x ∼ p(x), as a classifier the optimal D should always confidently predict the correct class of x. Because D has two incompatible convergence points, a conflict arises, which indicates that G and D may not be optimal at the same time. Moreover, the problem persists even when G is imperfect: in most practical cases p_g(x) and p(x) overlap, and for samples from the overlapping region the two roles of D still compete by processing those samples differently, resulting in a poor classifier. In short, the learning ability of existing two-player models is limited, and this problem must be addressed to improve current SSL results.

       Regarding the second problem, disentangling meaningful physical factors (such as object category) from latent representations under limited supervision is a common concern. Although some work can learn such disentangled representations given full labels, none of the existing GANs can learn them in SSL. Again, the authors argue that the problem is caused by the formulation. Specifically, the discriminator in [26, 25] takes a single data point instead of a data-label pair as input, so label information is entirely ignored when judging whether a sample is real or fake. Consequently, the generator receives no learning signal about labels from the discriminator, and such models cannot control the semantics of the generated samples, which is unsatisfactory.

       To address these problems, the paper proposes Triple-GAN, a flexible game-theoretic framework for classification and class-conditional image generation in SSL with a partially labeled dataset. It introduces two conditional networks, a classifier and a generator, which produce pseudo labels for given real data and pseudo data for given real labels, respectively. To jointly judge the quality of the samples from the two conditional networks, the paper defines a single discriminator network whose only role is to distinguish whether an image-label pair comes from the real labeled dataset. The resulting model is called Triple-GAN not only because it has three networks, but also because it considers three joint distributions: the true data-label distribution and the two distributions defined by the conditional networks (see Figure 1 for an illustration). Driven directly by the desired equilibrium in which both the classifier and the conditional generator are optimal, the paper carefully designs compatible utilities, including adversarial losses and unbiased regularization (see Section 3), providing an effective solution to challenging SSL tasks in both theory and practice.

Figure 1: Illustration of Triple-GAN (best viewed in color). The utilities of D, C, and G are colored blue, green, and yellow, respectively, where "R" denotes rejection, "A" denotes acceptance, and "CE" denotes the cross-entropy loss. "A" and "R" are adversarial losses, and "CE" is an unbiased regularization that ensures consistency between p_g, p_c, and p, the distributions defined by the generator, the classifier, and the true data-generating process, respectively.
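The utilities in the figure combine into a single minimax objective of the form U(C, G, D) = E_p[log D(x, y)] + α E_{p_c}[log(1 − D(x, y))] + (1 − α) E_{p_g}[log(1 − D(x, y))]. A minimal sketch evaluating this objective on toy discrete joint distributions; the distributions, the value of α, and the discriminator values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_joint(n_x=4, n_y=3):
    """A random joint distribution p(x, y) over a small discrete space."""
    p = rng.random((n_x, n_y))
    return p / p.sum()

p_true = random_joint()   # real data-label distribution p(x, y)
p_c = random_joint()      # classifier's joint: p(x) * p_c(y | x)
p_g = random_joint()      # generator's joint: p(y) * p_g(x | y)

# An arbitrary discriminator D(x, y) with values in (0, 1).
d = rng.uniform(0.1, 0.9, size=p_true.shape)

alpha = 0.5  # mixing weight between the classifier and generator terms

def triple_gan_utility(d, p_true, p_c, p_g, alpha):
    """U = E_p[log D] + alpha * E_{p_c}[log(1 - D)] + (1 - alpha) * E_{p_g}[log(1 - D)]."""
    return (np.sum(p_true * np.log(d))
            + alpha * np.sum(p_c * np.log(1 - d))
            + (1 - alpha) * np.sum(p_g * np.log(1 - d)))

u = triple_gan_utility(d, p_true, p_c, p_g, alpha)
print(u)  # a negative scalar: expectations of log-probabilities
```

At the desired equilibrium, p_c = p_g = p_true and the uninformative discriminator D ≡ 1/2 yields U = 2·log(1/2), mirroring the compatibility argument in Section 3.2.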

        In particular, the theory shows that in Triple-GAN a good classifier produces a good generator and vice versa (see Section 3.2), rather than competing as described in the first problem. In addition, the discriminator can access the label information of unlabeled data through the classifier and can therefore force the generator to generate correct image-label pairs, which solves the second problem. Empirically, the paper evaluates the model on the widely used MNIST [14], SVHN [19], and CIFAR10 [12] datasets. The results (see Section 5) show that Triple-GAN can learn a good classifier and a good conditional generator simultaneously, consistent with the motivation and theoretical results.

        Overall, the main contributions of the paper are twofold: (1) it analyzes the problems in existing SSL GANs [26, 25] and proposes a novel game-theoretic Triple-GAN framework whose carefully designed compatible objectives address them; (2) it shows that on three datasets with incomplete labels, Triple-GAN substantially improves state-of-the-art classification results among DGMs while also disentangling class and style and performing class-conditional interpolation.

3. Methods

        The paper considers learning DGMs in a semi-supervised setting with a partially labeled dataset, where x denotes input data and y denotes labels. The goal is to predict the labels y of unlabeled data and to generate new samples x conditioned on y. This differs from the unsupervised setting of pure generation, whose only goal is to sample data x from the generator to fool the discriminator, so a two-player game suffices to describe the process in GAN. In the present setting, because the label information y is incomplete (and hence uncertain), the density model should characterize the uncertainty of both x and y, i.e., the joint distribution p(x, y) over input-label pairs.

         Because y may be missing, the two-player GAN cannot be applied directly. Unlike previous work [26, 25], which is restricted to the two-player framework and can lead to incompatible objectives, the game-theoretic objective here is built on the insight that the joint distribution can be factorized in two ways: p(x, y) = p(x)p(y | x) and p(x, y) = p(y)p(x | y), where the conditional distributions p(y | x) and p(x | y) are of interest for classification and class-conditional generation, respectively. To jointly estimate these conditionals, which are characterized by a classifier network and a class-conditional generator network, the paper defines a single discriminator network whose sole purpose is to distinguish whether a sample comes from the real data distribution or from the models. Hence the paper naturally extends GAN to Triple-GAN, a three-player game that characterizes the process of classification and class-conditional generation in SSL.
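The two factorizations of the joint distribution described above can be verified numerically on a small discrete example; the joint table below is an illustrative assumption:

```python
import numpy as np

# A small joint distribution p(x, y) over 3 x-values and 2 labels.
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

p_y_given_x = p_xy / p_x                # conditional p(y | x): the classifier's target
p_x_given_y = p_xy / p_y                # conditional p(x | y): the generator's target

# Both factorizations recover the same joint distribution.
print(np.allclose(p_x * p_y_given_x, p_xy))  # True
print(np.allclose(p_y * p_x_given_y, p_xy))  # True
```

This is why one discriminator over (x, y) pairs suffices: if both model-defined joints match p(x, y), then the classifier's p(y | x) and the generator's p(x | y) are simultaneously correct.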

4. Experiments

        The paper presents results on the widely used MNIST [14], SVHN [19], and CIFAR10 [12] datasets. MNIST consists of 50,000 training samples, 10,000 validation samples, and 10,000 test samples of handwritten digits of size 28 × 28. SVHN consists of 73,257 training samples and 26,032 test samples, each a 32 × 32 color image of digit sequences with varying backgrounds. CIFAR10 consists of color images in 10 general classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 50,000 training samples and 10,000 test samples of size 32 × 32. Where needed, 5,000 training samples of SVHN and CIFAR10 are held out for validation. On CIFAR10, ZCA whitening is applied to the input of C following [13], while G and D still generate and estimate the raw images.
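The ZCA preprocessing mentioned for CIFAR10 can be sketched as follows. This is a generic ZCA-whitening implementation, not the paper's exact pipeline; the epsilon value and toy data are assumptions:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten rows of X: decorrelate features while staying close to the original space."""
    X = X - X.mean(axis=0)                         # center each feature
    cov = X.T @ X / X.shape[0]                     # feature covariance
    U, S, _ = np.linalg.svd(cov)                   # eigendecomposition of the covariance
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # symmetric ZCA transform
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                     # toy stand-in for flattened image data
Xw = zca_whiten(X)

# Whitened features have (approximately) identity covariance.
cov_w = Xw.T @ Xw / Xw.shape[0]
print(np.allclose(cov_w, np.eye(8), atol=1e-2))  # True
```

Unlike PCA whitening, the ZCA transform U S^{-1/2} U^T keeps the whitened data maximally close to the original, which is why it is a common preprocessing step for image classifiers.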


5. Conclusion

        The paper introduces Triple-GAN, a unified game-theoretic framework composed of three players (a generator, a discriminator, and a classifier) for semi-supervised learning with compatible utilities. With such utilities, Triple-GAN addresses the two main problems of existing methods [26, 25]: it ensures that both the classifier and the generator can reach their respective optima from a game-theoretic perspective, and it enables the generator to sample data of a specific class. Empirical results on MNIST, SVHN, and CIFAR10 show that, as a unified model, Triple-GAN simultaneously achieves state-of-the-art classification results among deep generative models, disentangles class and style, and supports latent-space interpolation.

Other articles explaining this paper:

https://segmentfault.com/a/1190000022263719/

https://blog.csdn.net/u011961856/article/details/77605933
