An Introduction to Generative Adversarial Networks (GAN, Generative Adversarial Nets)

1. Introduction

This article is a translated summary of the original paper "Generative Adversarial Nets".
GAN (Generative Adversarial Nets) stands for generative adversarial networks. A GAN comprises two models: a generative model G, which captures the data distribution, and a discriminative model D, which estimates the probability that a sample came from the training data rather than from G.

The two models G and D are in a competitive, adversarial relationship. For example, the generative model G is like a counterfeiter manufacturing fake currency, while the discriminative model D is like the police, trying to detect the counterfeit money. The competition between the two models drives both to keep improving, until the counterfeit money can no longer be distinguished from the genuine article.

In the paper's experiments, both models are trained using only the backpropagation algorithm and dropout, and samples are drawn from the generative model using only forward propagation. No approximate inference or Markov chains are needed.

2. Related Methods

(1) Undirected graphical models with latent variables, such as restricted Boltzmann machines (RBMs), deep Boltzmann machines (DBMs), and their variants. In these models the quantities of interest are normalized over all states of the random variables; the partition function and its gradient are intractable, although they can be estimated by Markov chain Monte Carlo (MCMC) methods.
(2) Deep belief networks (DBNs) are hybrid models containing a single undirected layer and a stack of directed layers. Although a fast approximate layer-by-layer training criterion exists, DBNs face the computational difficulties associated with both undirected and directed models.
(3) Generative stochastic networks (GSNs) generate samples with a Markov chain. GAN needs no Markov chain, because no feedback loop is required during generation.

3. Adversarial Networks

p_z(z): the input noise distribution;
G(z; θ): a differentiable function represented by a multilayer perceptron with parameters θ;
D(x): the probability that x came from the data rather than from the generator.
The overall training criterion is as follows: D is trained to maximize V(D, G), while G is trained to minimize log(1 - D(G(z))):
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
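To make the expectations concrete, here is a minimal sketch (an illustration added to this summary, not code from the paper) of estimating V(D, G) on a minibatch, replacing each expectation with a sample mean; `D` and `G` are assumed to be callables supplied elsewhere:

```python
import numpy as np

def estimate_value(D, G, x_real, z_noise, eps=1e-12):
    """Monte Carlo estimate of V(D, G) on one minibatch.

    D: maps samples to probabilities in (0, 1).
    G: maps noise vectors to generated samples.
    x_real: minibatch drawn from the data distribution.
    z_noise: minibatch drawn from the noise prior p_z.
    """
    # E_{x ~ p_data}[log D(x)], estimated by a sample mean.
    term_real = np.mean(np.log(D(x_real) + eps))
    # E_{z ~ p_z}[log(1 - D(G(z)))], estimated by a sample mean.
    term_fake = np.mean(np.log(1.0 - D(G(z_noise)) + eps))
    return term_real + term_fake
```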

The training procedure (Algorithm 1 in the paper: minibatch stochastic gradient descent training of generative adversarial nets) is as follows. In each training iteration, first update the discriminator for k steps: each step samples a minibatch of noise samples and a minibatch of data examples, then ascends D's stochastic gradient of V. Then sample one minibatch of noise samples and update the generator once by descending G's stochastic gradient of log(1 - D(G(z))).
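A compact sketch of this loop, assuming PyTorch; the MLP architectures, learning rates, and k below are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

noise_dim, data_dim, k, eps = 100, 784, 1, 1e-8

# Illustrative stand-ins for the generator and discriminator.
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)

def train_step(x_real):
    """One iteration of Algorithm 1: k discriminator updates, then
    one generator update. x_real is a minibatch of training data."""
    batch = x_real.size(0)
    for _ in range(k):
        z = torch.randn(batch, noise_dim)
        x_fake = G(z).detach()  # do not backpropagate into G here
        # Ascend V w.r.t. D by descending its negation.
        loss_d = -(torch.log(D(x_real) + eps).mean()
                   + torch.log(1 - D(x_fake) + eps).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    z = torch.randn(batch, noise_dim)
    # Descend log(1 - D(G(z))) w.r.t. G.
    loss_g = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The paper also notes that log(1 - D(G(z))) saturates early in training, when D easily rejects the generator's samples, so in practice G is often trained to maximize log D(G(z)) instead.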

3.1 Proposition 1

For a fixed G, the optimal discriminator D is as follows:

$$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$
The proof is as follows:

Given any generator G, the training criterion for the discriminator D is to maximize

$$V(G, D) = \int_x p_{data}(x) \log D(x)\,dx + \int_z p_z(z) \log(1 - D(G(z)))\,dz = \int_x \big[ p_{data}(x) \log D(x) + p_g(x) \log(1 - D(x)) \big]\,dx$$

For any $(a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, the function $y \mapsto a \log y + b \log(1 - y)$ achieves its maximum on $[0, 1]$ at $y = \frac{a}{a+b}$, which yields the optimal discriminator above.
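As a quick check of the last step (a one-line derivative calculation added to this summary):

$$\frac{d}{dy}\big[a \log y + b \log(1 - y)\big] = \frac{a}{y} - \frac{b}{1 - y} = 0 \iff y = \frac{a}{a + b}$$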

3.2 Theorem 1

Define $C(G) = \max_D V(G, D)$.
The global minimum of C(G) is achieved if and only if the generated distribution p_g equals the data distribution p_data; at that point, the value of C(G) is -log 4.
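The reasoning behind this value, following the paper: substituting the optimal discriminator $D^*_G$ into the criterion rewrites $C(G)$ in terms of the Jensen-Shannon divergence,

$$C(G) = -\log 4 + 2 \cdot JSD(p_{data} \,\|\, p_g)$$

Since the Jensen-Shannon divergence is non-negative and equals zero only when the two distributions coincide, C(G) attains its global minimum of -log 4 exactly when p_g = p_data.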

3.3 Convergence of the Algorithm

The generated distribution p_g converges to the data distribution p_data.

Proposition 2 in the paper: if G and D have enough capacity, and at each step of Algorithm 1 the discriminator is allowed to reach its optimum given G, and p_g is updated so as to improve the criterion $\mathbb{E}_{x \sim p_{data}}[\log D^*_G(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D^*_G(x))]$, then p_g converges to p_data.

In practice, however, the adversarial network represents only a limited family of generated distributions p_g through the function G(z; θ), and we ultimately optimize θ rather than optimizing p_g itself, so the proof does not apply directly.

4. Experimental Results

For evaluation, a Gaussian Parzen window is fit to the samples generated by G, and the log-likelihood of the test data under this estimated distribution is reported.
Kernel density estimation is a non-parametric method in probability theory for estimating an unknown density function; it was proposed by Rosenblatt (1955) and Emanuel Parzen (1962), and is also known as the Parzen window method.
[Table 1 of the paper: Parzen window-based log-likelihood estimates on MNIST and TFD, comparing adversarial nets against DBN, Stacked CAE, and Deep GSN baselines.]
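A minimal sketch of the Gaussian Parzen window estimate described above (an illustration added to this summary; in the paper the bandwidth σ is chosen by cross-validation on a validation set):

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(test_x, gen_x, sigma):
    """Mean log-likelihood of test points under a Gaussian Parzen
    window fit to generated samples.

    test_x: (n, d) array of test points.
    gen_x:  (m, d) array of generated samples (kernel centers).
    sigma:  kernel bandwidth.
    """
    n, d = test_x.shape
    m = gen_x.shape[0]
    # Squared distances between every test point and every center.
    diffs = test_x[:, None, :] - gen_x[None, :, :]   # (n, m, d)
    sq = np.sum(diffs ** 2, axis=-1)                 # (n, m)
    # log p(x) = logsumexp over kernels minus the normalizers.
    log_norm = np.log(m) + 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    log_p = logsumexp(-sq / (2 * sigma ** 2), axis=1) - log_norm
    return float(np.mean(log_p))
```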

Examples of generated samples are shown below; per the paper's figure caption, the rightmost column shows the nearest training example to the neighboring generated sample, demonstrating that the model has not memorized the training set.
[Figure: visualizations of samples from the generator network.]

5. Summary

1. GANs can be used for semi-supervised learning: when labeled data is scarce, features from the discriminator or from an inference network can improve classifier performance.
2. Training efficiency: better methods for coordinating G and D can effectively improve training speed.
