torch_12_BigGAN full interpretation

1. Summary:

Despite recent progress in generative image modeling, with models now successfully generating high-resolution images, producing diverse samples from complex datasets such as ImageNet remains an elusive goal. This paper trains generative adversarial networks at a very large scale and studies the instabilities specific to that scale. The authors find that applying orthogonal regularization to the generator makes it amenable to a simple "truncation trick": reducing the variance of the generator's input allows fine control over the trade-off between sample fidelity and variety.

 

2. The three contributions:

 

1. Increasing the scale of GANs significantly improves modeling results: the models use 2-4x more parameters and an 8x larger training batch size than prior work. The paper introduces two simple, general architectural changes that improve scalability, and modifies a regularization scheme to improve conditioning, showing that these measures demonstrably boost performance.

 

2. As a by-product of these strategies, the models become amenable to the truncation trick, a simple sampling technique that gives explicit, fine-grained control over the trade-off between sample fidelity and diversity.

 

3. The authors identify the causes of instability specific to large-scale GANs and characterize them empirically. They further find that combining existing and novel techniques can reduce these instabilities, but complete training stability can only be achieved at a dramatic cost to performance.

 

 

3. Background

 

A generative adversarial network consists of a generator and a discriminator. The generator's role is to produce realistic samples from random noise; the discriminator's role is to tell whether a sample is real or was produced by the generator. In the earliest formulation, the GAN training objective is a minimax optimization problem whose solution is a Nash equilibrium. For image tasks, G and D are generally convolutional neural networks. Without a number of auxiliary techniques to improve stability, the training process is very fragile and converges poorly, so acceptable results require carefully chosen hyperparameters and network architectures.
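For reference, the standard minimax objective of the original GAN formulation, whose saddle point is the Nash equilibrium mentioned above:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$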

 

Recent work has focused on modifying the vanilla GAN algorithm to make it more stable. One line of work modifies the training objective function to encourage convergence. Another constrains D through gradient penalties or normalization; both approaches counteract the use of unbounded loss functions and ensure that D provides useful gradients to G everywhere.

 

Most relevant to this work is spectral normalization, which enforces Lipschitz continuity on D by normalizing its parameters with running estimates of their largest singular values, in turn inducing backward dynamics that adaptively regularize the top singular direction. [1] analyzes the condition number of the Jacobian of G and finds that GAN performance depends on it. [2] finds that applying spectral normalization to G improves stability, allowing the training algorithm to use fewer D iterations per G iteration. This paper analyzes these methods further in order to understand the mechanics of GAN training.
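As a concrete illustration, here is a minimal PyTorch sketch of the power-iteration estimate that spectral normalization relies on; the function name and single-matrix setup are my own simplification, not code from the paper:

```python
import torch
import torch.nn.functional as F

def power_iteration_sigma(W, u, n_iters=1, eps=1e-12):
    # Estimate the largest singular value of W: alternately project
    # the running vector u through W^T and W, then read off sigma.
    for _ in range(n_iters):
        v = F.normalize(W.t() @ u, dim=0, eps=eps)
        u = F.normalize(W @ v, dim=0, eps=eps)
    sigma = u @ W @ v  # u^T W v approximates the top singular value
    return sigma, u

# PyTorch also ships this technique as a ready-made wrapper:
# layer = torch.nn.utils.spectral_norm(torch.nn.Linear(128, 128))
```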

 

Other work focuses on the choice of network architecture. SA-GAN adds a module called self-attention to strengthen the capacity of both G and D. ProGAN trains a single model by progressively increasing the image resolution.

 

Conditional GANs add class information as an input to the generator and discriminator. The class information can be one-hot encoded and concatenated with the random noise to form G's input. Alternatively, it can be injected through batch normalization layers whose gains and biases are conditioned on the class, i.e., class-conditional BatchNorm, as sketched below.
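A minimal PyTorch sketch of class-conditional BatchNorm as just described; the class name and initialization are illustrative choices, not code from the paper:

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    # The affine gain and bias are looked up per class
    # instead of being single learned vectors.
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.embed = nn.Embedding(num_classes, num_features * 2)
        self.embed.weight.data[:, :num_features].fill_(1.0)  # gains start at 1
        self.embed.weight.data[:, num_features:].zero_()     # biases start at 0

    def forward(self, x, y):
        gamma, beta = self.embed(y).chunk(2, dim=1)
        out = self.bn(x)
        return gamma.unsqueeze(-1).unsqueeze(-1) * out + beta.unsqueeze(-1).unsqueeze(-1)
```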

4. Evaluation:

 

Quality of the generated images

Diversity of the generated images

 

IS (Inception Score):
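For reference, the standard definition:

$$\mathrm{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\Big)$$

where $p(y \mid x)$ is the class distribution a pretrained Inception network predicts for a generated image $x$, and $p(y)$ is the marginal over generated samples. Higher is better.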

 

 

 

 

 

If the model generates only a single image over and over, IS can still be very high; this is a drawback of IS. FID is better suited than IS for evaluating generated image quality. Unlike IS, a smaller FID value means better image quality and richer diversity; when all generated images collapse into a single mode, FID will be high.
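For reference, FID's standard definition, computed between Gaussian fits to Inception features of the real and generated distributions:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the real and generated images.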

 

 

 

5. Increasing the scale of the GAN

 

This section focuses on improving performance by increasing the scale of the model and the size of the batch. SA-GAN is chosen as the baseline model, with the hinge loss as the objective function. Class information is provided to G through class-conditional BatchNorm and to D through projection. The optimization settings follow [4], with spectral normalization applied to G; D takes two iterations for each iteration of G. For evaluation, a moving average of G's weights is used. Weights are initialized orthogonally, and BatchNorm statistics in G are computed across all devices as a unit, rather than per device.
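A minimal sketch of two of these ingredients, assuming hypothetical generator objects `g_ema` and `g` with identical architectures; the decay value is a common choice, not something this post specifies:

```python
import torch

# Orthogonal initialization of every weight matrix (biases are skipped):
def init_orthogonal(model):
    for p in model.parameters():
        if p.ndim >= 2:
            torch.nn.init.orthogonal_(p)

# Evaluation uses an exponential moving average of G's weights:
@torch.no_grad()
def ema_update(g_ema, g, decay=0.9999):
    for p_ema, p in zip(g_ema.parameters(), g.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)
```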

 

We start by increasing the batch size of the baseline model and find that this alone brings a great benefit. The paper infers that this is because a larger batch covers more modes of the data, providing better gradients to both the generator and the discriminator. However, simply increasing the batch size has a very significant side effect: the model reaches better results in fewer iterations, but if training continues, the process is no longer stable and eventually collapses.

 

Next, the width (number of channels) of each layer is increased by 50%, roughly doubling the number of model parameters and lifting IS by 21%. Increasing depth did not bring an improvement at first; the fix was to adopt a different residual block structure.

 

The class embedding c fed to the conditional BatchNorm layers of G accounts for a large number of weights. Instead, this paper uses a single shared class embedding that is linearly projected to the gains and biases of each layer, which reduces computation and memory requirements and increases training speed by 37%. In addition, a hierarchical latent space is used: the noise vector z is fed into multiple layers of G, not just the input layer. This lets the noise vector directly influence features at different levels and resolutions (see the sketch below).
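A minimal sketch of the shared-embedding, hierarchical-latent design described above, assuming each generator block receives its own chunk of z; names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SharedEmbedBatchNorm(nn.Module):
    # Gains and biases come from a linear projection of the shared
    # class embedding concatenated with this block's chunk of z.
    def __init__(self, num_features, shared_dim, z_chunk_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gain = nn.Linear(shared_dim + z_chunk_dim, num_features)
        self.bias = nn.Linear(shared_dim + z_chunk_dim, num_features)

    def forward(self, x, shared_embed, z_chunk):
        cond = torch.cat([shared_embed, z_chunk], dim=1)
        gamma = (1 + self.gain(cond)).unsqueeze(-1).unsqueeze(-1)
        beta = self.bias(cond).unsqueeze(-1).unsqueeze(-1)
        return gamma * self.bn(x) + beta

# The latent is split once, one chunk per block plus one for the input:
# z_chunks = z.chunk(num_blocks + 1, dim=1)
```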

 

6. Using the truncation trick to trade off fidelity and diversity

 

(An improvement to the generator's input noise)

 

The generator's input is generally drawn from a normal or uniform distribution. This paper adopts a truncation trick: random draws from the prior are truncated, and this turns out to give the best results. The intuition is that the wider the range of the random noise fed into the network, the more the generated samples vary around the distribution's main modes, so sample diversity is stronger but fidelity may drop. Concretely, the noise vector z is first sampled from the prior N(0, 1); any value whose magnitude exceeds a chosen threshold is resampled until it falls within that range. This practice is the truncation trick: truncating the vector z by resampling out-of-range values improves the quality of individual samples, at the cost of reduced sample diversity. Some larger models are not amenable to truncation and produce saturation artifacts when fed truncated noise. To solve this, the paper conditions G to be smooth via orthogonal regularization, so that the whole z space maps to good output samples and G adapts to truncation.
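A minimal PyTorch sketch of both pieces: resampling-based truncation as described above, and the relaxed orthogonal regularization penalty R(W) = β‖WᵀW ⊙ (1 − I)‖²_F that the paper applies to G's weights. The threshold and β values here are illustrative defaults, not prescriptions:

```python
import torch

def truncated_noise(batch_size, z_dim, threshold=0.5):
    # Sample z ~ N(0, I); resample any component whose magnitude
    # exceeds the threshold until every component is in range.
    z = torch.randn(batch_size, z_dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z

def ortho_penalty(W, beta=1e-4):
    # Relaxed orthogonal regularization: penalize only the
    # off-diagonal entries of W^T W, leaving the norms free.
    WtW = W.t() @ W
    off_diag = 1 - torch.eye(WtW.size(0), device=W.device)
    return beta * ((WtW * off_diag) ** 2).sum()
```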

 

 

 

 

 

7. Summary

 

The experimental results show that existing GAN techniques are already sufficient to train large-scale, distributed, large-batch models. Even so, training is still prone to collapse, and in practice requires early stopping.

 

8. Analysis

 

8.1 Instability of the generator

 

This section focuses on analyzing why training is stable at small scale but unstable at large scale. During training, a series of weights, gradients, and losses are monitored, looking for an indicator that might signal the onset of training collapse. We find that the first three singular values of each weight matrix are the most informative. They can be computed efficiently with the Arnoldi iteration, which extends the power iteration used by Miyato et al. to the estimation of additional singular vectors and values. In some layers, however, the behavior is pathological: their spectral norms keep growing throughout training and explode at collapse. To determine whether this pathology is a cause of collapse or merely a symptom, we study the effect of imposing additional conditioning on G that explicitly counteracts spectral explosion. First, we directly regularize the top singular value σ0 of each weight. Second, we use a partial singular value decomposition to clamp σ0: given a weight matrix W, its first singular vectors u0 and v0, and σ_clamp, the value σ0 is clamped to, the weight update is:

$$W = W - \max(0, \sigma_0 - \sigma_{\mathrm{clamp}})\, v_0 u_0^{\top}$$
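A small diagnostic sketch of the two interventions; the paper estimates singular values with Arnoldi iteration, but this stand-in simply calls torch.linalg's SVD routines and follows PyTorch's convention W ≈ U · diag(S) · Vᵀ for the rank-one clamp:

```python
import torch

def top_singular_values(model, k=3):
    # Monitor the first k singular values of every weight matrix;
    # conv kernels are flattened to matrices first.
    stats = {}
    for name, p in model.named_parameters():
        if p.ndim >= 2:
            stats[name] = torch.linalg.svdvals(p.detach().flatten(1))[:k]
    return stats

@torch.no_grad()
def clamp_top_sigma(W, sigma_clamp):
    # Remove the excess of the top singular value above sigma_clamp
    # by subtracting the corresponding rank-one component of W.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    if S[0] > sigma_clamp:
        W -= (S[0] - sigma_clamp) * torch.outer(U[:, 0], Vh[0])
```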

 

We find that, with or without spectral normalization, these interventions do prevent the gradual growth and explosion of the singular values. But even in this setting, and even though performance improves in some cases, they do not prevent training collapse. This evidence suggests that conditioning G can improve stability but cannot ensure it on its own; the discriminator must also be analyzed.

8.2 Instability of the discriminator

 

 

 

 

 
