What is the significance of the development of GAN for the study of general artificial intelligence?

Author: Lyken
Link: https://www.zhihu.com/question/57668112/answer/155367561
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.

The significance of GANs for artificial intelligence can be unpacked from the three parts of the name: Generative Adversarial Networks. To make the exposition easier (and to commemorate the two weeks I recently spent on a forum), I will start with Networks.

Networks: (Deep) Neural Networks

Since the birth of AlexNet in 2012, neural networks have become the mainstream of machine learning. Compared with the strong prior assumptions of the Bayesian school, or SVM's endless refinement of kernel functions, a neural network does not require researchers to fuss over details: given a large amount of data and a set of hyperparameters, it achieves good results. To put it in the language of martial arts novels: the masters of the great sects each spent a decade or more perfecting signature techniques such as the One Yang Finger, the Nine Yin Manual, or the Unicorn Arm, only to meet in the tournament an unknown soldier whose internal energy is as vast as the sea. He cannot name a single move, yet with that internal strength behind every strike, none of the masters can lift their heads.

The Deep family of algorithms has not only topped many benchmarks; its derivative applications have also set off a new wave of artificial intelligence: creating artworks (Gatys' A Neural Algorithm of Artistic Style), AlphaGo (CNN value networks + Monte Carlo tree search), high-quality machine translation (attention + seq2seq), and more. These applications already rival human experts on some tasks, which makes people wonder whether strong AI is near. Yet however powerful deep neural networks are, they have their limitations, and the unsatisfactory state of generative models is one of them.

Generative: Generative Models

Machine learning models can be roughly divided into two categories, generative models and discriminative models. A discriminative model takes an input variable x and predicts p(y|x) with some model. A generative model, given some implicit information, randomly generates observed data. As a simple example:

  • Discriminative model: given a picture, determine whether the animal in it is a cat or a dog
  • Generative model: given a collection of cat pictures, generate a new cat picture (one not in the dataset)

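A minimal sketch of the difference in interface, assuming PyTorch and purely illustrative function names:

```python
import torch

# Discriminative: given an input x, predict p(y | x).
def discriminate(classifier: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # e.g. x is a batch of images; the output is a distribution over {cat, dog}
    return torch.softmax(classifier(x), dim=-1)

# Generative: given latent noise z, produce a new sample resembling the data.
def generate(generator: torch.nn.Module, n: int, z_dim: int = 100) -> torch.Tensor:
    z = torch.randn(n, z_dim)   # the "implicit information" is random noise
    return generator(z)         # e.g. a batch of brand-new cat pictures
```
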
The well-known ImageNet-1000 image classification, semantic segmentation for autonomous driving, and human skeleton keypoint prediction are all discriminative models: some property is predicted from a given input. In fact, most of the work from 2012 to 2014 was on discriminative models, and one reason is that the loss function of a discriminative model is easy to define.

Back to basics: what is machine learning? Summed up in one sentence: provide feedback during training so that the results approach our expectations. For classification, we want the loss to stop changing only when the prediction matches the true label, so we choose cross entropy as the feedback; for regression, we want the loss to stop changing only when the predicted point coincides with the target, so we choose the Euclidean distance (MSE) between the two points as the feedback. The choice of loss function (feedback) obviously affects the quality of the trained model and is a top priority in model design. In the past five years there have been hundreds of neural network variants, but very few loss functions; Caffe's official documentation (Caffe | Layer Catalogue), for example, provides only eight standard loss functions.

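To make the two standard choices concrete, here is a small sketch (PyTorch is used purely for illustration; the text cites Caffe's catalogue, but the losses are the same idea):

```python
import torch
import torch.nn.functional as F

# Classification: cross entropy compares predicted class scores with a label.
logits = torch.randn(4, 10)            # 4 samples, 10 classes (random stand-ins)
labels = torch.tensor([1, 0, 9, 3])    # ground-truth class indices
cls_loss = F.cross_entropy(logits, labels)  # small only when the model is confident in the true class

# Regression: MSE (squared Euclidean distance) compares predictions with targets.
preds = torch.randn(4, 3)              # 4 predicted 3-d points
targets = torch.randn(4, 3)            # 4 ground-truth 3-d points
reg_loss = F.mse_loss(preds, targets)  # zero only when the two points coincide
```
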
For discriminative models the loss is easy to define, because the target of the output is relatively simple. For generative models it is not so easy. For sentence generation in NLP, for example, BLEU is an excellent metric, but it is hard to differentiate and therefore cannot be used directly in training. For the task of generating cat pictures, if the loss is naively defined as the Euclidean distance to the pictures in the database, the result is a weird blend of database images, and the effect is terrible. When we ask a neural network to draw a cat, we obviously want a picture with an animal's outline, textured fur, and a domineering gaze, not the cold Euclidean-distance optimum. How do we convey our expectations about the cat to the model? That is exactly the problem the Adversarial part of GANs solves.

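The failure mode is easy to verify numerically: the single image that minimizes the average squared Euclidean distance to every image in a dataset is exactly their pixel-wise mean, i.e. a blurry blend. A toy sketch, with random tensors standing in for cat pictures:

```python
import torch

cats = torch.rand(100, 3, 64, 64)               # stand-in "dataset" of 100 cat images
x = torch.zeros(3, 64, 64, requires_grad=True)  # the image we optimize
opt = torch.optim.SGD([x], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    # average squared Euclidean distance from x to every image in the dataset
    loss = ((x - cats) ** 2).sum(dim=(1, 2, 3)).mean()
    loss.backward()
    opt.step()

# The optimum is just the pixel-wise average of the dataset: a featureless blur.
print(torch.allclose(x, cats.mean(dim=0), atol=1e-4))  # True
```
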
Adversarial: Adversarial (pitting models against each other)

As mentioned in the Generative section, our expectations for the cat (the generated result) are often vague and hard to axiomatize mathematically. But wait: handling vague, hard-to-axiomatize problems is precisely what the discriminative tasks described earlier do. Image classification, for instance, maps a pile of RGB pixels to a probability distribution over N categories, something that clearly cannot be pinned down by a traditional mathematical definition. So why not hand the feedback part of the generative model over to a discriminative model? This is Goodfellow's stroke of genius: he brought together the two major families of machine learning models, Generative and Discriminative.

[Figure: model overview. https://pic3.zhimg.com/50/v2-5dfe9e846a1ad37160dfdad80f0b784c_hd.jpg]

A generative adversarial network consists mainly of a generator G and a discriminator D. The training process goes as follows:

  1. Sample input noise (a latent variable) z
  2. Pass it through the generator G to get x_{fake}=G(z)
  3. Draw some real data x_{real} from the real dataset
  4. Mix the two into one batch: x = x_{fake} + x_{real}
  5. Feed the batch to the discriminator D with labels x_{fake}=0, x_{real}=1 (a simple binary classifier)
  6. Backpropagate the loss according to the classification results

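Put together, steps 1-6 look roughly like the sketch below (a minimal illustrative PyTorch version; the MLP architectures and learning rates are made up here, not taken from the paper):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 100, 784                  # e.g. 28x28 images, flattened
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                  nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real: torch.Tensor):
    batch = x_real.size(0)
    z = torch.randn(batch, z_dim)        # 1. sample noise z
    x_fake = G(z)                        # 2. x_fake = G(z)

    # 3.-6. score real and fake batches, label x_real = 1 and x_fake = 0, update D
    opt_d.zero_grad()
    loss_d = (bce(D(x_real), torch.ones(batch, 1)) +
              bce(D(x_fake.detach()), torch.zeros(batch, 1)))
    loss_d.backward()
    opt_d.step()

    # G's turn: try to make D(G(z)) = 1, i.e. fool the discriminator
    opt_g.zero_grad()
    loss_g = bce(D(x_fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

Calling train_step repeatedly on batches of real (flattened, Tanh-normalized) images alternates the two updates, which is exactly the contest described next.
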
Throughout this process, D tries its best to achieve D(G(z))=0 and D(x_{real})=1 (fiery golden eyes: no fake slips through, no real one is wronged), while G tries to achieve D(G(z))=1, that is, to make the generated pictures as realistic as possible. The whole training process is like two players contending with each other, which is where the name Adversarial comes from. In the paper [1406.2661] Generative Adversarial Networks, Goodfellow proves the convergence of the algorithm theoretically: when the model converges, the generated data has the same distribution as the real data (which guarantees the quality of the model).

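For reference, the two-player objective that the paper analyzes is the minimax value function, which D maximizes and G minimizes:

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

At the optimum, the paper shows that the generator's distribution coincides with p_data, which is exactly the convergence guarantee mentioned above.
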
From a research perspective, GANs provide a new training paradigm for many generative models, and this has spawned plenty of follow-up work: customizing a 2D anime girl to your own taste (runs away), generating pictures that match a text description (Newmu/dcgan_code, hanzhanggit/StackGAN), even generating 3D IKEA-style furniture models from tags (zck119/3dgan-release); every one of these works is stunning. Just as valuable, the paper carries a strong mathematical argument: unlike many models of previous years that only reported empirical results, it theoretically guarantees the model's reliability. Although training still runs into difficulties from time to time, follow-up work keeps improving on this problem (WGAN, Loss-Sensitive GAN, Least Squares GAN), and I believe it will be overcome one day.

From the high-level viewpoint of general artificial intelligence, this is the first model in which neural networks guide neural networks: two networks sparring with each other, conferring with each other, and gradually evolving a cognition of the outside world. Isn't that exactly the ultimate intelligence we hope for, where the machine's source of knowledge is no longer limited to humans, but comes from machines communicating with and learning from one another? No wonder Yann LeCun praised GANs as the most interesting idea in machine learning in the last decade.
