GANs in Action, Chapter 1: Introduction to GANs

Reading notes on Chapter 1 of GANs in Action.

Chapter 1. Introduction to GANs

This chapter covers

  • An overview of Generative Adversarial Networks
  • What makes this class of machine learning algorithms special
  • Some of the exciting GAN applications that this book covers

This chapter gives an overview of GANs, explains what makes them special, and previews their applications.

The notion of whether machines can think is older than the computer itself. In 1950, the famed mathematician, logician, and computer scientist Alan Turing—perhaps best known for his role in decoding the Nazi wartime enciphering machine, Enigma—penned a paper that would immortalize his name for generations to come, “Computing Machinery and Intelligence.”

In the paper, Turing proposed a test he called the imitation game, better known today as the Turing test. In this hypothetical scenario, an unknowing observer talks with two counterparts behind a closed door: one, a fellow human; the other, a computer. Turing reasons that if the observer is unable to tell which is the person and which is the machine, the computer passed the test and must be deemed intelligent.

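Turing proposed the imitation game (now called the Turing test): an observer talks with a human and a computer hidden behind a door, and if the observer cannot tell which is which, the computer is deemed intelligent.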

Anyone who has attempted to engage in a dialogue with an automated chatbot or a voice-powered intelligent assistant knows that computers have a long way to go to pass this deceptively simple test. However, in other tasks, computers have not only matched human performance but also surpassed it—even in areas that were until recently considered out of reach for even the smartest algorithms, such as superhumanly accurate face recognition or mastering the game of Go.[1]

[1]: See “Surpassing Human-Level Face Verification Performance on LFW with GaussianFace,” by Chaochao Lu and Xiaoou Tang, 2014, https://arXiv.org/abs/1404.3840. See also the New York Times article “Google’s AlphaGo Defeats Chinese Go Master in Win for A.I.,” by Paul Mozur, 2017, http://mng.bz/07WJ.

AI still has a long way to go, but in some areas, such as face recognition and Go, it already outperforms humans.

Machine learning algorithms are great at recognizing patterns in existing data and using that insight for tasks such as classification (assigning the correct category to an example) and regression (estimating a numerical value based on a variety of inputs). When asked to generate new data, however, computers have struggled. An algorithm can defeat a chess grandmaster, estimate stock price movements, and classify whether a credit card transaction is likely to be fraudulent. In contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed. Indeed, humanity’s most basic and essential capacities—including a convivial conversation or the crafting of an original creation—can leave even the most sophisticated supercomputers in digital spasms.

Machine learning algorithms have long been good at classification and regression on existing data, but poor at generating new data.

This all changed in 2014 when Ian Goodfellow, then a PhD student at the University of Montreal, invented Generative Adversarial Networks (GANs). This technique has enabled computers to generate realistic data by using not one, but two, separate neural networks. GANs were not the first computer program used to generate data, but their results and versatility set them apart from all the rest. GANs have achieved remarkable results that had long been considered virtually impossible for artificial systems, such as the ability to generate fake images with real-world-like quality, turn a scribble into a photograph-like image, or turn video footage of a horse into a running zebra—all without the need for vast troves of painstakingly labeled training data.

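That changed in 2014, when Ian Goodfellow, then a PhD student, proposed Generative Adversarial Networks (GANs). A GAN consists of two neural networks. GANs are versatile at generating new data and have been widely applied.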

A telling example of how far machine data generation has been able to advance thanks to GANs is the synthesis of human faces, illustrated in figure 1.1. As recently as 2014, when GANs were invented, the best that machines could produce was a blurred countenance—and even that was celebrated as a groundbreaking success. By 2017, just three years later, advances in GANs enabled computers to synthesize fake faces whose quality rivals high-resolution portrait photographs. In this book, we look under the hood of the algorithm that made all this possible.

A good example is face synthesis: as figure 1.1 shows, GANs can now generate high-resolution face images.

Figure 1.1. Progress in human face generation

(Source: “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation,” by Miles Brundage et al., 2018, https://arxiv.org/abs/1802.07228.)

1.1. What are Generative Adversarial Networks?

Generative Adversarial Networks (GANs) are a class of machine learning techniques that consist of two simultaneously trained models: one (the Generator) trained to generate fake data, and the other (the Discriminator) trained to discern the fake data from real examples.

A GAN consists of a Generator, which produces fake data, and a Discriminator, which tries to tell the fake data from real examples.

The word generative indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set. For example, if we want a GAN to synthesize images that look like Leonardo da Vinci’s, we would use a training dataset of da Vinci’s artwork.

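Generative: the Generator produces data resembling the training set. Trained on da Vinci's artwork, for example, it synthesizes images in da Vinci's style.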

The term adversarial points to the game-like, competitive dynamic between the two models that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like da Vinci’s. The Discriminator’s objective is to distinguish the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be da Vinci’s. The two networks are continually trying to outwit each other: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.

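Adversarial: the Generator tries to produce fakes that pass for real, while the Discriminator tries to tell real from fake. The two compete like a forger and an authenticator.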

Finally, the word networks indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. Depending on the complexity of the GAN implementation, these can range from simple feed-forward neural networks (as you’ll see in chapter 3) to convolutional neural networks (as you’ll see in chapter 4) or even more complex variants, such as the U-Net (as you’ll see in chapter 9).

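Networks: the Generator and the Discriminator are usually neural networks, ranging from feed-forward networks (chapter 3) and convolutional networks (chapter 4) to more complex variants such as the U-Net (chapter 9).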

1.2. How do GANs work?

The mathematics underpinning GANs are complex (as you’ll explore in later chapters, especially chapters 3 and 5); fortunately, many real-world analogies can make GANs easier to understand. Previously, we discussed the example of an art forger (the Generator) trying to fool an art expert (the Discriminator). The more convincing the fake paintings the forger makes, the better the art expert must be at determining their authenticity. This is true in the reverse situation as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger must improve to avoid being caught red-handed.

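The mathematics behind GANs is complex, so the analogy of a da Vinci forger and an art expert is a helpful picture. During training, the Generator (forger) and the Discriminator (expert) push each other to improve.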

Another metaphor often used to describe GANs—one that Ian Goodfellow himself likes to use—is that of a criminal (the Generator) who forges money, and a detective (the Discriminator) who tries to catch him. The more authentic-looking the counterfeit bills become, the better the detective must be at detecting them, and vice versa.

Another analogy is a counterfeiter and a detective.

In more technical terms, the Generator’s goal is to produce examples that capture the characteristics of the training dataset, so much so that the samples it generates look indistinguishable from the training data. The Generator can be thought of as an object recognition model in reverse. Object recognition algorithms learn the patterns in images to discern an image’s content. Instead of recognizing the patterns, the Generator learns to create them essentially from scratch; indeed, the input into the Generator is often no more than a vector of random numbers.

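Stated more technically: the Generator takes a random vector as input, captures the characteristics of the training data during training, and produces samples that are hard to tell from real ones. The Discriminator also learns the characteristics of the training data, and uses them to spot fake samples.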
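To make "a vector of random numbers" concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than the book's code: the names `sample_noise` and `toy_generator`, the noise dimension of 4, and the single linear layer followed by tanh. The parameters are random and untrained, so the output is meaningless; the point is only the shape of the mapping from z to x*.

```python
import math
import random


def sample_noise(dim, rng):
    """Draw a noise vector z of `dim` standard-normal numbers."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generator(z, weights, biases):
    """Map noise z to a fake example x* with one linear layer plus tanh.

    `weights` has one row per output value; each row has len(z) entries.
    """
    return [
        math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
z = sample_noise(4, rng)
# Randomly initialised (untrained) parameters for a 3-value output.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
biases = [0.0, 0.0, 0.0]
x_fake = toy_generator(z, weights, biases)
print(len(z), len(x_fake))  # prints "4 3"
```

Training would adjust `weights` and `biases` so that the values in `x_fake` come to resemble real examples.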

The Generator learns through the feedback it receives from the Discriminator’s classifications. The Discriminator’s goal is to determine whether a particular example is real (coming from the training dataset) or fake (created by the Generator). Accordingly, each time the Discriminator is fooled into classifying a fake image as real, the Generator knows it did something well. Conversely, each time the Discriminator correctly rejects a Generator-produced image as fake, the Generator receives the feedback that it needs to improve.

The Discriminator continues to improve as well. Like any classifier, it learns from how far its predictions are from the true labels (real or fake). So, as the Generator gets better at producing realistic-looking data, the Discriminator gets better at telling fake data from the real, and both networks continue to improve simultaneously.

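When the Discriminator correctly rejects a fake, the Generator receives feedback that it needs to improve. When the Discriminator is fooled, the Generator knows it did well. The Discriminator, in turn, learns from its own classification errors.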

Table 1.1 summarizes the key takeaways about the two GAN subnetworks.

[Table 1.1: key takeaways about the two GAN subnetworks]

1.3. GANs in action

Now that you have a high-level understanding of GANs and their constituent networks, let’s take a closer look at the system in action. Imagine that our goal is to teach a GAN to produce realistic-looking handwritten digits. (You’ll learn to implement such a model in chapter 3 and expand on it in chapter 4.) Figure 1.2 illustrates the core GAN architecture.

Figure 1.2 shows the core GAN architecture.

Figure 1.2. The two GAN subnetworks, their inputs and outputs, and their interactions

Let’s walk through the details of the diagram:

1. Training dataset— The dataset of real examples that we want the Generator to learn to emulate with near-perfect quality. In this case, the dataset consists of images of handwritten digits. This dataset serves as input (x) to the Discriminator network.
2. Random noise vector— The raw input (z) to the Generator network. This input is a vector of random numbers that the Generator uses as a starting point for synthesizing fake examples.
3. Generator network— The Generator takes in a vector of random numbers (z) as input and outputs fake examples (x*). Its goal is to make the fake examples it produces indistinguishable from the real examples in the training dataset.
4. Discriminator network— The Discriminator takes as input either a real example (x) coming from the training set or a fake example (x*) produced by the Generator. For each example, the Discriminator determines and outputs the probability of whether the example is real.
5. Iterative training/tuning— For each of the Discriminator’s predictions, we determine how good it is—much as we would for a regular classifier—and use the results to iteratively tune the Discriminator and the Generator networks through backpropagation:

  • The Discriminator’s weights and biases are updated to maximize its classification accuracy (maximizing the probability of correct prediction: x as real and x* as fake).
  • The Generator’s weights and biases are updated to maximize the probability that the Discriminator misclassifies x* as real.

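1. The real training data, fed to the Discriminator as input x.
2. The random noise vector z, fed to the Generator to produce fake images.
3. The Generator network: takes z as input and outputs a fake image x*.
4. The Discriminator network: takes a real image x or a fake image x* as input and outputs the probability that the image is real.
5. Iterative training/tuning: the Discriminator is updated to classify x and x* correctly, and the Generator is updated so that the Discriminator misclassifies x* as real.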
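The two update rules in step 5 can be written as loss functions to minimize. The sketch below assumes binary cross-entropy, a common choice that this chapter does not itself specify, and the function names are hypothetical. Here `d_real` stands for the Discriminator's output D(x) and `d_fake` for D(x*), both probabilities that the input is real.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Low when D(x) is near 1 and D(x*) is near 0."""
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))


def generator_loss(d_fake):
    """Low when the Discriminator gives x* a high probability of being real."""
    return -math.log(d_fake + EPS)


# A confident, correct Discriminator: small D loss, large G loss.
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# A fooled Discriminator: large D loss, small G loss.
print(discriminator_loss(0.6, 0.8), generator_loss(0.8))
```

The two losses pull in opposite directions on `d_fake`, which is the competitive dynamic the diagram describes.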

1.3.1. GAN training

Learning about the purpose of the various GAN components may feel like looking at a snapshot of an engine: it cannot be understood fully until we see it in motion. That’s what this section is all about. First, we present the GAN training algorithm; then, we illustrate the training process so you can see the architecture diagram in action.

We first look at the algorithm, then use the training process to understand how a GAN works.

GAN training algorithm

For each training iteration do

	1. Train the Discriminator:
		1. Take a random real example x from the training dataset.
		2. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
		3. Use the Discriminator network to classify x and x*.
		4. Compute the classification errors and backpropagate the total error to update the Discriminator’s trainable parameters, seeking to minimize the classification errors.
	2. Train the Generator:
		1. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
		2. Use the Discriminator network to classify x*.
		3. Compute the classification error and backpropagate the error to update the Generator’s trainable parameters, seeking to maximize the Discriminator’s error.
End for
In short: each iteration first trains the Discriminator on a real example x and a fake example x* (synthesized by the Generator from a noise vector z) to reduce its classification loss, then trains the Generator on a fresh x* to increase the Discriminator's loss.
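The training algorithm above can be sketched end to end on a deliberately tiny problem. This is a toy under stated assumptions, not the book's implementation: the real data is one-dimensional and drawn from a normal distribution with mean 4, the Generator is the linear map x* = a * z + b, the Discriminator is a logistic regression D(x) = sigmoid(w * x + c), the losses are binary cross-entropy, and the gradients are derived by hand in place of a backpropagation library. The step numbers in the comments refer to the algorithm.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(iterations=500, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # Generator parameters: x* = a * z + b
    w, c = 0.0, 0.0  # Discriminator parameters: D(x) = sigmoid(w * x + c)

    for _ in range(iterations):
        # 1. Train the Discriminator.
        x = rng.gauss(4.0, 0.5)           # 1.1 real example (assumed data source)
        z = rng.gauss(0.0, 1.0)           # 1.2 noise vector -> fake example
        x_fake = a * z + b
        d_real = sigmoid(w * x + c)       # 1.3 classify x and x*
        d_fake = sigmoid(w * x_fake + c)
        # 1.4 gradient of -log D(x) - log(1 - D(x*)) with respect to w and c
        grad_w = (d_real - 1.0) * x + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # 2. Train the Generator (Discriminator parameters held fixed).
        z = rng.gauss(0.0, 1.0)           # 2.1 fresh noise vector -> fake example
        x_fake = a * z + b
        d_fake = sigmoid(w * x_fake + c)  # 2.2 classify x*
        # 2.3 gradient of -log D(x*) with respect to a and b
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b

    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"Generator: x* = {a:.2f} * z + {b:.2f}")
```

Step 2.3 minimizes -log D(x*), which is one way of "maximizing the Discriminator's error" on fake examples. The sketch makes no promise of convergence: with models this simple the parameters may oscillate rather than settle.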

GAN training visualized
Figure 1.3 illustrates the GAN training algorithm. The letters in the diagram refer to the list of steps in the GAN training algorithm.

[Figure 1.3: the GAN training algorithm]

1.3.2. Reaching equilibrium

You may wonder when the GAN training loop is meant to stop. More precisely, how do we know when a GAN is fully trained so that we can determine the appropriate number of training iterations? With a regular neural network, we usually have a clear objective to achieve and measure. For example, when training a classifier, we measure the classification error on the training and validation sets, and we stop the process when the validation error starts getting worse (to avoid overfitting). In a GAN, the two networks have competing objectives: when one network gets better, the other gets worse. How do we determine when to stop?

Those familiar with game theory may recognize this setup as a zero-sum game—a situation in which one player’s gains equal the other player’s losses. When one player improves by a certain amount, the other player worsens by the same amount. All zero-sum games have a Nash equilibrium, a point at which neither player can improve their situation or payoff by changing their actions.
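For readers who want the zero-sum framing in symbols, it is commonly written as the minimax objective from Goodfellow et al.'s original 2014 GAN paper. This chapter does not state it, so treat the notation as an assumption: D(x) is the Discriminator's probability that x is real, G(z) is the Generator's output x* for noise z, and p_data and p_z are the data and noise distributions.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator tries to make V large and the Generator tries to make it small, so any gain for one is a loss for the other.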

A GAN reaches Nash equilibrium when the following conditions are met:

  • The Generator produces fake examples that are indistinguishable from the real data in the training dataset.
  • The Discriminator can at best randomly guess whether a particular example is real or fake (that is, make a 50/50 guess whether an example is real).

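Training should stop at Nash equilibrium, which requires the following:

  • The Generator's fake images cannot be told apart from the real images in the training dataset.
  • The Discriminator can do no better than a 50% guess on whether an image is real or fake.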

NOTE
Nash equilibrium is named after the American economist and mathematician John Forbes Nash Jr., whose life story and career were captured in the biography titled A Beautiful Mind and inspired the eponymous film.

Let us convince you of why this is the case. When each of the fake examples (x*) is truly indistinguishable from the real examples (x) coming from the training dataset, there is nothing the Discriminator can use to tell them apart from one another. Because half of the examples it receives are real and half are fake, the best the Discriminator can do is to flip a coin and classify each example as real or fake with 50% probability.

The Generator is likewise at a point where it has nothing to gain from further tuning. Because the examples it produces are already indistinguishable from the real ones, even a tiny change to the process it uses to turn the random noise vector (z) into a fake example (x*) may give the Discriminator a cue for how to discern the fake example from the real data, making the Generator worse off.

In other words, at Nash equilibrium neither the Discriminator nor the Generator can improve any further.
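The coin-flip argument can be checked numerically. The sketch below assumes the Discriminator is scored with binary cross-entropy and that it cannot tell examples apart, so it must output the same probability for every example. The function name `average_bce` and the batch of 100 labels are hypothetical.

```python
import math


def average_bce(p_real_guess, labels):
    """Mean binary cross-entropy when D outputs one fixed probability for all examples."""
    total = 0.0
    for is_real in labels:
        # Probability the Discriminator assigned to the correct label.
        p_correct = p_real_guess if is_real else 1.0 - p_real_guess
        total += -math.log(p_correct)
    return total / len(labels)


labels = [True, False] * 50  # half real, half fake, indistinguishable to D
for guess in (0.3, 0.5, 0.7):
    print(guess, round(average_bce(guess, labels), 4))
```

A guess of 0.5 gives a loss of ln 2 (about 0.693), lower than either lopsided guess, which is why the best such a Discriminator can do is a 50/50 call.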

With equilibrium achieved, the GAN is said to have converged. Here is when it gets tricky. In practice, it is nearly impossible to find the Nash equilibrium for GANs because of the immense complexities involved in reaching convergence in nonconvex games (more on convergence in later chapters, particularly chapter 5). Indeed, GAN convergence remains one of the most important open questions in GAN research.

Fortunately, this has not impeded GAN research or the many innovative applications of generative adversarial learning. Even in the absence of rigorous mathematical guarantees, GANs have achieved remarkable empirical results. This book covers a selection of the most impactful ones, and the following section previews some of them.

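In practice, reaching Nash equilibrium (GAN convergence) is very hard, and it remains one of the most important open problems in GAN research. This has not stopped GANs from achieving remarkable results in research and applications.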

1.4. Why study GANs?

Since their invention, GANs have been hailed by academics and industry experts as one of the most consequential innovations in deep learning. Yann LeCun, the director of AI research at Facebook, went so far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”[2]
[2]: See “Google’s Dueling Neural Networks Spar to Get Smarter,” by Cade Metz, Wired, 2017, http://mng.bz/KE1X.

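GANs are widely regarded by academics and industry experts as one of the most important innovations in deep learning. Yann LeCun, Facebook's director of AI research, called GANs and their variants “the coolest idea in deep learning in the last 20 years.”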

The excitement is well justified. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from anyone else, GANs have captured the imagination of researchers and the wider public alike. They have been covered by the New York Times, the BBC, Scientific American, and many other prominent media outlets. Indeed, it was one of those exciting GAN results that probably drove you to buy this book in the first place. (Right?)

GANs have captured the imagination of researchers and the general public alike, and have been covered by major media outlets.

Perhaps most notable is the capacity of GANs to create hyperrealistic imagery. None of the faces in figure 1.4 belongs to a real human; they are all fake, showcasing GANs’ ability to synthesize images with photorealistic quality. The faces were produced using Progressive GANs, a technique covered in chapter 6.

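GANs can generate hyperrealistic images: hard as it is to believe, every face in figure 1.4 is fake. They were generated with Progressive GANs, covered in chapter 6.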

Figure 1.4. These photorealistic but fake human faces were synthesized by a Progressive GAN trained on high-resolution portrait photos of celebrities.

(Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras et al., 2017, https://arxiv.org/abs/1710.10196.)

Another remarkable GAN achievement is image-to-image translation. Similarly to the way a sentence can be translated from, say, Chinese to Spanish, GANs can translate an image from one domain to another. As shown in figure 1.5, GANs can turn an image of a horse into an image of zebra (and back!), and a photo into a Monet-like painting—all with virtually no supervision and no labels whatsoever. The GAN variant that made this possible is called CycleGAN; you’ll learn all about it in chapter 9.

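Another GAN application is image-to-image translation. As figure 1.5 shows, a GAN can turn an image of a horse into a zebra (and back) or a photo into a Monet-style painting. This is done with CycleGAN, covered in chapter 9.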

Figure 1.5. By using a GAN variant called CycleGAN, we can turn a Monet painting into a photograph or turn an image of a zebra into a depiction of a horse, and vice versa.

(Source: See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/abs/1703.10593.)

The more practically minded GAN use cases are just as fascinating. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations: by analyzing countless outfits, the system learns to produce new items matching any given style.[3] In medical research, GANs are used to augment datasets with synthetic examples to improve diagnostic accuracy.[4] In chapter 11—after you’ve mastered the ins and outs of training GANs and their variants—you’ll explore both of these applications in detail.
[3]: See “Amazon Has Developed an AI Fashion Designer,” by Will Knight, MIT Technology Review, 2017, http://mng.bz/9wOj.
[4]: See “Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification,” by Maayan Frid-Adar et al., 2018, https://arxiv.org/abs/1801.02385.

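Amazon is experimenting with GANs for fashion design, and medical researchers use them to improve diagnostic accuracy. Chapter 11 covers both.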

GANs are also seen as an important stepping stone toward achieving artificial general intelligence,[5] an artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain—from motor skills involved in walking, to language, to creative skills needed to compose sonnets.
[5]: See “OpenAI Founder: Short-Term AGI Is a Serious Possibility,” by Tony Peng, Synced, 2018, http://mng.bz/j5Oa. See also “A Path to Unsupervised Learning Through Adversarial Networks,” by Soumith Chintala, f Code, 2016, http://mng.bz/WOag.

GANs are seen as an important stepping stone toward artificial general intelligence.

But with the ability to generate new data and imagery, GANs also have the capacity to be dangerous. Much has been discussed about the spread and dangers of fake news, but the potential of GANs to create credible fake footage is disturbing. At the end of an aptly titled 2018 piece about GANs—“How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”—the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk in his 2018 article “The GANfather: The Man Who’s Given Machines the Gift of Imagination”: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale. These concerns are what motivated us to discuss the ethical considerations of GANs in chapter 12.

GANs can also be used to produce fake images and video and to mount cyberattacks, which is cause for concern. Chapter 12 discusses these issues.

GANs can do much good for the world, but all technological innovations have misuses. Here the philosophy has to be one of awareness: because it is impossible to “uninvent” a technique, it is crucial to make sure people like you are aware of this technique’s rapid emergence and its substantial potential.

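Technology is a double-edged sword. Since it cannot be uninvented, the best course is to understand its potential and put it to good use.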

In this book, we are only able to scratch the surface of what is possible with GANs. However, we hope that this book will provide you with the necessary theoretical knowledge and practical skills to continue exploring any facet of this field that you find most interesting.

So, without further ado, let’s dive in!

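This book only scratches the surface of GANs, but it aims to give you the knowledge and skills to keep exploring whatever interests you. Let's get started!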

Summary

  • GANs are a deep learning technique that uses a competitive dynamic between two neural networks to synthesize realistic data samples, such as fake photorealistic imagery. The two networks that constitute a GAN are as follows:
    • The Generator, whose goal is to fool the Discriminator by producing data indistinguishable from the training dataset
    • The Discriminator, whose goal is to correctly distinguish between real data coming from the training dataset and the fake data produced by the Generator
  • GANs have extensive applications across many different sectors, such as fashion, medicine, and cybersecurity.
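  • In short, a GAN synthesizes realistic data samples, such as images, through two competing neural networks: the Generator and the Discriminator.
  • GANs are widely applied in many fields, such as fashion, medicine, and cybersecurity.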

Reposted from blog.csdn.net/Leytton/article/details/103552810