What’s In a Face (CVPR in Review V)

I have said that she had no face; but that meant she had a thousand faces…
― C.S. Lewis, Till We Have Faces

Today we present to you another installment where we dive into the details of a few papers from the CVPR 2018 (Computer Vision and Pattern Recognition) conference. We’ve had four already: about GANs for computer vision, about pose estimation and tracking for humans, about synthetic data, and, finally, about domain adaptation. In particular, in the fourth part we presented three papers on the same topic whose results were directly numerically comparable.

Today, we turn to a different problem that also warrants a detailed comparison. We will talk about face generation, that is, about synthesizing a realistic picture of a human face, either from scratch or by changing some features of a real photo. Actually, we already touched upon this problem a while ago, in our first post about GANs. But since then, generative adversarial networks (GANs) have remained one of the very hottest topics in machine learning, and it is no wonder that new advances await us today. And again, it is my great pleasure to introduce Anastasia Gaydashenko, with whom I have co-authored this text.


GANs for Face Synthesis and the Importance of Loss Functions

We have already spoken many times about how important a model’s architecture and a good dataset are for deep learning. In this post, one recurrent theme will be the meaning and importance of loss functions, that is, the objective functions that a neural network is actually trained to optimize. One could argue that the loss function is a part of the architecture, but in practice we usually think about them separately; e.g., the same basic architecture can support a wide variety of loss functions with only minor changes, and that is something we will see today.

We chose these particular papers because we liked them best, but also because they all use GANs, and all use them to modify pictures of faces while preserving the person’s identity. This is a well-established application of GANs; classical papers such as ADD used it to predict how a person changes with age or how he or she would look if they had a different gender. The papers that we consider today take this line of research one step further, parceling out certain parts of a person’s appearance (e.g., makeup or emotions) in such a way that they can be manipulated separately.

Thus, in a way all of today’s papers are also solving the same problem and might be comparable with each other. The problem, though, is that the true evaluation of a model’s results can basically be done only by a human: you need to judge how realistic the new picture looks. And in our case, the specific tasks and datasets are somewhat different too, so we will not have a direct comparison of the results; instead, we will extract and compare new interesting ideas.

On to the papers!


Towards Open-Set Identity Preserving Face Synthesis

The authors of the first paper, a joint work of researchers from the University of Science and Technology of China and Microsoft Research (full pdf), aim to disentangle identity and attributes from a single face image. The idea is to decompose a face’s representation into “identity” and “attributes” in such a way that identity corresponds to the person, and attributes correspond to basically everything that could be modified while still preserving identity. Then, using this extracted identity, we can add attributes extracted from a different face. Like this:



Fascinating, right? Let’s investigate how they do it. There are quite a few interesting novel tricks in the paper, but the main contribution of this work is a new GAN-based architecture:




Here the network takes as input two pictures: the identity picture and the attributes picture that will serve as the source for everything except the person’s identity: pose, emotion, illumination, and even the background.

The main components of this architecture include:

  • identity encoder I that produces a latent representation (embedding) of the identity input xˢ;

  • attributes encoder A that does the same for the attributes input xᵃ;

  • mixed picture generator G that takes as input both embeddings (concatenated) and produces the picture x’ that is supposed to mix the identity of xˢ and the attributes of xᵃ;

  • identity classifier C that checks whether the person in the generated picture x’ is indeed the same as in xˢ;

  • discriminator D that tries to distinguish real and generated examples to improve generator performance, in the usual GAN fashion.

This is the structure of the model used for training; when all components have been trained, for generation itself it suffices to use only the part inside the dotted line, so the networks C and D are only included in the training phase.
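To make the data flow concrete, here is a minimal PyTorch sketch of how the trained components I, A, and G might fit together at generation time (C and D are only needed during training). All layer sizes and module definitions are our own illustration, not the paper’s actual configuration.

```python
# Minimal sketch of the generation-time pipeline: identity encoder I, attributes
# encoder A, and generator G. Everything here is illustrative, not the paper's setup.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small conv encoder skeleton, reused for both I (identity) and A (attributes)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Decodes the concatenated identity and attribute embeddings into an image."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.fc = nn.Linear(2 * emb_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, f_id, f_attr):
        z = self.fc(torch.cat([f_id, f_attr], dim=1)).view(-1, 64, 8, 8)
        return self.net(z)

I_enc, A_enc, G = Encoder(), Encoder(), Generator()
x_s = torch.randn(1, 3, 32, 32)      # identity input
x_a = torch.randn(1, 3, 32, 32)      # attributes input
x_prime = G(I_enc(x_s), A_enc(x_a))  # mixed picture: identity of x_s, attributes of x_a
```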


The main problem, of course, is how to disentangle identity from attributes. How can we tell the network what it should take from xˢ and what from xᵃ? The architecture outlined above does not answer this question by itself; the main work here is done by a careful selection of loss functions. There are quite a few of them; let us review them one by one. The NeuroNugget format does not allow for too many formulas, so we will try to capture the meaning of each part of the loss function (a schematic code sketch of how the terms combine follows the list):

  • the most straightforward part is the softmax classification loss Lᵢ that trains the identity encoder I to recognize the identity of people shown on the photos; basically, we train I to serve as a person classifier and then use the last layer of this network as features fᵢ(xˢ);

  • the reconstruction loss Lᵣ is more interesting; we would like the result x’ to reconstruct the original image xᵃ anyway, but there are two distinct cases here:

  • if the person in image xᵃ is the same as in the identity image xˢ, there is no question what we should do: we should reconstruct xᵃ as exactly as possible;

  • and if xᵃ and xˢ show two different people (we know all identities in the supervised training phase), we also want to reconstruct xᵃ but with a lower penalty for “errors” (10 times lower in the authors’ experiments); we don’t actually want to reconstruct xᵃ exactly now but still want x’ to be similar to xᵃ;

  • the KL divergence loss Lkl is intended to help the attributes encoder A concentrate on attributes and “lose” the identity as much as possible; it serves as a regularizer that makes the attributes vector distribution similar to a predefined prior (a standard Gaussian);

  • the discriminator loss Lᵈ is standard GAN business: it shows how well D can discriminate between real and fake images; however, there is a twist here as well: instead of just including the discriminator loss Lᵈ, the network starts by using Lᵍᵈ, a feature matching loss that measures how similar the features extracted by D on some intermediate level from x’ and xᵃ are; this is due to the fact that we cannot expect to fool D right away, the discriminator will always be nearly perfect at the beginning of training, and we have to settle for a weaker loss function first (see the CVAE-GAN paper for more details);

  • and, again, the same trick works for the identity classifier C; we use the basic classification loss Lᶜ but also augment it with the distance Lᵍᶜ between feature representations of x’ and xˢ on some intermediate layer of C.
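As promised, here is a schematic sketch of how these terms could be assembled into a single generator-side objective. The tensor names follow the text, and the 10-times-lower reconstruction weight for different-identity pairs is taken from the description above; everything else (shapes, the choice of L1 for feature matching, the absence of per-term weights) is an illustrative assumption rather than the paper’s exact formulation.

```python
# Schematic assembly of the loss terms described above; weights and helper
# arguments are illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def total_loss(x_a, x_prime, logits_id, id_labels,
               mu_attr, logvar_attr,
               feats_D_real, feats_D_fake,
               feats_C_src, feats_C_fake,
               same_identity: bool):
    # L_i: softmax classification loss that trains the identity encoder I
    L_i = F.cross_entropy(logits_id, id_labels)

    # L_r: reconstruct x_a; 10x smaller weight when x_s and x_a are different people
    rec_weight = 1.0 if same_identity else 0.1
    L_r = rec_weight * F.l1_loss(x_prime, x_a)

    # L_kl: pull the attribute embedding towards a standard Gaussian prior
    L_kl = -0.5 * torch.mean(1 + logvar_attr - mu_attr.pow(2) - logvar_attr.exp())

    # L_gd: feature matching on an intermediate layer of the discriminator D
    L_gd = F.l1_loss(feats_D_fake, feats_D_real)

    # L_gc: feature matching on an intermediate layer of the identity classifier C
    L_gc = F.l1_loss(feats_C_fake, feats_C_src)

    return L_i + L_r + L_kl + L_gd + L_gc
```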


(Disclaimer: I apologize for slightly messing up notation from the pictures but Medium actually does not support sub/superscripts so I had to make do with existing Unicode symbols.)

That was quite a lot to take in, wasn’t it? Well, this is how modern GAN-based architectures usually work: their final loss function is usually a sum of many different terms, each with its own motivation and meaning. But the resulting architecture works out very nicely; we can now train it in several different ways:

  • first, networks I and C are doing basically the same thing, identifying people; therefore, they can share both the architecture and the weights (which simplifies training), and we can even use a standard pretrained person identification network as a very good initialization for I and C (see the sketch after this list);

  • next, we train the whole thing on a dataset of images of people with known identities; as we have already mentioned, we can pick pairs of xˢ and xᵃ as different images of the same person and have the network try to reconstruct xᵃ exactly, or pick xˢ and xᵃ with different people and train with a lower weight on the reconstruction loss;

  • but even that is not all; publicly available labeled datasets of people are not diverse enough to train the whole architecture end-to-end, but, fortunately, the architecture also allows for unsupervised training; if we don’t know the identity, we can’t train I and C, so we have to ignore their loss functions, but we can still train the rest! And we have already seen that I and C are the easiest to train, so we can assume they have been trained well enough on the supervised part. Thus, we can simply grab some random faces from the Web and add them to the training set without knowing the identities.
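The first point above, sharing I and C and initializing them from a pretrained recognition network, is easy to express in code. Here is a minimal sketch under our own assumptions: a torchvision ResNet-18 stands in for the pretrained face recognition backbone, and the number of identities is purely illustrative.

```python
# Weight-sharing sketch: networks I and C reuse one pretrained backbone.
# ResNet-18 and the identity count are stand-ins, not the paper's choices.
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)    # stand-in for a pretrained face network
backbone.fc = nn.Identity()                 # strip the classification head

identity_encoder = backbone                 # network I: its output serves as the identity features
identity_classifier = nn.Sequential(        # network C shares the same backbone weights
    backbone,
    nn.Linear(512, 1000),                   # 1000 known identities, purely illustrative
)
```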


Thanks to the careful and precise choice of the architecture, loss functions, and training process, the results are fantastic! Here are two selections from the paper. In the first, we see transformations of faces randomly chosen from the training set, with random faces for attributes:

And in the second, the identities never appeared in the training set! These are people completely unknown to the network (“zero-shot identities”, as the paper calls them)… and it still works just fine:




PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup

This collaboration of researchers from Princeton, Berkeley, and Adobe (full pdf) works in the same vein as the previous paper but tackles a much more specific problem: can we add or modify the makeup on a photograph, rather than all attributes at once, while keeping the face as recognizable as possible? A major problem here is, as often happens in machine learning, with the data: a relatively direct approach would be quite possible if we had a large dataset of aligned photographs of faces with and without makeup… but of course we don’t. So how do we solve this?

The network still gets two images as input: the source image from which we take the face and the reference image from which we take the makeup style. The model then produces the corresponding output; here are some sample results, and they are very impressive:

This unsupervised learning framework relies on a new model of a cycle-consistent generative adversarial network; it consists of two asymmetric functions: the forward function encodes example-based style transfer, whereas the backward function removes the style. Here is how it works:




The picture shows two coupled networks designed to implement these functions: one that transfers makeup style (G) and another that can remove makeup (F); the idea is to make the output of their successive application to an input photo match the input.

Let us talk about losses again, because they define the approach and capture the main new ideas in this work as well; a schematic code sketch of the resulting objective follows the list. The only notation we need is that X is the “no makeup” domain and Y is the domain of images with makeup. Now:

  • the discriminator DY tries to discriminate between real samples from domain Y (with makeup) and generated samples, and the generator G aims to fool it; so here we use an adversarial loss to constrain the results of G to look similar to makeup faces from domain Y;

  • the same loss function is used for F for the same reason: to encourage it to generate images indistinguishable from no-makeup faces sampled from domain X;

  • but these loss functions are not enough; they would simply let the generator reproduce the same picture as the reference without any constraints imposed by the source; to prevent this, we use the identity loss for the composition of G and F: if we apply makeup to a face x from X and then immediately remove it, we should get back the input image x exactly;

  • now that we have made the output of G belong to Y (faces with makeup) and preserve the identity, we still are not really using the reference makeup style in any way; to transfer the style, we use two different style losses:

  • the style reconstruction loss Ls says that if we transfer makeup from a face y to a face x with G(x,y), then remove makeup from y with F(y), and then apply the style from G(x,y) back to F(y), we should get y back, i.e., G(F(y), G(x,y)) should be similar to y;

  • and then, on top of all this, we add another discriminator DS that decides whether a given pair of faces has the same makeup; its style discriminator loss LP is the final element of the objective function.
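Here is the promised schematic sketch of how these terms could be combined on the generator side. G(x, y) applies the makeup of y to x and F(y) removes makeup; the least-squares adversarial form, the L1 reconstruction terms, and the absence of per-term weights are our own simplifying assumptions, not the paper’s exact formulation.

```python
# Schematic generator-side objective for the makeup transfer / removal pair.
# All networks are passed in as callables; the loss forms are illustrative.
import torch
import torch.nn.functional as F_loss  # aliased so it does not clash with the network F

def generator_losses(G, F, D_Y, D_X, D_S, x, y):
    """x: face without makeup, y: reference face with makeup."""
    g_xy = G(x, y)   # x wearing the makeup style of y
    f_y = F(y)       # y with its makeup removed

    # adversarial terms: G's output should look like domain Y, F's like domain X
    d_fake_y, d_fake_x = D_Y(g_xy), D_X(f_y)
    adv_G = F_loss.mse_loss(d_fake_y, torch.ones_like(d_fake_y))
    adv_F = F_loss.mse_loss(d_fake_x, torch.ones_like(d_fake_x))

    # identity (cycle) loss: applying makeup and then removing it should return x
    cycle = F_loss.l1_loss(F(g_xy), x)

    # style reconstruction loss: G(F(y), G(x, y)) should recover y
    style_rec = F_loss.l1_loss(G(f_y, g_xy), y)

    # style discriminator D_S: the generated face and y should wear the same makeup
    d_style = D_S(g_xy, y)
    style_adv = F_loss.mse_loss(d_style, torch.ones_like(d_style))

    return adv_G + adv_F + cycle + style_rec + style_adv
```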


There is more to the paper than just loss functions. For example, another problem was how to acquire a dataset of photos for the training set. The authors found an interesting solution: use beauty bloggers from YouTube! They collected a dataset from makeup tutorial videos (verified manually on Amazon Mechanical Turk), thus ensuring that it would contain a large variety of makeup styles in high resolution.

The results are, again, pretty impressive:




The results become especially impressive if you compare them with previous state-of-the-art models for makeup transfer:

We have a feeling that the next Prisma might very well be lurking somewhere nearby…


Facial Expression Recognition by De-expression Residue Learning

With the last paper for today (full pdf), we turn from makeup to a different kind of very specific facial feature: emotions. How can we disentangle identity and emotions?

In this work, the proposed architecture contains two learning processes: the first is learning to generate standard neutral faces with a conditional GAN (cGAN), and the second is learning from the intermediate layers of the resulting generator. To train the cGAN, we use pairs consisting of a face image that shows some expression (input) and a neutral face image of the same subject (output):

The cGAN is trained as usual: the generator reconstructs the output based on the input image, and then the tuples (input, target, yes) and (input, output, no) are given to the discriminator. The discriminator tries to distinguish generated samples from the ground truth, while the generator tries not only to confuse the discriminator but also to generate an image as close to the target image as possible (composite loss functions again, but this time relatively simple).

The paper calls this process de-expression (removing expression from a face), and the idea is that during de-expression, information related to the actual emotions is still recorded as an expressive component in the intermediate layers of the generator. Thus, for the second learning process we fix the parameters of the generator, and the outputs of intermediate layers are combined and used as input for deep models that do facial expression classification. The overall architecture looks like this:




After neutral face generation, the expression information can be analyzed by comparing the neutral face and the query expression face at the pixel level or feature level. However, the pixel-level difference is unreliable due to variation between images (e.g., rotation, translation, or lighting); this can cause a large pixel-level difference even without any changes in the expression. The feature-level difference is also unstable, as the expression information may vary according to the identity information. Since the difference between the query image and the neutral image is recorded in the intermediate layers, the authors exploit the expressive component from the intermediate layers directly.
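A minimal PyTorch sketch of how this second stage could look: the trained de-expression generator is frozen, two of its intermediate activations (stand-ins for the “expressive components”) are collected with forward hooks, and a small classifier predicts the expression from them. The toy generator, the choice of layers, and all sizes are our own illustration, not the paper’s configuration.

```python
# Frozen generator + forward hooks to harvest "expressive components" for a
# downstream expression classifier. All modules and sizes are illustrative.
import torch
import torch.nn as nn

generator = nn.Sequential(                       # stand-in for the trained cGAN generator
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)
for p in generator.parameters():
    p.requires_grad_(False)                      # stage-1 weights stay fixed

residues = []
def grab(_, __, output):                         # hook that records an intermediate activation
    residues.append(output)

generator[1].register_forward_hook(grab)
generator[3].register_forward_hook(grab)

classifier = nn.Sequential(                      # expression classifier on the residues
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 6),                            # 6 basic expressions, illustrative
)

x = torch.randn(1, 3, 64, 64)                    # query face with some expression
residues.clear()
_ = generator(x)                                 # de-expression pass fills `residues`
features = torch.cat(residues, dim=1)            # concatenated expressive components
logits = classifier(features)                    # expression prediction
```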

The following figure illustrates some samples of the de-expression residue, which are the expressive components for anger, disgust, fear, happiness, sadness, and surprise respectively; the picture shows the corresponding histogram for each expressive component. As we can see, both the expressive components and the corresponding histograms are quite distinguishable:

And here are some sample results on different datasets. In all pictures, the first column is the input image, the third column is the ground-truth neutral face image of the same subject, and the middle is the output of the generative model:

As a result, the authors both get a nice network for de-expression, i.e., removing emotion from a face, and improve state-of-the-art results for emotion recognition by training the emotion classifier on rich features captured by the de-expression network.


Final words

Thank you for reading! With this, we are finally done with CVPR 2018. It is hard to do justice to a conference this large; naturally, there were hundreds of very interesting papers that we have not been able to cover. But still, we hope it has been an interesting and useful selection. We will see you again soon in the next NeuroNugget installments. Good luck!

Sergey Nikolenko
Chief Research Officer, Neuromation

Anastasia Gaydashenko
former Research Intern at Neuromation, currently Machine Learning Intern at Cisco


