Pose-Robust Face Recognition via Deep Residual Equivariant Mapping


CVPR 2018, The Chinese University of Hong Kong and SenseTime
Authors' GitHub project: DREAM

Abstract

Original	Paraphrase
Face recognition achieves exceptional success thanks to the emergence of deep learning. However, many contemporary face recognition models still perform relatively poor in processing profile faces compared to frontal faces.	Thanks to deep learning, face recognition has improved greatly and is widely applied, but existing face recognition systems perform much worse on profile faces than on frontal faces.
A key reason is that the number of frontal and profile training faces are highly imbalanced - there are extensively more frontal training samples compared to profile ones. In addition, it is intrinsically hard to learn a deep representation that is geometrically invariant to large pose variations.	A main reason is that the training data are imbalanced: datasets contain far more frontal faces than profile faces. In addition, it is inherently hard to learn a representation that is invariant to large pose changes.
In this study, we hypothesize that there is an inherent mapping between frontal and profile faces, and consequently, their discrepancy in the deep representation space can be bridged by an equivariant mapping. To exploit this mapping, we formulate a novel Deep Residual EquivAriant Mapping (DREAM) block, which is capable of adaptively adding residuals to the input deep representation to transform a profile face representation to a canonical pose that simplifies recognition.	The paper assumes an inherent mapping between frontal and profile faces, so the gap between them in feature space can be bridged by an equivariant mapping. To realize this mapping, it proposes a block that implements the equivariant mapping with residuals; the block transforms a profile face feature toward a frontal one, which makes recognition easier.
The DREAM block consistently enhances the performance of profile face recognition for many strong deep networks, including ResNet models, without deliberately augmenting training data of profile faces. The block is easy to use, light-weight, and can be implemented with a negligible computational overhead.	The DREAM block improves the recognition accuracy of deep CNNs such as ResNet without deliberately augmenting profile training data, and it is lightweight with negligible computational overhead.

Main Idea

The authors first introduce the concept of feature equivariance, then show how to design the DREAM block based on it, and finally describe several ways to use DREAM.

Feature Equivariance

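Feature equivariance was first proposed in the CVPR 2015 paper "Understanding image representations by measuring their equivariance and equivalence". It shows that when the input image is transformed, its representation changes as well, and that this change can be learned from data.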
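Formally, a CNN maps an input $x$ to a representation $\phi(x)$. If a transformation $g$ turns the input into $gx$, the representation is equivariant to $g$ when the new representation can be written as $\phi(gx) \approx M_{g} \phi(x)$ for some mapping $M_{g}$. The 2015 paper focuses on geometric transformations, whereas this paper targets pose in face recognition, so here $g$ is the transformation in 3D space that takes a profile face to a frontal face.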
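To make the definition concrete, the sketch below uses a toy setting that is an assumption of this note, not taken from either paper: the "CNN" is an invertible linear map $\phi(x) = Ax$ and $g$ is a cyclic shift of the input coordinates (a permutation matrix $P$). Then $\phi(gx) = APA^{-1}\phi(x)$ holds exactly, and $M_{g}$ can be recovered from pairs $(\phi(x), \phi(gx))$ by least-squares regression, roughly in the spirit of learning $M_{g}$ from data. The helper `estimate_equivariant_map` is a hypothetical name.

```python
import numpy as np

def estimate_equivariant_map(phi_x: np.ndarray, phi_gx: np.ndarray) -> np.ndarray:
    """Least-squares estimate of M_g such that phi(gx) ≈ M_g @ phi(x).

    phi_x, phi_gx: arrays of shape (n_samples, k), one feature vector per row.
    """
    # Row form: phi_gx ≈ phi_x @ M_g.T, so solve for M_g.T and transpose.
    m_t, *_ = np.linalg.lstsq(phi_x, phi_gx, rcond=None)
    return m_t.T

rng = np.random.default_rng(0)
k, n = 6, 200
# Toy "network": a well-conditioned invertible linear map (assumption).
A = rng.normal(size=(k, k)) + k * np.eye(k)
# Toy transformation g: cyclic shift of the input coordinates.
P = np.roll(np.eye(k), 1, axis=0)

X = rng.normal(size=(n, k))        # inputs x, one per row
phi_x = X @ A.T                    # phi(x) = A x
phi_gx = (X @ P.T) @ A.T           # phi(gx) = A P x

M_g = estimate_equivariant_map(phi_x, phi_gx)
print(np.allclose(M_g, A @ P @ np.linalg.inv(A)))  # True
```

With a real CNN the relation is only approximate, which is why the definition above uses $\approx$ and why $M_{g}$ has to be learned rather than derived.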

Problem Formulation and the DREAM Block

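Face recognition algorithms usually map face images into a feature space, so the authors aim to extract a pose-invariant feature, i.e., to remove the influence of pose from the feature.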
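Inspired by the equivariant mapping described above, let $x_{f}$ and $x_{p}$ be a frontal and a profile face image of the same person. Then there exists a mapping $M_{g}$ that transforms the profile representation into the frontal one: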
$\phi(x_{f})\approx M_{g}\phi(x_{p})$
To integrate $M_{g}$ into a CNN more easily, $M_{g}\phi(x_{p})$ is written in residual form:
$M_{g}\phi(x_{p})=\phi(x_{p})+Y(x_{p})R(\phi(x_{p}))$
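Here $Y(x) \in [0,1]$ measures how far the face is turned toward profile, with $Y(x) = 0$ for a frontal face. The coefficient $Y(x)$ controls how much residual is added, because adding the same residual to every face regardless of how far it is turned would hurt recognition accuracy.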
The authors expect that adding the term $Y(x_{p})R(\phi(x_{p}))$ moves the profile feature into the region where frontal features cluster.
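A minimal NumPy sketch of the forward pass $\phi(x_{p})+Y(x_{p})R(\phi(x_{p}))$. Assumptions not stated in this note: the residual branch $R$ is modeled as two fully connected layers, each followed by PReLU (the note only says PReLU is used), the weights are random placeholders rather than trained values, and `prelu`, `residual_branch` and `dream_block` are hypothetical names.

```python
import numpy as np

def prelu(z: np.ndarray, a: float = 0.25) -> np.ndarray:
    # PReLU: identity for positive inputs, slope `a` for negative ones.
    return np.where(z > 0, z, a * z)

def residual_branch(phi: np.ndarray, W1, b1, W2, b2) -> np.ndarray:
    # R(phi): FC -> PReLU -> FC -> PReLU (layer layout is an assumption).
    h = prelu(phi @ W1.T + b1)
    return prelu(h @ W2.T + b2)

def dream_block(phi: np.ndarray, yaw_coeff: np.ndarray, params) -> np.ndarray:
    """phi: (batch, d) stem features; yaw_coeff: (batch,) values of Y(x) in [0, 1]."""
    # Soft gate: the residual is scaled per sample by how far the face is turned.
    return phi + yaw_coeff[:, None] * residual_branch(phi, *params)

rng = np.random.default_rng(0)
d = 256                                   # size of the stem's feature layer
params = (rng.normal(scale=0.05, size=(d, d)), np.zeros(d),
          rng.normal(scale=0.05, size=(d, d)), np.zeros(d))
phi = rng.normal(size=(3, d))             # placeholder features of 3 faces
Y = np.array([0.0, 0.5, 1.0])             # frontal, half-turned, full profile
out = dream_block(phi, Y, params)
print(np.allclose(out[0], phi[0]))        # True: the frontal face passes through unchanged
```

Because the gate multiplies the residual, a frontal face ($Y=0$) keeps its stem feature exactly, and the correction grows smoothly with the degree of rotation.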

Architecture and Training

[Figure: architecture of the DREAM block]
The overall system architecture is shown in the figure above.
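Stem CNN: ResNet-18 or ResNet-50 extracts the image features. To adapt ResNet to the classification task, a 256-dimensional fully connected layer is inserted between the last pooling layer and the final fully connected layer and serves as the feature layer.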
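Residual branch: generates the residual $R(\phi(x))$ and uses PReLU as its activation function. Its parameters are learned with SGD by minimizing the Euclidean distance between frontal and profile features, using frontal-profile pairs from the MS-Celeb-1M dataset.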
Head Rotation Estimator: estimates $Y(x)$, i.e., how far the head is rotated away from the frontal pose, from 21 facial landmarks, using the method from "Appearance-based gaze estimation in the wild" (CVPR 2015).
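The note does not say how the estimated head pose is converted into $Y(x)$. The sketch below assumes a simple linear mapping of the absolute yaw angle, clipped to $[0,1]$ so that a frontal face gets 0 and a full profile gets 1; this is not necessarily the paper's formula, the landmark-based yaw estimation itself is not shown, and `yaw_coefficient` and `max_yaw_deg` are hypothetical names.

```python
import numpy as np

def yaw_coefficient(yaw_deg, max_yaw_deg: float = 90.0) -> np.ndarray:
    """Hypothetical Y(x): map estimated yaw angles (degrees) to [0, 1].

    0 deg (frontal) -> 0.0, |yaw| >= max_yaw_deg (full profile) -> 1.0.
    """
    yaw = np.abs(np.asarray(yaw_deg, dtype=float))  # left and right turns treated alike
    return np.clip(yaw / max_yaw_deg, 0.0, 1.0)

print(yaw_coefficient([0, 30, -45, 90, 120]))
```

A smoother mapping (e.g. a sigmoid of the yaw angle) would serve the same purpose; the only requirement implied by the formula is that the output lies in $[0,1]$ and is 0 for frontal faces.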

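Finally, the outputs of the three components are combined according to $\phi(x_{p})+Y(x_{p})R(\phi(x_{p}))$ to produce the final feature.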
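The training objective of the residual branch can be sketched as mini-batch SGD on the squared Euclidean distance between the frontal feature and the mapped profile feature. To keep it short and dependency-free, this is a simplification, not the paper's setup: $R$ is reduced to a single linear map `W`, the stem features are synthetic random vectors instead of MS-Celeb-1M features, and the learning rate, batch size and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 512

# Synthetic stand-ins for stem features of frontal/profile pairs (assumption):
# the frontal feature is generated as phi_p + Y * (phi_p @ W_true.T) plus noise.
W_true = rng.normal(scale=0.3, size=(d, d))
phi_p = rng.normal(size=(n, d))
Y = rng.uniform(0.0, 1.0, size=n)
phi_f = phi_p + Y[:, None] * (phi_p @ W_true.T) + 0.01 * rng.normal(size=(n, d))

def loss_and_grad(W, P, F, y):
    # Mapped profile feature phi_p + Y * R(phi_p), with R simplified to a linear map W.
    diff = P + y[:, None] * (P @ W.T) - F
    loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))   # half squared Euclidean distance
    grad = (diff * y[:, None]).T @ P / len(P)         # d loss / d W
    return loss, grad

W = np.zeros((d, d))
lr, batch = 0.1, 32
initial_loss, _ = loss_and_grad(W, phi_p, phi_f, Y)
for epoch in range(50):
    order = rng.permutation(n)
    for start in range(0, n, batch):
        idx = order[start:start + batch]               # one mini-batch of pairs
        _, g = loss_and_grad(W, phi_p[idx], phi_f[idx], Y[idx])
        W -= lr * g                                    # plain SGD step
final_loss, _ = loss_and_grad(W, phi_p, phi_f, Y)
print(f"loss: {initial_loss:.3f} -> {final_loss:.5f}")
```

Note that the gradient is weighted by $Y$: nearly frontal pairs contribute little to the update, which matches the role of the gate in the forward pass.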

Usages of DREAM

There are three ways to use DREAM:

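Stitching: the simplest approach. The DREAM block is attached directly to an already-trained stem network without changing the stem's parameters; DREAM only needs to be connected to the stem's last fully connected layer.
End2End: the DREAM block and the stem network are trained together end to end.
End2End+retrain: after end-to-end training, the DREAM block is further trained on its own with frontal-profile face pairs.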
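The three usages differ only in which parts are updated and on what data. The sketch below encodes them as training stages; `Stage` and `dream_schedule` are hypothetical names, and the data used for joint training is an assumption, since the note only specifies frontal-profile pairs for training DREAM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    trainable: tuple   # parts that receive gradient updates in this stage
    data: str          # what the stage trains on

def dream_schedule(mode: str) -> list:
    """Hypothetical summary of the three usages of DREAM as training stages."""
    if mode == "stitching":
        # Stem is already trained and kept frozen; only DREAM is fitted on top.
        return [Stage(("dream",), "frontal-profile pairs")]
    if mode == "end2end":
        # Stem and DREAM are optimized jointly (training data assumed).
        return [Stage(("stem", "dream"), "stem training data")]
    if mode == "end2end+retrain":
        # Joint training first, then DREAM alone is retrained on pairs.
        return dream_schedule("end2end") + [Stage(("dream",), "frontal-profile pairs")]
    raise ValueError(f"unknown mode: {mode}")

for stage in dream_schedule("end2end+retrain"):
    print(stage)
```

Stitching is the cheapest option because the stem's weights, and therefore its features for frontal faces, stay exactly as they were.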

Experiments

The authors evaluate on two datasets, CFP and IJB-A.

Evaluation on CFP with Frontal-Profile Setting

[Figure: results on CFP]

Evaluation on IJB-A with Full Pose Variation

[Figure: results on IJB-A]
