FaceNet: A Unified Embedding for Face Recognition and Clustering (Paper Notes)

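This paper proposes a unified system that uses a CNN to learn a Euclidean embedding of face images, so that Euclidean distance in the embedding space corresponds directly to face similarity.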

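Unlike other approaches, FaceNet is trained with a triplet loss to output a 128-D embedding directly. Each triplet consists of two matching face thumbnails and one non-matching face thumbnail. The thumbnails are tight crops of the face area, with no 2D or 3D alignment other than scaling and translation.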

Choosing which triplets to train on is hard. The paper proposes an online negative exemplar mining strategy for training the network, and also explores hard-positive mining techniques.

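The paper explores two network architectures. The first is based on ZF-Net (the Zeiler & Fergus model), which interleaves multiple convolution and non-linear activation layers, with several additional 1×1×d convolution layers added. The second is based on the Inception architecture, which uses mixed layers that run different convolution and pooling operations in parallel and concatenate their outputs.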

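Treating the model as given, the most important part of the approach is end-to-end learning of the whole system. To this end, the paper uses a triplet loss that directly reflects the goal: learn an embedding f(x) from an image x into a feature space R^d such that, across all faces, the Euclidean distance between faces of the same person is small and the distance between faces of different people is large.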
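
To make "small distance for the same person, large distance for different people" concrete, here is a minimal NumPy sketch. The helper names, the toy 3-D vectors (standing in for 128-D network outputs), the unit-length normalization, and the threshold value are all assumptions for illustration, not details taken from this post.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale each vector (last axis) to unit Euclidean length."""
    v = np.asarray(v, dtype=np.float64)
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, eps)

def squared_distance(a, b):
    """Squared Euclidean distance between embeddings a and b."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return np.sum(diff * diff, axis=-1)

def same_identity(emb_a, emb_b, threshold=1.0):
    """Face verification by thresholding the distance.

    The threshold is a hypothetical placeholder; in practice it would be
    tuned on held-out data.
    """
    return bool(squared_distance(emb_a, emb_b) < threshold)

# Toy embeddings: `same` points almost the same way as `anchor`,
# `other` is orthogonal to it.
anchor = l2_normalize([1.0, 0.0, 0.0])
same = l2_normalize([0.9, 0.1, 0.0])
other = l2_normalize([0.0, 1.0, 0.0])

print(squared_distance(anchor, same) < squared_distance(anchor, other))
```

With embeddings like these, verification reduces to a single distance comparison, which is the practical appeal of learning the embedding directly.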

Triplet Loss
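
For reference, the triplet constraint and the loss minimized during training can be written as follows. The notation (anchor x_i^a, positive x_i^p, negative x_i^n, margin α, and T as the set of all N possible triplets) is reconstructed from the FaceNet paper, so check it against the original before relying on it.

```latex
% Constraint each triplet should satisfy: the positive is closer to the
% anchor than the negative by at least the margin \alpha.
\[
\| f(x_i^a) - f(x_i^p) \|_2^2 + \alpha < \| f(x_i^a) - f(x_i^n) \|_2^2,
\quad \forall \left( f(x_i^a), f(x_i^p), f(x_i^n) \right) \in \mathcal{T}
\]

% Loss summed over the N triplets; [z]_+ denotes max(z, 0).
\[
L = \sum_{i}^{N} \left[ \| f(x_i^a) - f(x_i^p) \|_2^2
    - \| f(x_i^a) - f(x_i^n) \|_2^2 + \alpha \right]_+
\]
```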

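Note that generating all possible triplets would produce many triplets that already satisfy the triplet constraint easily. These triplets contribute nothing to training, and they slow convergence because they still have to be passed through the network. Selecting hard triplets, the ones that can actually improve the model, is therefore crucial. The next section covers how triplets are selected.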
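
To see why easily satisfied triplets are wasted computation, here is a minimal NumPy sketch of a hinge-style triplet loss, max(d(a,p) - d(a,n) + alpha, 0) on squared Euclidean distances. The function name, the 2-D toy vectors, and the default margin of 0.2 are assumptions for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Per-triplet loss: max(d(a, p) - d(a, n) + alpha, 0).

    Each argument is an array of shape (num_triplets, dim).
    alpha is the margin; 0.2 is an assumed default.
    """
    anchor = np.asarray(anchor, dtype=np.float64)
    positive = np.asarray(positive, dtype=np.float64)
    negative = np.asarray(negative, dtype=np.float64)
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative
    return np.maximum(d_ap - d_an + alpha, 0.0)

# Two toy triplets sharing the same anchor at the origin.
anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[0.1, 0.0], [0.6, 0.0]]
negatives = [[1.0, 0.0], [0.5, 0.0]]

losses = triplet_loss(anchors, positives, negatives)
# First triplet: the negative is far away, so the loss is zero.
# Second triplet: the negative is closer than the positive, so the
# loss is positive and the triplet is "active".
print(losses)
```

The first triplet still costs a forward pass but yields zero loss and zero gradient, which is exactly the waste the paragraph above describes.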

Triplet Selection
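
In symbols, given an anchor x_i^a, the triplets that violate the constraint most are obtained by picking a hard positive and a hard negative as below. The notation follows the FaceNet paper and is a reconstruction, so treat it as a sketch (`\operatorname*` assumes the amsmath package).

```latex
% Hard positive: the same-identity face farthest from the anchor.
\[
\operatorname*{argmax}_{x_i^p} \| f(x_i^a) - f(x_i^p) \|_2^2
\]

% Hard negative: the different-identity face closest to the anchor.
\[
\operatorname*{argmin}_{x_i^n} \| f(x_i^a) - f(x_i^n) \|_2^2
\]
```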

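Computing the argmin and argmax (the hardest negative and the hardest positive for each anchor) over the whole dataset is infeasible. It may also lead to poor training, as mislabelled and poorly imaged faces would dominate the hard positives and negatives. There are two obvious choices that avoid this issue: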

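Generate triplets offline every n steps, using the most recent network checkpoint to compute the argmin and argmax on a subset of the data;
Generate triplets online, by selecting the hard positive/negative exemplars from within a mini-batch.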

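The paper focuses on online selection, computing the argmax and argmin within a mini-batch.
In the experiments, the training data is sampled so that around 40 faces of each identity are present in a mini-batch. In addition, randomly sampled negative faces are added to each mini-batch.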
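
The sampling scheme above could be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `sample_batch`, the dictionary layout, and the parameters `num_identities` and `num_random_negatives` are hypothetical, and only the figure of roughly 40 faces per identity comes from the text.

```python
import random

def sample_batch(faces_by_identity, num_identities, faces_per_identity=40,
                 num_random_negatives=0, rng=None):
    """Return one mini-batch as a list of (identity, face) pairs.

    faces_by_identity: dict mapping an identity to a list of its faces.
    """
    rng = rng or random.Random()
    # Pick which identities get a full group of faces in this batch.
    identities = rng.sample(sorted(faces_by_identity), num_identities)
    batch = []
    for identity in identities:
        faces = faces_by_identity[identity]
        k = min(faces_per_identity, len(faces))
        batch.extend((identity, face) for face in rng.sample(faces, k))
    # Add extra faces drawn at random from all the other identities.
    chosen = set(identities)
    pool = [(identity, face)
            for identity, faces in faces_by_identity.items()
            if identity not in chosen
            for face in faces]
    batch.extend(rng.sample(pool, min(num_random_negatives, len(pool))))
    return batch

# Toy dataset: 10 identities with 50 face ids each.
data = {f"id{i}": [f"id{i}_{j}" for j in range(50)] for i in range(10)}
batch = sample_batch(data, num_identities=3, num_random_negatives=20,
                     rng=random.Random(0))
print(len(batch))  # 3 * 40 + 20 faces
```

Grouping many faces of the same identity in one batch is what makes it possible to form meaningful anchor-positive pairs inside that batch.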

Instead of picking the hardest positive, the authors use all anchor-positive pairs in a mini-batch while still selecting the hard negatives. They report no side-by-side comparison of hard anchor-positive pairs versus all anchor-positive pairs within a mini-batch, but found in practice that the all anchor-positive method was more stable and converged slightly faster at the beginning of training.
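
A minimal sketch of this scheme within a single mini-batch: enumerate every anchor-positive pair and attach the closest (hardest) negative to each anchor. The function names and toy data are assumptions for illustration, and the dense distance matrix is only practical for small batches.

```python
import numpy as np

def pairwise_squared_distances(embeddings):
    """Matrix D where D[i, j] is the squared distance between rows i and j."""
    emb = np.asarray(embeddings, dtype=np.float64)
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sum(diff * diff, axis=-1)

def mine_triplets_hardest_negative(embeddings, labels):
    """All anchor-positive pairs in the batch, each with the hardest negative.

    Returns a list of (anchor, positive, negative) index triples.
    """
    labels = np.asarray(labels)
    dist = pairwise_squared_distances(embeddings)
    triplets = []
    for a in range(len(labels)):
        negatives = np.flatnonzero(labels != labels[a])
        if negatives.size == 0:
            continue  # the batch holds a single identity: no triplet possible
        # Hardest negative: the different-identity face closest to the anchor.
        n = int(negatives[np.argmin(dist[a, negatives])])
        for p in np.flatnonzero(labels == labels[a]):
            if p != a:
                triplets.append((a, int(p), n))
    return triplets

# Four toy 2-D embeddings: two faces of identity 0, two of identity 1.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.5, 0.0]]
labels = [0, 0, 1, 1]
triplets = mine_triplets_hardest_negative(embeddings, labels)
print(triplets)  # [(0, 1, 3), (1, 0, 3), (2, 3, 1), (3, 2, 1)]
```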

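The paper also mentions generating triplets offline in conjunction with online generation, which may allow the use of smaller batch sizes, but its experiments on this were inconclusive.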

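Always selecting the hardest negatives can lead to bad local minima early in training, so the paper selects semi-hard negatives instead: negatives that are farther from the anchor than the positive but still lie inside the margin.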
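
Semi-hard selection for one anchor-positive pair could look like the sketch below. The function name, the margin default of 0.2, and the tie-breaking rule (take the closest semi-hard candidate) are assumptions; the paper's exact selection procedure may differ.

```python
import numpy as np

def select_semi_hard_negative(d_ap, d_an, alpha=0.2):
    """Index of a semi-hard negative for one anchor-positive pair, or None.

    d_ap: squared anchor-positive distance (scalar).
    d_an: squared distances from the anchor to each candidate negative.
    alpha: margin (assumed default).
    """
    d_an = np.asarray(d_an, dtype=np.float64)
    # Semi-hard: farther from the anchor than the positive, but still
    # inside the margin, so the triplet loss is non-zero.
    mask = (d_an > d_ap) & (d_an < d_ap + alpha)
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return None
    # Assumption: among semi-hard candidates, take the closest one.
    return int(candidates[np.argmin(d_an[candidates])])

# The hardest negative overall is index 0 (distance 0.10), but it is
# closer than the positive, so the semi-hard rule skips it.
chosen = select_semi_hard_negative(0.30, [0.10, 0.35, 0.45, 0.90])
print(chosen)  # 1
```

Skipping negatives that are closer than the positive avoids the most extreme (and often mislabelled) examples while still picking triplets that produce a gradient.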

Those are the key points of the paper. The rest covers the training process for the Inception and ZF-Net models and the experimental results.