Testing TensorFlow-based face recognition (FaceNet)

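Face recognition is very widely used, and the field is moving quickly. On the LFW benchmark, for example, some reported results are already approaching 99.9%.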

Uni-Ubi⁶⁰	0.9900 ± 0.0032
FaceNet⁶²	0.9963 ± 0.0009
Baidu⁶⁴	0.9977 ± 0.0006
AuthenMetric⁶⁵	0.9977 ± 0.0009
MMDFR⁶⁷	0.9902 ± 0.0019
CW-DNA-1⁷⁰	0.9950 ± 0.0022
Faceall⁷¹	0.9967 ± 0.0007
JustMeTalk⁷²	0.9887 ± 0.0016
Facevisa⁷⁴	0.9955 ± 0.0014
pose+shape+expression augmentation⁷⁵	0.9807 ± 0.0060
ColorReco⁷⁶	0.9940 ± 0.0022
Asaphus⁷⁷	0.9815 ± 0.0039
Daream⁷⁸	0.9968 ± 0.0009
Dahua-FaceImage⁸⁰	0.9978 ± 0.0007
Easen Electron⁸¹	0.9978 ± 0.0006
Skytop Gaia⁸²	0.9630 ± 0.0023
CNN-3DMM estimation⁸³	0.9235 ± 0.0129
Samtech Facequest⁸⁴	0.9971 ± 0.0018
XYZ Robot⁸⁷	0.9895 ± 0.0020
THU CV-AI Lab⁸⁸	0.9973 ± 0.0008
dlib⁹⁰	0.9938 ± 0.0027
Aureus⁹¹	0.9920 ± 0.0030
YouTu Lab, Tencent⁶³	0.9980 ± 0.0023
Orion Star⁹²	0.9965 ± 0.0032
Yuntu WiseSight⁹³	0.9943 ± 0.0045
PingAn AI Lab⁸⁹	0.9980 ± 0.0016
Turing123⁹⁴	0.9940 ± 0.0040
Hisign⁹⁵	0.9968 ± 0.0030
VisionLabs V2.0³⁸	0.9978 ± 0.0007
Deepmark⁹⁶	0.9923 ± 0.0016
Force Infosystems⁹⁷	0.9973 ± 0.0028
ReadSense⁹⁸	0.9982 ± 0.0007

Many of the entries above come from commercial companies, so very few of them are open source. Here I test only Google's FaceNet.

The FaceNet architecture is as follows (input batch, deep network, L2 normalization, embedding, triplet loss):

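As the architecture shows, there is no softmax layer. Instead, the network output is L2-normalized directly to obtain the image representation, i.e. the feature embedding. The deep network itself can be an existing, mature model, such as any of the models in TensorFlow-Slim.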
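To make the L2 normalization step concrete, here is a minimal sketch in plain Python. The function name `l2_normalize` and the toy 2-D vector are assumptions for illustration only; real embeddings have many more dimensions (128 in the FaceNet paper) and would be normalized inside the network.

```python
import math

def l2_normalize(vec, eps=1e-10):
    """Scale a vector so that its Euclidean (L2) norm becomes 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    # eps guards against division by zero for an all-zero vector
    return [x / max(norm, eps) for x in vec]

raw = [3.0, 4.0]          # toy "network output", invented for this example
emb = l2_normalize(raw)   # unit-length embedding
emb_norm = math.sqrt(sum(x * x for x in emb))
print(emb, emb_norm)
```

After normalization every embedding lies on the unit hypersphere, so distances between embeddings are directly comparable.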

The final component, Triplet Loss, is a loss function defined over triplets. Its code is shown below:

import tensorflow as tf  # written against the TensorFlow 1.x API (tf.variable_scope)

def triplet_loss(anchor, positive, negative, alpha):
    """Calculate the triplet loss according to the FaceNet paper

    Args:
      anchor: the embeddings for the anchor images.
      positive: the embeddings for the positive images.
      negative: the embeddings for the negative images.
      alpha: the margin enforced between positive and negative distances.

    Returns:
      the triplet loss according to the FaceNet paper as a float tensor.
    """
    with tf.variable_scope('triplet_loss'):
        # Squared L2 distance per triplet, summed over the embedding dimension
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)

        # Hinge: only triplets that violate the margin contribute to the loss
        basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)

    return loss

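As the code shows, a triplet is simply three examples, (anchor, pos, neg), which are judged by their distances. Over as many triplets as possible, the distance between the anchor and the positive example should be smaller than the distance between the anchor and the negative example, by at least the margin alpha.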
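The same arithmetic can be checked by hand on a single triplet. The following is a minimal sketch in plain Python, not the repository's code: the helper names `squared_distance` and `triplet_loss_single` are hypothetical, and the vectors and the margin of 0.5 are invented toy values (not normalized embeddings).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss_single(anchor, positive, negative, alpha):
    """Loss for one triplet: max(d(a, p) - d(a, n) + alpha, 0)."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + alpha, 0.0)

anchor = [0.0, 0.0]
positive = [1.0, 0.0]        # squared distance to anchor: 1
easy_negative = [3.0, 0.0]   # squared distance to anchor: 9
hard_negative = [0.0, 1.0]   # squared distance to anchor: 1

easy_loss = triplet_loss_single(anchor, positive, easy_negative, alpha=0.5)
hard_loss = triplet_loss_single(anchor, positive, hard_negative, alpha=0.5)
print(easy_loss, hard_loss)  # 0.0 0.5
```

The easy negative is already farther away than the positive by more than the margin, so it contributes nothing. The hard negative is as close as the positive, so the loss equals the margin and training would push it away.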

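Training optimizes this objective as follows: the anchor is pulled toward the positive example and pushed away from the negative example.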

Testing (code: https://github.com/davidsandberg/facenet)

FaceNet itself does not require face alignment, but the code provides MTCNN-based alignment, and in the LFW evaluation the aligned faces scored noticeably higher.

Running the evaluation from the provided code on LFW gives an accuracy of 99.2%.

Of course, there are higher scores than that.
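For intuition about how a verification accuracy of this kind is computed, here is a simplified sketch in plain Python. It is not the repository's evaluation script. The pairs, the fixed threshold of 1.1, and the use of the sample standard deviation as the spread are all assumptions for illustration; a full protocol would typically choose the threshold on training folds rather than fix it in advance.

```python
import statistics

def accuracy_at_threshold(pairs, threshold):
    """pairs: list of (distance, is_same). Predict 'same' when distance < threshold."""
    correct = sum((dist < threshold) == is_same for dist, is_same in pairs)
    return correct / len(pairs)

# Invented toy folds: (embedding distance, whether both images show the same person)
folds = [
    [(0.3, True), (1.4, False), (0.9, True), (1.6, False)],
    [(0.5, True), (1.2, False), (1.3, True), (1.8, False)],
    [(0.2, True), (0.8, False), (0.7, True), (1.5, False)],
]

threshold = 1.1  # hypothetical decision threshold
accs = [accuracy_at_threshold(fold, threshold) for fold in folds]
mean_acc = statistics.mean(accs)
spread = statistics.stdev(accs)
print(f"{mean_acc:.4f} ± {spread:.4f}")
```

Each fold yields one accuracy, and the per-fold values are then summarized in the same "mean ± spread" form used in the results list at the top.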

The program also provides code for comparing the distance between two images, which I ran as a debugging check.
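Conceptually, such a comparison comes down to the Euclidean distance between embeddings. The sketch below uses plain Python with invented unit-length vectors in place of real network output; the image names and the helper `distance_matrix` are hypothetical and are not taken from the repository.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(embeddings):
    """All pairwise distances; entry [i][j] compares embedding i with embedding j."""
    return [[euclidean_distance(a, b) for b in embeddings] for a in embeddings]

# Invented toy embeddings (already unit length), keyed by hypothetical image names
images = {
    "img_a": [1.0, 0.0],
    "img_b": [0.6, 0.8],
    "img_c": [0.0, 1.0],
}

matrix = distance_matrix(list(images.values()))
for name, row in zip(images, matrix):
    print(name, "  ".join(f"{d:.4f}" for d in row))
```

Smaller distances suggest the same person and larger ones suggest different people. Where to draw the line is a threshold that has to be chosen on validation data.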
