Deeplearning.ai──Face Recognition

1. What Is Face Recognition?

Face Verification vs. Face Recognition

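Verification: decide whether a face belongs to the claimed person (1:1 matching)
Recognition: output the ID of the person the face belongs to, or "not recognized" if it matches none of the K known people (1:K matching)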
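The difference between the two tasks shows up directly in their interfaces. Below is a minimal Python sketch, assuming each face has already been turned into an encoding vector (the encoding f(img) is introduced later) and that known people are stored in a dict mapping ID to encoding. The names `database`, `threshold`, `verify`, and `recognize` are hypothetical illustrations, not part of the original course code.

```python
import numpy as np

def verify(encoding, claimed_id, database, threshold):
    """1:1 verification: is this face the person it claims to be?"""
    # Compare against the single stored encoding of the claimed identity.
    dist = np.linalg.norm(encoding - database[claimed_id])
    return bool(dist < threshold)

def recognize(encoding, database, threshold):
    """1:K recognition: which of the K known people is this, if any?"""
    # Compare against every stored encoding and keep the closest one.
    best_id, best_dist = None, float("inf")
    for person_id, stored in database.items():
        dist = np.linalg.norm(encoding - stored)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    # Report "not recognized" (None) if even the closest person is too far away.
    return best_id if best_dist < threshold else None

# Toy 3-D encodings stand in for real 128-D ones (hypothetical values).
database = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
query = np.array([0.9, 0.1, 0.0])
print(verify(query, "alice", database, threshold=0.5))   # True
print(recognize(query, database, threshold=0.5))         # alice
```

Verification touches one stored encoding; recognition scans all K of them, which is why it inherits more chances to go wrong.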

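In face verification we are given two images and must say whether they show the same person. The simplest approach is to compare the two images pixel by pixel and declare them the same person if the distance between them is below some threshold. This performs very poorly, however, because raw pixel values change drastically with the orientation of the face, the lighting, and even small movements of the head. So instead of comparing pixels, we will learn an encoding of the image, f(img), and compare these encodings element-wise, which gives a much more accurate judgment of whether two images show the same person.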
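The naive pixel-by-pixel approach fits in a few lines of NumPy, and the sketch also makes its weakness concrete: a uniform brightness change, with the face itself untouched, already produces a large distance. The 96×96×3 image size, the +40 brightness shift, and the function name `pixel_distance` are hypothetical choices for illustration.

```python
import numpy as np

def pixel_distance(img1, img2):
    """Naive verification signal: L2 distance between raw pixel values."""
    diff = img1.astype(np.float64) - img2.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))

rng = np.random.default_rng(0)
face = rng.integers(0, 200, size=(96, 96, 3))   # stand-in for a face image
brighter = np.clip(face + 40, 0, 255)           # same face under stronger lighting

# Every pixel differs by exactly 40, so the distance is 40 * sqrt(96*96*3) ≈ 6651,
# even though both images show exactly the same person.
print(pixel_distance(face, face))      # 0.0
print(pixel_distance(face, brighter))
```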

Figure 1. A naive approach to face verification: pixel-by-pixel comparison
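Face recognition is considerably harder than face verification. For a verification system, 99% accuracy is already quite good. But if we use that same verifier inside a recognition system with K people, each query is compared against all K of them, so the system has roughly K times as many chances to make a mistake. To get high accuracy in a recognition system, we therefore need a verifier with much higher accuracy.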
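A quick calculation shows where the factor of K comes from. Assuming, as a simplification, that each of the K comparisons independently produces a false match with probability p, the chance of at least one error is 1 - (1 - p)^K, which is close to K·p when K·p is small. The numbers are illustrative, not taken from the course.

```python
def prob_any_error(p, k):
    """Probability that at least one of k independent comparisons errs."""
    return 1.0 - (1.0 - p) ** k

p = 0.01  # per-comparison error rate of a "99% accurate" verifier
for k in (1, 10, 100):
    print(k, round(prob_any_error(p, k), 4), "approx", k * p)
```

With K = 10 the error rate grows from 1% to about 9.6%; with K = 100 it is about 63%, so the linear K·p estimate is only a rough guide once K·p approaches 1, but the conclusion holds either way.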

2. Encoding Face Images as 128-Dimensional Vectors

2.1 Encoding with a ConvNet

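Training the FaceNet model takes a lot of data and a long time. So, following common practice in applied deep learning, we load pretrained weights instead. The network architecture follows the Inception model of Szegedy et al., and an implementation of the Inception network is provided.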
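Because the last layer is a fully connected layer with 128 neurons, the model outputs a 128-dimensional encoding vector, and we can use these encodings to compare two face images: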

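Figure 2. By computing the distance between two encodings and comparing it with a threshold, we can decide whether two face images show the same person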
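The final encoding step can be sketched in NumPy as a fully connected layer that maps the backbone's feature vector to 128 numbers, followed by the distance computation from Figure 2. The 512-dimensional feature size, the random weights, and the function name `encode_head` are hypothetical. The L2 normalization at the end follows the FaceNet paper, which constrains embeddings to the unit hypersphere; it is an assumption here, since the text above does not describe it.

```python
import numpy as np

def encode_head(features, W, b):
    """Final FC layer with 128 units, then L2 normalization (FaceNet-style)."""
    z = features @ W + b                        # shape (batch, 128)
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(512, 128))    # hypothetical "trained" weights
b = np.zeros(128)
features = rng.normal(size=(2, 512))           # stand-in for ConvNet outputs of 2 faces

enc = encode_head(features, W, b)
print(enc.shape)                               # (2, 128)
dist = np.linalg.norm(enc[0] - enc[1])         # this distance is compared with a threshold
print(dist)
```

Because both encodings have unit length, their distance always lies between 0 and 2, which keeps a single threshold meaningful across images.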
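An encoding is therefore considered good if it satisfies the following rules:

The encodings of two images of the same person are very similar
The encodings of two images of different people are very different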
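One way to check these two rules on a batch of encodings is to compare the average distance between images of the same person with the average distance between images of different people. A minimal NumPy sketch; the function name `intra_inter_distances` and the toy data are hypothetical.

```python
import numpy as np

def intra_inter_distances(encodings, labels):
    """Mean squared L2 distance for same-person pairs and different-person pairs."""
    labels = np.asarray(labels)
    # Pairwise squared distances via broadcasting: shape (n, n).
    diff = encodings[:, None, :] - encodings[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # ignore each image paired with itself
    return d2[same & off_diag].mean(), d2[~same].mean()

# Toy 2-D encodings: two images of person 0, two of person 1.
enc = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]])
intra, inter = intra_inter_distances(enc, [0, 0, 1, 1])
print(intra, inter)   # a good encoding has intra much smaller than inter
```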

The triplet loss formalizes these rules. Its three elements are the Anchor, Positive, and Negative images shown in the figure below:

Figure 3. From left to right: Anchor (A), Positive (P), Negative (N)

2.2 Triplet Loss

For an image x, we denote its encoding by f(x), where f is the function computed by the neural network.

Figure 5

Training uses triplets of images (A, P, N):

A stands for "Anchor": an image of a person
P stands for "Positive": another image of the same person
N stands for "Negative": an image of a different person
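Building an (A, P, N) triplet only requires knowing which images belong to which person. The sketch below samples triplets uniformly at random, assuming a hypothetical dict `images_by_person` from person ID to a list of image identifiers. Randomly chosen triplets often satisfy the margin easily, so in practice harder triplets are usually preferred; this sampler does not attempt that.

```python
import random

def sample_triplet(images_by_person, rng=random):
    """Return one (anchor, positive, negative) triplet of image identifiers."""
    # The anchor's person needs at least two images so A and P can differ.
    candidates = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    person = rng.choice(candidates)
    anchor, positive = rng.sample(images_by_person[person], 2)
    # The negative comes from any other person.
    other = rng.choice([p for p in images_by_person if p != person])
    negative = rng.choice(images_by_person[other])
    return anchor, positive, negative

images_by_person = {
    "alice": ["alice_1.jpg", "alice_2.jpg", "alice_3.jpg"],
    "bob": ["bob_1.jpg"],
}
print(sample_triplet(images_by_person, random.Random(0)))
```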

We use $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example. We want the encoding of $A^{(i)}$ to be closer to that of $P^{(i)}$ than to that of $N^{(i)}$ by at least a margin $\alpha$:

$$\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 + \alpha < \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$$
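A small worked example of the margin condition, using made-up 2-D encodings and $\alpha = 0.2$ (the default in the code below). It shows that N being farther from A than P is not enough; it must be farther by at least $\alpha$. The function name `satisfies_margin` is hypothetical.

```python
import numpy as np

def satisfies_margin(f_a, f_p, f_n, alpha):
    """Check ||f(A)-f(P)||^2 + alpha < ||f(A)-f(N)||^2 for one triplet."""
    pos = np.sum((f_a - f_p) ** 2)
    neg = np.sum((f_a - f_n) ** 2)
    return bool(pos + alpha < neg)

f_a = np.array([0.0, 0.0])
f_p = np.array([0.3, 0.0])   # ||A-P||^2 = 0.09
f_n = np.array([0.5, 0.0])   # ||A-N||^2 = 0.25
print(satisfies_margin(f_a, f_p, f_n, alpha=0.2))  # False: 0.09 + 0.2 = 0.29 is not < 0.25
print(satisfies_margin(f_a, f_p, f_n, alpha=0.1))  # True:  0.09 + 0.1 = 0.19 < 0.25
```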

We therefore want to minimize the following "triplet cost":

$$\mathcal{J} = \sum_{i=1}^{N} \Big[ \underbrace{\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2}_{\text{(1)}} - \underbrace{\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2}_{\text{(2)}} + \alpha \Big]_+ \tag{3}$$

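Here $[z]_+$ denotes $\max(z, 0)$. Term (1) is the squared distance between the anchor and the positive, which we want to be small; term (2) is the squared distance between the anchor and the negative, which we want to be large. $\alpha$ is the margin, a hyperparameter chosen by hand (the code below uses $\alpha = 0.2$).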
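Formula (3) translates almost directly into NumPy. This is a framework-free reference sketch, not the course's implementation; the batch of encodings is made up, with each row being one triplet, and the function name `triplet_cost` is hypothetical.

```python
import numpy as np

def triplet_cost(f_a, f_p, f_n, alpha=0.2):
    """J = sum_i max(||f(A_i)-f(P_i)||^2 - ||f(A_i)-f(N_i)||^2 + alpha, 0)."""
    pos = np.sum((f_a - f_p) ** 2, axis=1)   # term (1), one value per triplet
    neg = np.sum((f_a - f_n) ** 2, axis=1)   # term (2), one value per triplet
    return float(np.sum(np.maximum(pos - neg + alpha, 0.0)))

f_a = np.array([[0.0, 0.0], [0.0, 0.0]])
f_p = np.array([[0.3, 0.0], [0.1, 0.0]])
f_n = np.array([[0.5, 0.0], [1.0, 0.0]])
# Triplet 1: 0.09 - 0.25 + 0.2 = 0.04 (violates the margin, contributes)
# Triplet 2: 0.01 - 1.00 + 0.2 < 0   (already satisfied, contributes 0)
print(triplet_cost(f_a, f_p, f_n))  # ≈ 0.04
```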

import tensorflow as tf

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """

    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    # Step 1: squared L2 distance between anchor and positive, one value per example (axis=-1)
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: squared L2 distance between anchor and negative, one value per example
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two distances and add the margin alpha
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: apply max(z, 0) per example, then sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))

    return loss
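Formula (3) applies $[\cdot]_+$ to each triplet separately, so the squared distances must be reduced along the encoding axis only (one value per example, e.g. `axis=-1`), not over the whole batch. The NumPy comparison below, with made-up per-triplet distances, shows how a batch-wide sum before the hinge lets an easy triplet hide a hard one.

```python
import numpy as np

alpha = 0.2
# Hypothetical per-triplet squared distances for a batch of two triplets.
pos_dist = np.array([0.50, 0.05])   # ||f(A)-f(P)||^2
neg_dist = np.array([0.40, 1.50])   # ||f(A)-f(N)||^2

# Per-example hinge, as in formula (3): the hard first triplet still counts.
per_example = np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))

# Summing over the batch before the hinge: the easy triplet cancels the hard one.
batch_level = max(np.sum(pos_dist) - np.sum(neg_dist) + alpha, 0.0)

print(per_example)  # ≈ 0.3
print(batch_level)  # 0.0
```

With the batch-level version the loss is zero, so the first triplet, which violates the margin, would produce no gradient at all.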