Loss functions in face recognition


Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA copyright agreement. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/u012505617/article/details/89355690

 

In face recognition, algorithmic improvements are mainly reflected in the design of the loss function, which guides the optimization of the whole network. There are many commonly used loss functions, from the traditional softmax loss to CosFace and ArcFace, each improving on its predecessors to some extent; this article organizes these algorithms.

Whether it is SphereFace, CosineFace (CosFace) or ArcFace, each of these loss functions is built by modifying the softmax loss.

 

Baseline: Softmax loss
Extensions of it: Triplet loss, Center loss
Newer algorithms: A-Softmax Loss (SphereFace), Cosine Margin Loss (CosFace), Additive Angular Margin Loss (ArcFace)

1. Softmax loss

    \large L_1 = -\frac{1}{m}{\sum\limits_{i=1}^m}\log\left(\frac{e^{W^T_{y_i}x_i+b_{y_i}}}{ {\sum\limits_{j=1}^n}e^{W^T_jx_i+b_j} }\right)

This is the softmax loss function, where W^T_j x_i + b_j is the output of the fully connected layer. To make the loss decrease, we want W^T_{y_i} x_i + b_{y_i} to account for a larger share of the denominator, so that the fraction inside log() gets closer to 1; since log(1) = 0, the overall loss shrinks.
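
As a minimal illustration (a NumPy sketch added here, not part of the original post), the softmax loss can be computed from the fully connected outputs W^T_j x_i + b_j as follows; the array shapes and the toy data are assumptions for the example:

    import numpy as np

    def softmax_cross_entropy(logits, labels):
        """Softmax loss: logits is the (m, n) fully connected output W^T x + b,
        labels is the (m,) vector of ground-truth classes y_i."""
        # subtract the row-wise max for numerical stability
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # average negative log-probability of the correct class over the batch
        return -log_probs[np.arange(len(labels)), labels].mean()

    # toy usage: 4 samples, 3 classes
    logits = np.random.randn(4, 3)
    labels = np.array([0, 2, 1, 1])
    print(softmax_cross_entropy(logits, labels))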

This only considers whether the classification is correct and says nothing about the distances between features, which motivated the center loss. (Paper)

 

2. Center loss

    \large L_C = \frac{1}{2}{\sum\limits_{i=1}^m}{||x_i-c_{y_i}||}^2

    \large \Delta{c_j}=\frac{{\sum\limits_{i=1}^m}{\delta{(y_i=j)}\cdot{(c_j-x_i)}}}{1+{\sum\limits_{i=1}^m}{\delta{(y_i=j)}}}

The center loss requires not only that the classification be correct, but also that features of the same class stay close to their class center. In the formula above, c_{y_i} denotes the center of class y_i and x_i denotes the feature of each face. L_C is added on top of the softmax loss, with the parameter \lambda controlling the strength of the within-class constraint; the overall loss function is as follows:

    \large L_2=L_S+L_C= -\frac{1}{m}{\sum\limits_{i=1}^m}\log\left(\frac{e^{W^T_{y_i}x_i+b_{y_i}}}{ {\sum\limits_{j=1}^n}e^{W^T_jx_i+b_j} }\right)+\frac{\lambda}{2}{\sum\limits_{i=1}^m}{||x_i-c_{y_i}||}^2
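
A rough NumPy sketch of the center-loss term and the center update follows; the value of λ and the center learning rate α below are illustrative assumptions, not from the original post:

    import numpy as np

    def center_loss(features, labels, centers, lam=0.003):
        """lambda/2 * sum_i ||x_i - c_{y_i}||^2 over the batch."""
        diff = features - centers[labels]            # x_i - c_{y_i}
        return 0.5 * lam * (diff ** 2).sum()

    def update_centers(features, labels, centers, alpha=0.5):
        """Apply the Delta c_j update from the formula above; alpha is the center learning rate."""
        new_centers = centers.copy()
        for j in range(len(centers)):
            mask = labels == j
            if mask.any():
                delta = (centers[j] - features[mask]).sum(axis=0) / (1 + mask.sum())
                new_centers[j] = centers[j] - alpha * delta
        return new_centers

    # the total loss L_2 would be: softmax_cross_entropy(logits, labels) + center_loss(features, labels, centers)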

 

3. Triplet Loss

The triplet loss is built from three parts: an Anchor, a Positive, and a Negative. As the original figure illustrates, the Anchor starts out relatively far from the Positive; training pulls the Anchor and the Positive as close together as possible (within-class distance) and pushes the Anchor and the Negative as far apart as possible (between-class distance).

    \large L_3 = {\sum\limits_{i}^N}\left[\, ||f(x_i^a) - f(x_i^p)||^2_2 - ||f(x_i^a)-f(x_i^n)||_2^2 + \alpha \,\right]_+

The term on the left is the within-class distance and the one on the right is the between-class distance; the hinge [·]_+ keeps the loss at zero once the margin α is satisfied. Optimizing with gradient descent drives the within-class distance down and the between-class distance up, which is what makes the loss keep shrinking.
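
A minimal NumPy sketch of this triplet loss follows; it assumes the embeddings f(x) are already computed (and, as in FaceNet, typically L2-normalized), with α as the margin:

    import numpy as np

    def triplet_loss(anchor, positive, negative, alpha=0.2):
        """sum_i [ ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha ]_+ over the batch."""
        pos_dist = ((anchor - positive) ** 2).sum(axis=1)   # within-class distance
        neg_dist = ((anchor - negative) ** 2).sum(axis=1)   # between-class distance
        return np.maximum(pos_dist - neg_dist + alpha, 0.0).sum()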

The algorithms above are relatively traditional; the following are more recent.


 

4. L-softmax

The softmax loss does not consider the distance between classes; the center loss can make each class compact but does not push different classes apart; and the triplet loss is time-consuming to train. This led to a batch of newer algorithms.

L-Softmax makes a finer-grained change: inside the softmax, the term e^{W^T_{y_i}x_i+b_{y_i}} is replaced by e^{||W_{y_i}||\,||x_i||\,\psi(\theta_{y_i})}. L-Softmax aims both to pull classes further apart and to compress each class into a more compact region.

    \LARGE L_4 = \frac{1}{N}\sum_{i=1}^N L_i = \frac{1}{N}\sum_{i=1}^N -\log\left(\frac{e^{f_{y_i}}}{\sum_{j}e^{f_j}}\right)

    \LARGE L_i = -\log\left(\frac{e^{||W_{y_i}||\,||x_i||\,\psi{(\theta_{y_i})}}} {e^{||W_{y_i}||\,||x_i||\,\psi{(\theta_{y_i})}} + \sum_{ j\neq y_i}{e^{||W_j||\,||x_i||\cos(\theta_j)}}}\right)

where cos(θ) is replaced by ψ(θ), a piecewise function built from cos(mθ):

    \large \psi(\theta) = \begin{cases} \cos (m\theta ), & 0\leqslant \theta \leqslant \frac{\pi }{m} \\ D(\theta), & \frac{\pi}{m}\leqslant \theta \leqslant \pi \end{cases}

Multiplying θ by m plays the role of a margin, making the within-class distance more compact and the between-class distance larger. The larger m is, the larger the margin between classes: because cos is monotonically decreasing on (0, π), increasing m drives cos(mθ) down, so the angle to the correct class is forced to become smaller.
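
As a sketch of the modified target logit: one concrete choice of D(θ), used in the L-Softmax paper but not spelled out in the text above, makes ψ monotonic, namely ψ(θ) = (−1)^k cos(mθ) − 2k with k = ⌊θm/π⌋. The NumPy code below assumes that form:

    import numpy as np

    def psi(theta, m=4):
        """Piecewise margin function: cos(m*theta) on [0, pi/m], extended
        monotonically as (-1)^k * cos(m*theta) - 2k on the remaining intervals."""
        k = np.floor(theta * m / np.pi)
        return ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k

    def l_softmax_target_logit(w_y, x, m=4):
        """||W_y|| * ||x|| * psi(theta_y) for the ground-truth class."""
        w_norm = np.linalg.norm(w_y)
        x_norm = np.linalg.norm(x)
        cos_t = np.dot(w_y, x) / (w_norm * x_norm + 1e-12)
        theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
        return w_norm * x_norm * psi(theta, m)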

 

5. SphereFace(A-Softmax)

A-Softmax makes a small modification to L-Softmax: when applying the margin, it adds two constraints, normalizing the weights so that ||W|| = 1 and setting b = 0. As a result, the prediction depends only on the angle between W and x.

    \LARGE L_5 = -\frac{1}{N}\sum_{i=1}^{N}\log\left( \frac{e^{||x_i||\cos(m\theta_{y_i})}} {e^{||x_i||\cos(m\theta_{y_i})} + \sum_{j \neq y_i}{e^{||x_i||\cos(\theta_j)}}}\right)
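
A sketch of the A-Softmax logits under these constraints (weights column-normalized, no bias). For simplicity it applies cos(mθ) directly on the target class, matching the simplified formula above, rather than the full monotonic ψ(θ); the shapes are assumptions for the example:

    import numpy as np

    def a_softmax_logits(W, x, y, m=4):
        """W: (d, n) class weights, x: (d,) feature, y: ground-truth class index."""
        Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)  # ||W_j|| = 1, b = 0
        x_norm = np.linalg.norm(x)
        cos_t = Wn.T @ x / (x_norm + 1e-12)                          # cos(theta_j)
        logits = x_norm * cos_t                                       # ||x|| * cos(theta_j)
        theta_y = np.arccos(np.clip(cos_t[y], -1.0, 1.0))
        logits[y] = x_norm * np.cos(m * theta_y)                      # margin on the target class only
        return logits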

 

6. CosFace

The CosFace loss function is as follows:

    \LARGE L_6 = -\frac{1}{N} \sum_{i=1}^{N} \log\left( \frac{e^{s(\cos(\theta_{y_i})-m)}}{e^{s(\cos(\theta_{y_i})-m)}+ \sum_{j=1, j\neq y_i}^k e^{s\cos \theta_j}}\right)

In the formula above, s is the radius of the hypersphere (the scale applied to the normalized features) and m is the cosine margin.
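
A minimal NumPy sketch of the CosFace logits: both the feature and the class weights are L2-normalized so the logits are pure cosines before scaling, and the values of s and m below are illustrative assumptions:

    import numpy as np

    def cosface_logits(W, x, y, s=64.0, m=0.35):
        """s*(cos(theta_y) - m) for the ground-truth class y, s*cos(theta_j) otherwise."""
        Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)  # normalize class weights W_j
        xn = x / (np.linalg.norm(x) + 1e-12)                          # normalize the feature x_i
        cos_t = Wn.T @ xn                                             # cos(theta_j)
        logits = s * cos_t
        logits[y] = s * (cos_t[y] - m)                                # additive cosine margin
        return logits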

 

7. ArcFace

Comparing ArcFace with CosFace: ArcFace maximizes the classification margin directly in angle space, while CosFace maximizes it in cosine space. The change matters because the angle corresponds directly to the geodesic distance on the hypersphere, so an angular margin has a more direct effect than a cosine margin.

    \LARGE L_7= -\frac{1}{N} \sum_{i=1}^{N} \log\left(\frac{e^{s(\cos(\theta_{y_i}+m))}}{e^{s(\cos(\theta_{y_i}+m))}+\sum_{j=1,j\neq y_i}^k e^{s\cos\theta_j}}\right)

 

The decision boundaries of the different loss functions (binary-class case) are as follows:

Softmax: (W_1 - W_2)x + b_1 - b_2 = 0
SphereFace: ||x||(\cos(m\theta_1) - \cos(\theta_2)) = 0
CosFace: s(\cos(\theta_1) - m - \cos(\theta_2)) = 0
ArcFace: s(\cos(\theta_1 + m) - \cos(\theta_2)) = 0

The ArcFace algorithm, in outline, is: L2-normalize the feature x_i and each column W_j of the weight matrix; compute cos(θ_j) = W_j^T x_i; recover θ_{y_i} for the ground-truth class and add the angular margin m; rescale all logits by s; and feed the result to the ordinary softmax cross-entropy loss.
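
A minimal NumPy sketch of that process; the values of s and m below are illustrative assumptions, and the resulting logits would then be passed to a standard softmax cross-entropy loss:

    import numpy as np

    def arcface_logits(W, x, y, s=64.0, m=0.5):
        """s*cos(theta_y + m) for the ground-truth class y, s*cos(theta_j) otherwise."""
        Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)  # normalize class weights W_j
        xn = x / (np.linalg.norm(x) + 1e-12)                          # normalize the feature x_i
        cos_t = Wn.T @ xn                                             # cos(theta_j)
        theta_y = np.arccos(np.clip(cos_t[y], -1.0, 1.0))             # angle to the correct class
        logits = s * cos_t
        logits[y] = s * np.cos(theta_y + m)                           # additive angular margin
        return logits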

 
