Loss Functions for Face Recognition
In face recognition, algorithmic improvements are largely reflected in the design of the loss function, since the loss function guides the optimization of the entire network. Many loss functions are in common use, from the traditional softmax loss to CosFace and ArcFace, each improving on its predecessors to some extent; this article organizes these algorithms.
Whether SphereFace, CosFace, or ArcFace, each of these loss functions is a modification of the softmax loss.
| Category | Loss functions |
| --- | --- |
| Baseline | Softmax loss |
| Extensions | Triplet loss, center loss |
| Recent algorithms | A-Softmax loss (SphereFace), Cosine Margin loss (CosFace), Additive Angular Margin loss (ArcFace) |
1. Softmax loss
The softmax loss is

$$ L_S = -\frac{1}{N}\sum_{i=1}^{N} \log\frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} $$

where $W_j^{T} x_i + b_j$ is the output of the final fully connected layer for class $j$. To reduce the loss, we want the target class's share of the total to grow, so that the fraction inside the log moves closer to 1; since $\log(1) = 0$, the overall loss decreases.
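As a minimal NumPy sketch (not any particular paper's implementation), the softmax loss above can be computed as:

```python
import numpy as np

def softmax_loss(logits, labels):
    """Softmax cross-entropy loss, averaged over the batch.

    logits: (N, C) outputs of the fully connected layer, W^T x + b.
    labels: (N,) ground-truth class indices y_i.
    """
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average -log p(y_i) over the batch.
    return -log_probs[np.arange(len(labels)), labels].mean()

# When the target logit dominates, the fraction inside the log -> 1
# and the loss -> 0; with equal logits the loss is log(C).
confident = np.array([[10.0, 0.0, 0.0]])
uncertain = np.array([[1.0, 1.0, 1.0]])
y = np.array([0])
print(softmax_loss(confident, y))   # close to 0
print(softmax_loss(uncertain, y))   # log(3) ≈ 1.0986
```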
This only considers whether the classification is correct and places no constraint on how the features are distributed, which motivated the center loss. ( Paper )
2. Center loss
Center loss requires not only correct classification but also that the features of each class stay close to their class center:

$$ L_C = \frac{1}{2}\sum_{i=1}^{N} \left\| x_i - c_{y_i} \right\|_2^2 $$

where $c_{y_i}$ is the center of class $y_i$ and $x_i$ is the feature of each face. It is added on top of the softmax loss, with a parameter $\lambda$ controlling the weight of the intra-class term; the overall loss function is

$$ L = L_S + \lambda L_C $$
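A NumPy sketch of the center-loss term, under the assumption that class centers are kept in a `(C, d)` array (in practice the centers are themselves updated during training):

```python
import numpy as np

def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over the mini-batch.

    features: (N, d) face embeddings x_i.
    labels:   (N,)  class indices y_i.
    centers:  (C, d) learned class centers c_j.
    """
    diffs = features - centers[labels]   # x_i - c_{y_i}
    return 0.5 * (diffs ** 2).sum()

# The total loss combines it with the softmax loss:
#   L = L_softmax + lam * center_loss(...)
# where lam (lambda) trades classification accuracy for compactness.
```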
3. Triplet Loss
The triplet loss is built from triplets of Anchor, Positive, and Negative samples. As the figure shows, the Anchor initially lies relatively far from the Positive; training pulls the Anchor and Positive as close together as possible (intra-class distance) and pushes the Anchor and Negative as far apart as possible (inter-class distance).
$$ L = \sum_{i=1}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_+ $$

The left term is the intra-class distance and the right term is the inter-class distance. Gradient descent drives the intra-class distance down and the inter-class distance up, so the loss keeps shrinking.
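A NumPy sketch of the triplet loss with an illustrative margin of 0.2 (the margin α is a tunable hyperparameter, not a fixed value):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """mean over the batch of max(0, ||a-p||^2 - ||a-n||^2 + margin)."""
    d_ap = ((anchor - positive) ** 2).sum(axis=1)   # intra-class distance
    d_an = ((anchor - negative) ** 2).sum(axis=1)   # inter-class distance
    return np.maximum(0.0, d_ap - d_an + margin).mean()

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close to the anchor
n = np.array([[1.0, 0.0]])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: d_ap + margin < d_an, triplet satisfied
```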
The algorithms above are older, traditional ones; the following sections cover more recent algorithms.
4. L-softmax
The softmax loss does not consider inter-class separation; the center loss makes each class compact but does not push different classes apart; and the triplet loss is time-consuming because it requires mining triplets. These shortcomings led to a batch of new algorithms.
L-softmax makes a more refined change inside the softmax itself: the target term $W_{y_i}^{T} x_i = \|W_{y_i}\| \|x_i\| \cos\theta_{y_i}$ inside the log is replaced by $\|W_{y_i}\| \|x_i\| \cos(m\theta_{y_i})$. L-softmax aims both to widen the inter-class distance and to compress the intra-class distance more tightly.
Changing $\cos\theta$ to $\cos(m\theta)$ multiplies the angle $\theta$ by $m$, which acts as a margin: intra-class distances become more compact while inter-class distances grow. The larger $m$ is, the larger the inter-class margin, because cosine is monotonically decreasing on $(0, \pi)$: for the same angle $\theta$, a larger $m$ pushes $\cos(m\theta)$ lower, so the network must shrink $\theta$ itself to keep the target logit large.
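The effect of the multiplicative margin can be checked numerically: for a fixed angle, increasing m drives the modified logit down, so the feature must move closer to its class weight to compensate.

```python
import numpy as np

# For the same angle theta, the margin-modified logit cos(m*theta) is
# smaller than cos(theta), so the network must shrink theta (pull the
# feature toward its class weight) to regain the same logit value.
theta = np.pi / 6  # 30 degrees
for m in (1, 2, 3, 4):
    # m=1 is the plain softmax logit; larger m gives a stricter target.
    print(m, np.cos(m * theta))
```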
5. SphereFace (A-Softmax)
A-softmax makes a small modification to L-softmax: when applying the margin it adds two constraints, normalizing the weights so that $\|W_j\| = 1$ and setting $b = 0$. The model's prediction then depends only on the angle between $W$ and $x$.
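A simplified NumPy sketch of the A-softmax logits. For brevity the margin is applied to every class here, whereas the paper applies it only to the target class and uses a monotonic extension of cos(mθ) beyond π/m:

```python
import numpy as np

def a_softmax_logits(x, W, m=1):
    """Logits ||x|| * cos(m * theta_j) with ||W_j|| = 1 and b = 0.

    x: (d,) feature vector; W: (d, C) class weight columns.
    With normalized weights and no bias, which class wins depends
    only on the angles between x and the columns of W.
    """
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)   # ||W_j|| = 1
    cos_theta = (x @ Wn) / np.linalg.norm(x)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return np.linalg.norm(x) * np.cos(m * theta)

# The weight magnitudes (3 vs 5) no longer matter after normalization:
x = np.array([2.0, 0.0])
W = np.array([[3.0, 0.0],
              [0.0, 5.0]])
print(a_softmax_logits(x, W))   # class 0 wins: x is aligned with W_0
```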
6. CosFace
The CosFace loss function is:

$$ L = -\frac{1}{N}\sum_{i=1}^{N} \log\frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j \neq y_i} e^{s\cos\theta_j}} $$

In the formula above, $s$ is the radius of the hypersphere the normalized features are scaled to, and $m$ is the margin.
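A NumPy sketch of the CosFace loss matching the formula above (the defaults s = 30, m = 0.35 are common choices, not a fixed standard):

```python
import numpy as np

def cosface_loss(features, labels, W, s=30.0, m=0.35):
    """CosFace: subtract margin m from the target cosine, scale by s.

    features: (N, d) embeddings; W: (d, C) class weights; both are
    L2-normalized so the logits are cosines of the angles theta_j.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = x @ Wn                                # (N, C) cosine similarities
    cos[np.arange(len(labels)), labels] -= m    # additive cosine margin
    logits = s * cos
    # Standard softmax cross-entropy on the margined logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Even a perfectly aligned feature pays the margin m, which is what forces intra-class cosines above the decision threshold at test time.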
7. ArcFace
Comparing ArcFace with CosFace: ArcFace maximizes the classification boundary directly in angle space ($\cos(\theta + m)$), while CosFace maximizes it in cosine space ($\cos\theta - m$). This change matters because the angular distance affects the boundary more directly than the cosine of the angle.
The decision boundaries of these losses (binary case, with $\theta_i$ the angle to class $i$, as summarized in the ArcFace paper) are as follows:

| Loss | Decision boundary |
| --- | --- |
| Softmax | $(W_1 - W_2)\,x + b_1 - b_2 = 0$ |
| SphereFace | $\cos(m\theta_1) = \cos\theta_2$ |
| CosFace | $\cos\theta_1 - m = \cos\theta_2$ |
| ArcFace | $\cos(\theta_1 + m) = \cos\theta_2$ |
The ArcFace algorithm proceeds as follows:

1. Normalize the feature $x_i$ and the weight columns $W_j$.
2. Compute the cosine logits $\cos\theta_j = W_j^{T} x_i$.
3. Take $\theta_{y_i} = \arccos(\cos\theta_{y_i})$ for the ground-truth class.
4. Add the angular margin: $\theta_{y_i} + m$.
5. Map back to a logit, $\cos(\theta_{y_i} + m)$, leaving the other classes' $\cos\theta_j$ unchanged.
6. Scale all logits by $s$ and feed them to the softmax cross-entropy loss.
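The steps above can be sketched in NumPy as follows (the defaults s = 64, m = 0.5 follow the paper's typical settings; this is an illustration, not the reference implementation):

```python
import numpy as np

def arcface_loss(features, labels, W, s=64.0, m=0.5):
    """ArcFace: add angular margin m to the target angle, then rescale.

    Steps: normalize x and W -> cos(theta) -> arccos on the target
    column -> add m -> cos -> scale by s -> softmax cross-entropy.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = x @ Wn                                   # (N, C) cosines
    idx = np.arange(len(labels))
    theta = np.arccos(np.clip(cos[idx, labels], -1.0, 1.0))
    cos[idx, labels] = np.cos(theta + m)           # angular margin on target
    logits = s * cos
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

The only difference from the CosFace sketch is where the margin enters: on the angle itself rather than on its cosine.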