Summary
One of the main challenges in using Deep Convolutional Neural Networks (DCNNs) for feature learning in large-scale face recognition is designing an appropriate loss function that enhances discriminative power. Centre loss achieves intra-class compactness by penalizing the Euclidean distance between deep features and their corresponding class centres. SphereFace assumes that the linear transformation matrix in the last fully connected layer can serve as a representation of the class centres in angular space, and penalizes the angles between deep features and their corresponding weights multiplicatively. A popular recent research direction is to incorporate a margin into well-established loss functions in order to maximize the separability of face classes. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. ArcFace has a clear geometric interpretation because it corresponds exactly to the geodesic distance on the hypersphere. We present arguably the most extensive experimental evaluation of recent state-of-the-art (SOTA) face recognition methods, covering more than 10 face recognition benchmarks, including a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the SOTA and can be implemented easily with negligible computational overhead.
Introduction
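Face representation through Deep Convolutional Neural Network (DCNN) embeddings is one of the approaches to face recognition. Typically, after a pose normalization step, DCNNs map a face image to a feature with small intra-class distance and large inter-class distance. There are two main lines of research for training DCNNs for face recognition: those that train a multi-class classifier that can separate the different identities in the training set, for example with a softmax classifier, and those that learn an embedding directly, such as the triplet loss. Given large-scale training data and carefully designed DCNN architectures, both softmax-loss-based and triplet-loss-based methods can achieve excellent face recognition performance. However, both the softmax loss and the triplet loss have some drawbacks.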
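For the softmax loss:
(1) the size of the linear transformation matrix $W \in \mathbb{R}^{d \times n}$ grows linearly with the number of identities $n$;
(2) the learned features are separable for the closed-set classification problem, but not discriminative enough for the open-set face recognition problem.
For the triplet loss:
(1) the number of face triplets explodes combinatorially, especially for large-scale datasets, which significantly increases the number of iteration steps;
(2) semi-hard sample mining is a rather difficult problem for effective model training.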
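Several variants have been proposed to enhance the discriminative power of the softmax loss. Wen et al. pioneered the centre loss, the Euclidean distance between each feature vector and its class centre, to obtain intra-class compactness, while inter-class dispersion is guaranteed by the joint penalty of the softmax loss. However, updating the actual centres during training is extremely difficult, because the number of face classes available for training has recently increased dramatically.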
By observing that the weights of the last fully connected layer of a classification DCNN trained with the softmax loss bear conceptual similarities to the centres of each face class, some works propose a multiplicative angular margin penalty that simultaneously enforces extra intra-class compactness and inter-class discrepancy, giving the trained model better discriminative power. Although SphereFace introduced the important idea of an angular margin, its loss function requires a series of approximations to be computed, which makes network training unstable. To stabilize training, its authors propose a hybrid loss function that includes the standard softmax loss. Empirically, the softmax loss dominates the training process, because the integer-based multiplicative angular margin makes the target logit curve very steep and thus hinders convergence. CosFace adds a cosine margin penalty directly to the target logit, which achieves better performance than SphereFace while being much easier to implement, and removes the need for joint supervision from the softmax loss.
In this paper, we propose an Additive Angular Margin Loss (ArcFace) to further improve the discriminative power of the face recognition model and to stabilize the training process. As shown in the figure below, the dot product between the DCNN feature and the last fully connected layer equals the cosine distance after feature and weight normalization. We use the arc-cosine function to calculate the angle between the current feature and the target weight. We then add an additive angular margin to the target angle and obtain the target logit again through the cosine function. Finally, we rescale all logits by a fixed feature norm, and the subsequent steps are exactly the same as in the softmax loss. The advantages of ArcFace can be summarized as follows:
Engaging: ArcFace directly optimizes the geodesic distance margin by virtue of the exact correspondence between the angle and the arc on the normalized hypersphere. We illustrate what happens in the 512-D space by analyzing the angle statistics between features and weights.
Effective: ArcFace achieves SOTA performance on ten face recognition benchmarks, including large-scale image and video datasets.
Easy: ArcFace needs only the few lines of code given in Algorithm 1 and is very easy to implement in computational-graph-based deep learning frameworks. Furthermore, ArcFace does not need to be combined with other loss functions to have stable performance, and it can easily converge on any training dataset.
Efficient: ArcFace adds only negligible computational complexity during training. Current GPUs can easily support millions of identities for training, and a model-parallel strategy can easily support many more.
Proposed Approach
ArcFace
The most widely used classification loss function softmax loss is as follows:
$$L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}}$$
where $x_i \in \mathbb{R}^d$ denotes the deep feature of the $i$-th sample, which belongs to the $y_i$-th class. The embedding feature dimension $d$ is set to 512. $W_j \in \mathbb{R}^d$ denotes the $j$-th column of the weight $W \in \mathbb{R}^{d \times n}$, and $b_j$ is the $j$-th entry of the bias term $b \in \mathbb{R}^n$. $N$ is the batch size and $n$ is the number of classes. The traditional softmax loss is widely used in deep face recognition. However, the softmax loss function does not explicitly optimize the feature embedding to enforce higher similarity for intra-class samples and diversity for inter-class samples, which results in a performance gap for deep face recognition under large intra-class appearance variations (such as pose variations and age gaps) and large-scale test scenarios (e.g. million or trillion pairs).
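For readers who prefer code to notation, the softmax loss $L_1$ can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `softmax_loss`, the random data, and the choice of 10 classes are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def softmax_loss(features, weights, bias, labels):
    """Mean softmax cross-entropy loss L_1.

    features: (N, d) deep features x_i
    weights:  (d, n) matrix W, one column W_j per class
    bias:     (n,)   bias b
    labels:   (N,)   integer class indices y_i
    """
    logits = features @ weights + bias                    # W_j^T x_i + b_j
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())

# Hypothetical toy batch: N = 4 samples, d = 512, n = 10 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
labels = rng.integers(0, 10, size=4)
uniform_loss = softmax_loss(feats, np.zeros((512, 10)), np.zeros(10), labels)
```

With all-zero weights and biases every class is equally likely, so the loss reduces to $\log n$, which is a convenient sanity check.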
For simplicity, we fix the bias $b_j = 0$. The logit then becomes $W_j^T x_i = \|W_j\| \|x_i\| \cos\theta_j$, where $\theta_j$ is the angle between the weight $W_j$ and the feature $x_i$. Using $l_2$ normalization, we fix $\|W_j\| = 1$ and $\|x_i\| = s$. The normalization step on features and weights makes the predictions depend only on the angle between the feature and the weight. The learned embedding features are therefore distributed on a hypersphere of radius $s$.
$$L_2 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}} + \sum_{j=1, j\neq y_i}^{n} e^{s\cos\theta_j}}$$
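The effect of the normalization step can be checked numerically. The sketch below computes the logits $s\cos\theta_j$ used in $L_2$ and shows that they no longer depend on the length of the raw feature. The function name `normalized_logits` and the toy data are assumptions for illustration only.

```python
import numpy as np

def normalized_logits(features, weights, s=64.0):
    """Return s * cos(theta_j) for every sample/class pair.

    features: (N, d) raw embeddings x_i
    weights:  (d, n) class weights, one column W_j per class
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)  # ||x_i|| = 1
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)    # ||W_j|| = 1
    cos_theta = np.clip(x @ w, -1.0, 1.0)
    return s * cos_theta                                            # rescale by s

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
W = rng.normal(size=(512, 10))
logits = normalized_logits(feats, W)
scaled_logits = normalized_logits(3.0 * feats, W)   # same directions, longer vectors
```

Because only the direction of each vector survives normalization, `logits` and `scaled_logits` agree, and every logit lies in $[-s, s]$.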
Since the embedding features are distributed around each feature centre on the hypersphere, we add an additive angular margin penalty $m$ between $x_i$ and $W_{y_i}$ to simultaneously enhance intra-class compactness and inter-class discrepancy. Because the proposed additive angular margin penalty is equal to the geodesic distance margin penalty on the normalized hypersphere, we name our method ArcFace.
$$L_3 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j=1, j\neq y_i}^{n} e^{s\cos\theta_j}}$$
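The steps described so far (normalize, take the arc-cosine of the target logit, add the margin, take the cosine again, rescale by $s$, then apply softmax) can be sketched in NumPy as follows. This is a minimal forward-pass sketch, not the authors' Algorithm 1 or their MXNet code: the function name `arcface_loss` and the toy data are assumptions, and the sketch does not treat the case $\theta_{y_i} + m > \pi$ specially.

```python
import numpy as np

def arcface_loss(features, weights, labels, s=64.0, m=0.5):
    """Mean ArcFace loss L_3 for one batch (forward pass only)."""
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_theta = np.clip(x @ w, -1.0, 1.0)             # (N, n) cosines
    rows = np.arange(len(labels))
    theta_y = np.arccos(cos_theta[rows, labels])      # angle to the target weight
    logits = s * cos_theta
    logits[rows, labels] = s * np.cos(theta_y + m)    # margin on the target logit only
    logits = logits - logits.max(axis=1, keepdims=True)   # stable log-softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())

# Hypothetical toy batch: 4 samples, 512-D features, 10 identities.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
W = rng.normal(size=(512, 10))
labels = rng.integers(0, 10, size=4)
loss_no_margin = arcface_loss(feats, W, labels, m=0.0)   # equals L_2
loss_margin = arcface_loss(feats, W, labels, m=0.5)      # L_3 with m = 0.5
```

Setting `m=0.0` recovers the normalized softmax loss $L_2$. With a positive margin the target logit shrinks, so the same batch incurs a larger loss, which is what pushes features closer to their class centre during training.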
We select face images from 8 different identities containing enough samples (about 1,500 images per class) to train 2-D feature embedding networks with the softmax and ArcFace losses, respectively. As shown in the corresponding figure, the softmax loss provides roughly separable feature embeddings but produces noticeable ambiguity at the decision boundaries, while the proposed ArcFace loss clearly enforces a more evident gap between the nearest classes.
Comparison with SphereFace and CosFace
Numerical Similarity: SphereFace, ArcFace, and CosFace propose three different kinds of margin penalty: the multiplicative angular margin $m_1$, the additive angular margin $m_2$, and the additive cosine margin $m_3$, respectively. From a numerical analysis point of view, the different margin penalties, whether added in angle space or in cosine space, all enforce intra-class compactness and inter-class diversity by penalizing the target logit. As shown in the figure below, we plot the target logit curves of SphereFace, ArcFace, and CosFace under their best margin settings. We only show these target logit curves within $[20^\circ, 100^\circ]$, because during ArcFace training the angle between $W_{y_i}$ and $x_i$ starts at around $90^\circ$ (random initialization) and ends at around $30^\circ$. Intuitively, three factors of the target logit curve affect performance: the starting point, the end point, and the slope.
By combining all of the margin penalties, we implement SphereFace, ArcFace, and CosFace in a unified framework with $m_1$, $m_2$, and $m_3$ as the hyper-parameters:
$$L_4 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos(m_1\theta_{y_i}+m_2)-m_3)}}{e^{s(\cos(m_1\theta_{y_i}+m_2)-m_3)} + \sum_{j=1, j\neq y_i}^{n} e^{s\cos\theta_j}}$$
As shown in Figure (b) above, by combining all of the above margins, $\cos(m_1\theta + m_2) - m_3$, we can easily obtain other target logit curves that also perform very well.
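The unified target logit is simple enough to tabulate directly. The sketch below evaluates $\cos(m_1\theta + m_2) - m_3$ on a few angles in $[20^\circ, 100^\circ]$ for the margin settings quoted in this article (1.35 for SphereFace, 0.5 for ArcFace, 0.35 for CosFace, and the combined setting (1, 0.3, 0.2)). The function name `target_logit` and the choice of sample angles are assumptions for illustration.

```python
import numpy as np

def target_logit(theta, m1=1.0, m2=0.0, m3=0.0):
    """Unscaled target logit of L_4: cos(m1 * theta + m2) - m3 (theta in radians)."""
    return np.cos(m1 * theta + m2) - m3

theta = np.deg2rad(np.linspace(20, 100, 5))          # 20, 40, 60, 80, 100 degrees
softmax_curve = target_logit(theta)                  # no margin: plain cos(theta)
sphereface_curve = target_logit(theta, m1=1.35)      # multiplicative angular margin
arcface_curve = target_logit(theta, m2=0.5)          # additive angular margin
cosface_curve = target_logit(theta, m3=0.35)         # additive cosine margin
cm1_curve = target_logit(theta, 1.0, 0.3, 0.2)       # combined margin (1, 0.3, 0.2)
```

On this interval every margin variant lies strictly below the plain cosine curve, which is the numerical sense in which all three methods penalize the target logit.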
Geometric Difference: Despite the numerical similarity between ArcFace and previous works, the proposed additive angular margin has a better geometric attribute, because the angular margin corresponds exactly to the geodesic distance. As shown in the figure below, we compare the decision boundaries in the binary classification case. The proposed ArcFace has a constant linear angular margin throughout the whole interval. By contrast, SphereFace and CosFace only have a nonlinear angular margin.
Minor differences in margin design can have a "butterfly effect" on model training. For example, the original SphereFace employs an annealing optimization strategy. To avoid divergence at the beginning of training, SphereFace uses joint supervision from the softmax loss to weaken the multiplicative margin penalty. By employing the arc-cosine function instead of the complex double-angle formula, we implement a new version of SphereFace without the integer requirement on the margin. In our implementation, we find that $m = 1.35$ obtains performance similar to the original SphereFace without any convergence difficulty.
Comparison with other loss functions
Other loss functions can be designed based on the angular representation of the feature and weight vectors. For example, we can design a loss to enforce intra-class compactness and inter-class discrepancy on the hypersphere. As shown in the figure below, we compare ArcFace with three other losses.
Intra-Loss: Aims to improve intra-class closeness by reducing the angle/arc between samples and ground truth centers.
$$L_5 = L_2 + \frac{1}{\pi N}\sum_{i=1}^{N}\theta_{y_i}$$
Inter-Loss: The goal is to enhance inter-class differences by increasing the angle/arc between different centers.
$$L_6 = L_2 - \frac{1}{\pi N (n-1)}\sum_{i=1}^{N}\sum_{j=1, j\neq y_i}^{n}\arccos(W_{y_i}^T W_j)$$
Here, Inter-Loss is a special case of the Minimum Hyper-spherical Energy (MHE) method. In the MHE paper, both the hidden layers and the output layer are regularized by MHE. The MHE paper also proposes a special case of the loss function that combines the SphereFace loss with the MHE loss on the last layer of the network.
Triplet-loss: Aims to enlarge the angle/arc margin between triplet samples. In FaceNet, a Euclidean margin is applied to the normalized features. Here, we employ the triplet loss in the angular representation of our features: $\arccos(x_i^{pos} x_i) + m \leqslant \arccos(x_i^{neg} x_i)$.
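The extra terms that Intra-Loss and Inter-Loss add to $L_2$ can be written down directly from $L_5$ and $L_6$. The sketch below computes only those two regularization terms; the function names `intra_term` and `inter_term` and the tiny orthogonal example are assumptions chosen so that the result is easy to verify by hand.

```python
import numpy as np

def intra_term(cos_theta, labels):
    """(1 / (pi * N)) * sum_i theta_{y_i}: the term added to L_2 in L_5."""
    rows = np.arange(len(labels))
    theta_y = np.arccos(np.clip(cos_theta[rows, labels], -1.0, 1.0))
    return float(theta_y.sum() / (np.pi * len(labels)))

def inter_term(weights, labels):
    """(1 / (pi * N * (n - 1))) * sum_i sum_{j != y_i} arccos(W_{y_i}^T W_j)."""
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    angles = np.arccos(np.clip(w.T @ w, -1.0, 1.0))   # (n, n) angles between centres
    np.fill_diagonal(angles, 0.0)                     # exclude the j == y_i terms
    n = weights.shape[1]
    return float(angles[labels].sum() / (np.pi * len(labels) * (n - 1)))

# Toy check: 3 mutually orthogonal class weights, 2 samples at 90 degrees
# from their own class weight.
W = np.eye(3)
labels = np.array([0, 1])
cos_theta = np.zeros((2, 3))
intra = intra_term(cos_theta, labels)   # L_5 = L_2 + intra
inter = inter_term(W, labels)           # L_6 = L_2 - inter
```

Both terms are normalized by $\pi$, so an average angle of $90^\circ$ yields 0.5 in each case. Note the opposite signs: $L_5$ adds its term (small target angles are rewarded), while $L_6$ subtracts its term (large angles between centres are rewarded).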
Experiments
Implementation Details
Datasets:
As shown in the table below, we separately employ CASIA, VGGFace2, MS1MV2, and DeepGlint-Face (including MS1M-DeepGlint and Asian-DeepGlint) as our training data, in order to conduct a fair comparison with other methods. Note that the proposed MS1MV2 is a semi-automatically refined version of the MS-Celeb-1M dataset. To the best of our knowledge, we are the first to employ ethnicity-specific annotators for large-scale face image annotation, because boundary cases (such as hard samples and noisy samples) are very hard to distinguish if the annotator is not familiar with the identity. During training, we explore efficient face verification datasets (e.g. LFW, CFP-FP, AgeDB-30) to check the improvement from different settings. Besides the most widely used LFW and YTF datasets, we also report the performance of ArcFace on recent large-pose and large-age datasets such as CPLFW and CALFW. We also extensively test the proposed ArcFace on large-scale image datasets (e.g. MegaFace, IJB-B, IJB-C, and Trillion-Pairs) and on a video dataset (iQIYI-VID).
Experimental Settings:
For data preprocessing, we use five facial points to generate normalized face crops. For the embedding network, we adopt the widely used CNN architectures ResNet50 and ResNet100. After the last convolutional layer, we explore the BN-Dropout-FC-BN structure to obtain the final 512-D embedding feature. In this paper, we use the notation ([training dataset, network structure, loss]) to make the experimental settings easier to follow.
We set the feature scale $s$ to 64 and choose the angular margin $m$ of ArcFace to be 0.5. All experiments are implemented in the MXNet framework. We set the batch size to 512 and train the models on four NVIDIA Tesla P40 (24 GB) GPUs. On CASIA, the learning rate starts from 0.1 and is divided by 10 at 20K and 28K iterations; training finishes at 32K iterations. On MS1MV2, we divide the learning rate at 100K and 160K iterations and finish at 180K iterations. We set the momentum to 0.9 and the weight decay to 5e-4. During testing, we keep only the feature embedding network without the fully connected layer (160 MB for ResNet50 and 250 MB for ResNet100) and extract the 512-D feature for each normalized face (8.9 ms/face for ResNet50 and 15.4 ms/face for ResNet100). To obtain the embedding feature of a template (e.g. in IJB-B and IJB-C) or a video (e.g. in YTF and iQIYI-VID), we simply compute the feature centre of all images in the template or all frames in the video. Note that, for strict evaluation, identities that overlap between the training set and the test sets are removed, and we use only a single crop for all testing.
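The template and video embeddings described above amount to averaging per-image features. A minimal sketch follows; the function names are hypothetical, and re-normalizing the centre to unit length before cosine scoring is an assumption of this sketch rather than something the text states.

```python
import numpy as np

def template_embedding(face_features):
    """Feature centre of one template: mean of its per-image (or per-frame) embeddings.

    face_features: (k, 512) array, one row per face image or video frame.
    """
    centre = np.mean(face_features, axis=0)
    return centre / np.linalg.norm(centre)   # assumption: unit length for cosine scoring

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example: a 5-frame video and a single still image.
rng = np.random.default_rng(0)
video_frames = rng.normal(size=(5, 512))
still_image = rng.normal(size=(1, 512))
video_vec = template_embedding(video_frames)
still_vec = template_embedding(still_image)
score = cosine_similarity(video_vec, still_vec)
```

A template that contains a single image simply returns that image's normalized feature, so stills and videos can be compared with the same scoring function.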
Ablation Study on Loss Function
As shown in the table below, we first explore the angular margin setting for ArcFace on the CASIA dataset with ResNet50. The best margin observed in our experiments was 0.5. Using the combined margin framework proposed above, it is easier to set the margins of SphereFace and CosFace, which we found to perform best at 1.35 and 0.35, respectively. Our implementations of both SphereFace and CosFace achieve excellent performance without any difficulty in convergence. The proposed ArcFace achieves the highest verification accuracy on all three test sets. Furthermore, we conducted extensive experiments with the combined margin framework; some of the best performance was observed for CM1 (1, 0.3, 0.2) and CM2 (0.9, 0.4, 0.15). The combined margin framework led to better performance than SphereFace or CosFace alone, but was upper-bounded by the performance of ArcFace.
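Besides comparing with the margin-based methods, we conduct a further comparison between ArcFace and other losses that aim to enforce intra-class compactness and inter-class discrepancy. As the baseline, we choose the softmax loss and observe that performance drops on CFP-FP and AgeDB-30 after weight and feature normalization. Combining softmax with the intra-class loss improves performance on CFP-FP and AgeDB-30. However, combining softmax with the inter-class loss improves accuracy only slightly. The fact that Triplet-loss outperforms the Norm-Softmax loss indicates the importance of the margin in improving performance. Nevertheless, employing a margin penalty within triplet samples is less effective than inserting a margin between samples and centres, as ArcFace does. Finally, we incorporated the Intra-loss, Inter-loss, and Triplet-loss into ArcFace but observed no improvement, which leads us to believe that ArcFace already enforces intra-class compactness, inter-class discrepancy, and the classification margin.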
To better understand the advantages of ArcFace, Table 3 gives detailed angle statistics on the training data (CASIA) and the test data (LFW) under different losses. We find that:
(1) For ArcFace, $W_j$ is nearly synchronized with the embedding feature centre ($14.29^\circ$), whereas for Norm-Softmax there is an obvious deviation ($44.26^\circ$) between $W_j$ and the embedding feature centre. Therefore, the angles between the $W_j$ cannot absolutely represent the inter-class discrepancy on the training data. Alternatively, the embedding feature centres computed by the trained network are more representative.
(2) Intra-Loss can effectively compress intra-class variation, but it also brings smaller inter-class angles.
(3) Inter-Loss can slightly increase the inter-class discrepancy on both $W$ (directly) and the embedding network (indirectly), but it also raises the intra-class angles.
(4) ArcFace already has very good intra-class compactness and inter-class discrepancy.
(5) Compared with ArcFace, Triplet-Loss has similar intra-class compactness but inferior inter-class discrepancy. In addition, ArcFace has a more distinct margin than Triplet-Loss on the test set, as shown in the figure below.
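The first statistic above, the angle between each class weight $W_j$ and the centre of that class's embedding features, can be computed as in the sketch below. The helper names and the tiny 2-D example are assumptions for illustration; the real statistics in Table 3 come from the trained networks, not from this code.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def weight_centre_angles(features, labels, weights):
    """Angle between each W_j and the mean normalized embedding of class j.

    features: (N, d) embeddings, labels: (N,) class indices, weights: (d, n).
    Returns a dict {class index: angle in degrees}.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    angles = {}
    for j in np.unique(labels):
        centre = x[labels == j].mean(axis=0)          # embedding feature centre
        angles[int(j)] = angle_deg(weights[:, j], centre)
    return angles

# Hand-checkable 2-D example with two classes.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
weights = np.array([[1.0, 0.0],
                    [1.0, 1.0]])   # W_0 = (1, 1), W_1 = (0, 1)
angles = weight_centre_angles(features, labels, weights)
```

In this toy case class 0 has its weight at $45^\circ$ from its feature centre, while class 1 is perfectly aligned. A small average angle corresponds to the "nearly synchronized" behaviour reported for ArcFace.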
Evaluation Results
Results on LFW, YTF, CALFW and CPLFW: The LFW and YTF datasets are the most widely used benchmarks for unconstrained face verification on images and videos. As shown in the table below, ArcFace trained on MS1MV2 with ResNet100 beats the baselines (e.g. SphereFace and CosFace) by a significant margin on both LFW and YTF, which shows that the additive angular margin penalty can notably enhance the discriminative power of deeply learned features and demonstrates the effectiveness of ArcFace.
Besides the LFW and YTF datasets, we also report the performance of ArcFace on the recently introduced datasets CPLFW and CALFW, which show higher pose and age variations with the same identities as LFW. As shown in the table below, among all of the open-source face recognition models, the ArcFace model is evaluated as the top-ranked face recognition model, outperforming its counterparts by an obvious margin. As shown in the figure below, we illustrate the angle distributions of the positive and negative pairs (predicted by the ArcFace model trained on the MS1MV2 dataset with ResNet100) on LFW, CFP-FP, AgeDB-30, YTF, CPLFW, and CALFW. We can clearly see that the intra-class variance due to pose and age gaps significantly increases the angles between positive pairs, which raises the best threshold for face verification and produces more confusion regions on the histogram.
Results on MegaFace.
The MegaFace dataset includes 1M images of 690K unique individuals as the gallery set and 100K photos of 530 unique individuals from FaceScrub as the probe set. On MegaFace, there are two testing scenarios (identification and verification) under two protocols (large or small training set). The training set is defined as large if it contains more than 0.5M images. For a fair comparison, we train ArcFace on CASIA and MS1MV2 under the small and large protocols, respectively. In Table 6, ArcFace trained on CASIA achieves the best single-model identification and verification performance, not only surpassing the strong baselines (e.g. SphereFace [18] and CosFace [37]) but also outperforming other published methods [38, 17].
Since we observed an obvious performance gap between identification and verification, we performed a thorough manual check of the whole MegaFace dataset and found many face images with wrong labels, which significantly affects test performance. We therefore manually refined the whole MegaFace dataset and report the correct performance of ArcFace on MegaFace. On the refined MegaFace, ArcFace still clearly outperforms CosFace and achieves the best performance on both verification and identification.
Under the large protocol, ArcFace surpasses FaceNet by a clear margin, and obtains comparable results on identification and better results on verification compared to CosFace. Because CosFace uses private training data, we retrain CosFace on our MS1MV2 dataset with ResNet100. Under this fair comparison, ArcFace shows superiority over CosFace and forms an upper envelope of CosFace in both the identification and verification scenarios, as shown in the figure below.
Results on IJB-B and IJB-C: The IJB-B dataset contains 1,845 subjects with 21.8K still images and 55K frames from 7,011 videos. In total, there are 12,115 templates with 10,270 genuine matches and 8M impostor matches. The IJB-C dataset is a further extension of IJB-B, with 3,531 subjects, 31.3K still images, and 117.5K frames from 11,779 videos. In total, there are 23,124 templates with 19,557 genuine matches and 15,639K impostor matches.
On the IJB-B and IJB-C datasets, we use the VGG2 dataset as the training data and ResNet50 as the embedding network to train ArcFace, for a fair comparison with the most recent methods. In the table below, we compare the TAR (@FAR=1e-4) of ArcFace with the previous state-of-the-art models. ArcFace clearly boosts performance on both IJB-B and IJB-C (by about 3 to 5%, which is a significant reduction in error). Drawing on more training data (MS1MV2) and a deeper neural network (ResNet100), ArcFace further improves the TAR (@FAR=1e-4) to 94.2% and 95.6% on IJB-B and IJB-C, respectively. In the figure below, we show the full ROC curves of the proposed ArcFace on IJB-B and IJB-C; ArcFace achieves impressive performance even at FAR=1e-6, setting a new baseline.
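For readers unfamiliar with the TAR@FAR metric used here, the sketch below shows one common way to compute it from similarity scores: pick the threshold that lets through at most the allowed fraction of impostor pairs, then measure the fraction of genuine pairs above it. The function name `tar_at_far`, the strict-inequality convention, and the toy scores are all assumptions of this sketch and may differ from the official IJB evaluation code.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at a given false accept rate.

    Scores are similarities (higher means more likely the same identity).
    Returns (tar, threshold).
    """
    impostor_sorted = np.sort(np.asarray(impostor_scores))[::-1]   # descending
    k = int(far * len(impostor_sorted))         # impostor pairs allowed through
    k = min(k, len(impostor_sorted) - 1)
    threshold = impostor_sorted[k]              # accept only scores strictly above it
    tar = float(np.mean(np.asarray(genuine_scores) > threshold))
    return tar, float(threshold)

# Toy scores: 100 impostor pairs at 0.00, 0.01, ..., 0.99 and 4 genuine pairs.
impostor = np.arange(100) / 100.0
genuine = np.array([0.5, 0.95, 0.9, 0.7])
tar, threshold = tar_at_far(genuine, impostor, far=0.25)
```

In this toy example a FAR of 0.25 admits the 25 highest impostor scores, and two of the four genuine pairs clear the resulting threshold. At FAR=1e-4 or 1e-6 the threshold sits in the extreme tail of the impostor distribution, which is why millions of impostor pairs are needed for a stable estimate.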
Results on Trillion-Pairs. The Trillion-Pairs dataset provides 1.58M images from Flickr as the gallery set and 274K images of 5.7K LFW identities as the probe set. Every pair between the gallery set and the probe set is used for evaluation. In the table below, we compare the performance of ArcFace trained on different datasets. Compared with CASIA, the proposed MS1MV2 dataset significantly boosts performance and even slightly outperforms the DeepGlint-Face dataset, which has twice the number of identities. When combining all identities from MS1MV2 and the Asian celebrities from DeepGlint, ArcFace achieves the best identification performance of 84.840% (@FPR=1e-3) and verification performance comparable to the most recent submission (CIGIT IRSEC) on the leaderboard.
Results on iQIYI-VID: The iQIYI-VID challenge contains 565,372 video clips (219,677 in the training set, 172,860 in the validation set, and 172,835 in the test set) of 4,934 identities from iQIYI variety shows, films, and television dramas. The length of each video ranges from 1 to 30 seconds. This dataset provides multimodal cues, including face, cloth, voice, gait, and subtitles, for character identification. The iQIYI-VID dataset uses MAP@100 as the evaluation metric. MAP (Mean Average Precision) refers to the overall average accuracy rate: the mean, over every person ID in the training set (used as a query), of the average precision of the videos retrieved for that person ID in the test set.
As shown in the table below, ArcFace trained on the combined MS1MV2 and Asian datasets with ResNet100 sets a high baseline (MAP = 79.80%). Based on the embedding feature of each training video, we train an additional three-layer fully connected network with a classification loss to obtain a customized feature descriptor on the iQIYI-VID dataset. The MLP learned on the iQIYI-VID training set significantly boosts the MAP by 6.60%. With the support of model ensembling and context features from off-the-shelf object and scene classifiers, our final result surpasses the runner-up by a clear margin (0.99%).
Conclusions
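In this paper, we proposed an Additive Angular Margin loss function, which can effectively enhance the discriminative power of feature embeddings learned via DCNNs for face recognition. In the most comprehensive experiments reported in the literature, we demonstrate that our method consistently outperforms the state of the art.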
Resources
ArcFace-Paddle-GitHub
ArcFace-MXNet-GitHub
ArcFace-Pytorch-GitHub
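About the Author
Name | 郭权浩 |
---|---|
School | University of Electronic Science and Technology of China, graduate class of 2020 |
Research area | Computer vision |
Homepage | Deep Hao's homepage |
If you find any errors, please leave a comment so they can be corrected. Many thanks! | |
More posts in the paper-reproduction series will follow. Questions and discussion are welcome in the comments, so that we can learn and improve together! |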