[Baidu Paper Reproduction Competition] ArcFace: Additive Angular Margin Loss for Deep Face Recognition

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Summary

One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that enhance discriminative power. Centre loss penalises the Euclidean distance between the deep features and their corresponding class centres to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space, and penalises the angles between the deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins into well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on more than ten face recognition benchmarks, including a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.

Introduction

Face representation using Deep Convolutional Neural Network (DCNN) embeddings is one of the methods of choice for face recognition. Typically, after a pose normalisation step, DCNNs map the face image into a feature that has small intra-class distance and large inter-class distance. There are two main lines of research for training DCNNs for face recognition: those that train a multi-class classifier which can separate different identities in the training set, e.g. by using a softmax classifier, and those that learn an embedding directly, such as the triplet loss. Based on large-scale training data and elaborate DCNN architectures, both the softmax-loss-based and the triplet-loss-based methods can obtain excellent performance on face recognition. However, both the softmax loss and the triplet loss have some drawbacks.

For the softmax loss:
(1) the size of the linear transformation matrix $W \in \mathbb{R}^{d \times n}$ increases linearly with the identity number $n$;
(2) the learned features are separable for the closed-set classification problem but not discriminative enough for the open-set face recognition problem.

For the triplet loss:
(1) there is a combinatorial explosion in the number of face triplets, especially for large-scale datasets, which leads to a significant increase in the number of iteration steps;
(2) semi-hard sample mining is a quite difficult problem for effective model training.

Several variants have been proposed to enhance the discriminative power of the softmax loss. Wen et al. pioneered the centre loss, the Euclidean distance between each feature vector and its class centre, to obtain intra-class compactness, while inter-class dispersion is guaranteed by the joint penalisation of the softmax loss. Nevertheless, updating the actual centres during training is extremely difficult, as the number of face classes available for training has recently increased dramatically.

Observing that the weights of the last fully connected layer of a classification DCNN trained with the softmax loss bear conceptual similarities to the centres of each face class, some works in the literature propose a multiplicative angular margin penalty to simultaneously enforce extra intra-class compactness and inter-class discrepancy, so that the trained model has better discriminative power. Although SphereFace introduced the important concept of the angular margin, its loss function requires a series of approximations to be computed, which leads to unstable training of the network. To stabilise training, a hybrid loss function including the standard softmax loss is proposed. Empirically, the softmax loss dominates the training process, because the integer-based multiplicative angular margin makes the target logit curve very steep and thus hinders convergence. CosFace directly adds a cosine margin penalty to the target logit; it obtains better performance than SphereFace while admitting a much easier implementation and relieving the need for joint supervision from the softmax loss.

In this paper, we propose an Additive Angular Margin Loss (ArcFace) to further improve the discriminative power of the face recognition model and to stabilise the training process. As shown in the figure below, the dot product between a DCNN feature and the last fully connected layer is equal to the cosine distance after feature and weight normalisation. We use the arc-cosine function to calculate the angle between the current feature and the target weight. We then add an additive angular margin to the target angle and obtain the target logit again through the cosine function. Finally, we re-scale all logits by a fixed feature norm, and the subsequent steps are exactly the same as in the softmax loss. The advantages of ArcFace can be summarised as follows:

Engaging: ArcFace directly optimises the geodesic distance margin by virtue of the exact correspondence between the angle and the arc in the normalised hypersphere. We intuitively illustrate what happens in the 512-D space by analysing the angle statistics between features and weights.

Effective: ArcFace achieves state-of-the-art performance on ten face recognition benchmarks including large-scale image and video datasets.

Easy: ArcFace only needs the few lines of code given in Algorithm 1 and is very easy to implement in computational-graph-based deep learning frameworks. Furthermore, ArcFace does not need to be combined with other loss functions to achieve stable performance, and it can easily converge on any training dataset.

Efficient: ArcFace adds negligible computational complexity during training. Current GPUs can easily support millions of identities for training, and model parallel strategies can easily support more identities.


Proposed Approach

ArcFace

The most widely used classification loss function, the softmax loss, is presented as follows:

$$L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}}$$

where $x_i \in \mathbb{R}^d$ denotes the deep feature of the $i$-th sample, belonging to the $y_i$-th class; the embedding feature dimension $d$ is set to 512 in this paper. $W_j \in \mathbb{R}^d$ denotes the $j$-th column of the weight $W \in \mathbb{R}^{d \times n}$, $b_j \in \mathbb{R}^n$ is the bias term, $N$ is the batch size and $n$ is the class number. The traditional softmax loss is widely used in deep face recognition. However, it does not explicitly optimise the feature embedding to enforce higher similarity for intra-class samples and diversity for inter-class samples, which results in a performance gap for deep face recognition under large intra-class appearance variations (e.g. pose variations and age gaps) and large-scale test scenarios (e.g. million or trillion pairs).

For simplicity, we fix the bias $b_j = 0$. Then the logit becomes $W_j^T x_i = \|W_j\|\|x_i\|\cos\theta_j$, where $\theta_j$ is the angle between the weight $W_j$ and the feature $x_i$. We fix $\|W_j\| = 1$ by $l_2$ normalisation, and we also fix $\|x_i\|$ by $l_2$ normalisation and re-scale it to $s$. The normalisation step on features and weights makes the predictions depend only on the angle between the feature and the weight. The learned embedding features are thus distributed on a hypersphere with radius $s$.
$$L_2 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}} + \sum_{j=1,\,j\neq y_i}^{n} e^{s\cos\theta_j}}$$

As the embedding features are distributed around each feature centre on the hypersphere, we add an additive angular margin penalty $m$ between $x_i$ and $W_{y_i}$ to simultaneously enhance the intra-class compactness and inter-class discrepancy. Since the proposed additive angular margin penalty is equal to the geodesic distance margin penalty in the normalised hypersphere, we name our method ArcFace.
$$L_3 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j=1,\,j\neq y_i}^{n} e^{s\cos\theta_j}}$$

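The paper describes ArcFace (its Algorithm 1) as requiring only a few lines of code. As an illustration, here is a minimal PyTorch-style sketch of a classification head implementing $L_3$; the class name, initialisation and clamping constants are our own choices, not the authors' MXNet reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Sketch of the ArcFace loss L3: s * cos(theta_yi + m) for the target class."""
    def __init__(self, embedding_dim=512, num_classes=10000, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Normalise features and weights so the logits become cos(theta_j).
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Recover the angle with arc-cosine (clamped for numerical safety).
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the additive angular margin m to the target-class angle only.
        target = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        # Re-scale by s and apply the standard softmax cross-entropy.
        return F.cross_entropy(self.s * logits, labels)
```

With s=64 and m=0.5, this matches the hyper-parameters used later in the experimental settings.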
We select face images from 8 different identities containing enough samples (about 1,500 images per class) to train 2-D feature embedding networks with the softmax and ArcFace loss, respectively. As shown in the figure below, the softmax loss provides roughly separable feature embeddings but produces noticeable ambiguity in the decision boundaries, while the proposed ArcFace loss enforces a more evident gap between the nearest classes.

Toy examples under the softmax and ArcFace loss on 8 identities with 2D features. Dots indicate samples and lines refer to the centre direction of each identity. Based on the feature normalisation, all face features are pushed to the arc space with a fixed radius. The geodesic distance gap between closest classes becomes evident as the additive angular margin penalty is incorporated.

Comparison with SphereFace and CosFace

Numerical Similarity: SphereFace, ArcFace and CosFace propose three different kinds of margin penalty: a multiplicative angular margin $m_1$, an additive angular margin $m_2$, and an additive cosine margin $m_3$, respectively. From the viewpoint of numerical analysis, the different margin penalties, whether added in the angle or the cosine space, all enforce intra-class compactness and inter-class diversity by penalising the target logit. As shown in the figure below, we plot the target logit curves of SphereFace, ArcFace and CosFace under their best margin settings. We only show these target logit curves within $[20°, 100°]$ because, during ArcFace training, the angle between $W_{y_i}$ and $x_i$ starts from around $90°$ (random initialisation) and ends at around $30°$. Intuitively, there are three factors in the target logit curves that affect performance: the starting point, the end point and the slope.

Target logit analysis. (a) θ distributions from start to end during ArcFace training. (b) Target logit curves for softmax, SphereFace, ArcFace, CosFace and the combined margin penalty.

By combining all of the margin penalties, we implement SphereFace, ArcFace and CosFace in a united framework with $m_1$, $m_2$ and $m_3$ as the hyper-parameters:
$$L_4 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos(m_1\theta_{y_i}+m_2)-m_3)}}{e^{s(\cos(m_1\theta_{y_i}+m_2)-m_3)} + \sum_{j=1,\,j\neq y_i}^{n} e^{s\cos\theta_j}}$$

As shown in Figure (b) above, by combining all of the above margins $(\cos(m_1\theta + m_2) - m_3)$, we can easily obtain some other target logit curves that also achieve high performance.
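To make the unified formulation concrete, here is a minimal sketch of the combined target logit; the function name is ours, and the margin triplets in the docstring echo the settings discussed in this post.

```python
import torch

def combined_margin_logit(theta, m1=1.0, m2=0.0, m3=0.0, s=64.0):
    """Target logit s*(cos(m1*theta + m2) - m3) for the ground-truth class,
    where theta is the angle (radians) between the feature and its target
    weight. (m1, m2, m3) = (1, 0.5, 0) gives ArcFace, (1.35, 0, 0) the
    arc-cosine re-implementation of SphereFace discussed below, and
    (1, 0, 0.35) CosFace; CM1 in the later ablation study is (1, 0.3, 0.2).
    """
    return s * (torch.cos(m1 * theta + m2) - m3)
```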

Geometric Difference: Despite the numerical similarity between ArcFace and previous works, the proposed additive angular margin has better geometric properties, as it corresponds exactly to the geodesic distance. As shown in the figure below, we compare the decision boundaries under the binary classification case. The proposed ArcFace has a constant linear angular margin throughout the whole interval. In contrast, SphereFace and CosFace only have a nonlinear angular margin.

Decision margins of different loss functions under the binary classification case. The dashed line represents the decision boundary, and the grey areas are the decision margins.

Small differences in margin design can have a "butterfly effect" on model training. For example, the original SphereFace employs an annealing optimisation strategy: to avoid divergence at the beginning of training, joint supervision from softmax is used in SphereFace to weaken the multiplicative margin penalty. By applying the arc-cosine function instead of the complicated double-angle formula, we implement a new version of SphereFace that does not require an integer margin. In our implementation, we found that m = 1.35 obtains performance similar to the original SphereFace without any convergence difficulty.


Comparison with other loss functions

Other loss functions can be designed based on the angular representation of features and weight vectors. For example, we can design losses to enforce intra-class compactness and inter-class discrepancy on the hypersphere. As shown in the figure below, we compare ArcFace with three other losses.

Based on the centre and feature normalisation, all identities are distributed on a hypersphere. To enhance intra-class compactness and inter-class discrepancy, we consider four kinds of Geodesic Distance (GDis) constraint. (A) Margin-Loss: insert a geodesic distance margin between the sample and centres. (B) Intra-Loss: decrease the geodesic distance between the sample and the corresponding centre. (C) Inter-Loss: increase the geodesic distance between different centres. (D) Triplet-Loss: insert a geodesic distance margin between triplet samples. In this paper, we propose an Additive Angular Margin Loss (ArcFace), which is exactly corresponded to the geodesic distance (Arc) margin penalty in (A), to enhance the discriminative power of face recognition model. Extensive experimental results show that the strategy of (A) is most effective.

Intra-Loss: Aims to improve intra-class closeness by reducing the angle/arc between samples and ground truth centers.
$$L_5 = L_2 + \frac{1}{\pi N}\sum_{i=1}^{N}\theta_{y_i}$$

Inter-Loss: The goal is to enhance inter-class differences by increasing the angle/arc between different centers.
$$L_6 = L_2 - \frac{1}{\pi N(n-1)}\sum_{i=1}^{N}\sum_{j=1,\,j\neq y_i}^{n}\arccos(W_{y_i}^T W_j)$$

The Inter-Loss here is a special case of the Minimum Hyper-spherical Energy (MHE) method, in which both the hidden layers and the output layer are regularised by MHE. The MHE paper also gives an example of a special loss function that combines the SphereFace loss with the MHE loss on the last layer of the network.

Triplet-Loss: Aims to enlarge the angle/arc margin between triplet samples. In FaceNet, a Euclidean margin is applied to the normalised features. Here, we employ the triplet loss with the angular representation of our features: $\arccos(x_i^{pos} \cdot x_i) + m \leqslant \arccos(x_i^{neg} \cdot x_i)$.
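Below is a hedged sketch of how these three angular penalties could be computed in the same PyTorch style; the function names are ours, and the class centres are approximated by the classifier weights $W$, as in $L_5$ and $L_6$.

```python
import math
import torch
import torch.nn.functional as F

def intra_penalty(embeddings, weight, labels):
    """Intra-Loss term of L5: mean angle between each sample and W_{y_i}, / pi."""
    cosine = F.linear(F.normalize(embeddings), F.normalize(weight))
    cos_target = cosine.gather(1, labels.unsqueeze(1)).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos_target).mean() / math.pi

def inter_penalty(weight, labels):
    """Inter-Loss term of L6: negative mean angle between W_{y_i} and all
    other centres W_j (j != y_i), normalised by pi."""
    n = weight.size(0)
    w = F.normalize(weight)                                     # (n, d)
    angles = torch.acos((w[labels] @ w.t()).clamp(-1 + 1e-7, 1 - 1e-7))
    mask = F.one_hot(labels, n).bool()                          # drop j == y_i
    return -angles.masked_fill(mask, 0.0).sum() / (math.pi * labels.size(0) * (n - 1))

def angular_triplet_penalty(anchor, pos, neg, m=0.5):
    """Hinge on the angular triplet condition arccos(a.p) + m <= arccos(a.n)."""
    a, p, n = (F.normalize(t) for t in (anchor, pos, neg))
    th_ap = torch.acos((a * p).sum(1).clamp(-1 + 1e-7, 1 - 1e-7))
    th_an = torch.acos((a * n).sum(1).clamp(-1 + 1e-7, 1 - 1e-7))
    return F.relu(th_ap + m - th_an).mean()
```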


Experiments

Implementation Details

Datasets:
As shown in the table below, we separately employ CASIA, VGGFace2, MS1MV2 and DeepGlint-Face (including MS1M-DeepGlint and Asian-DeepGlint) as our training data, in order to conduct a fair comparison with other methods. Note that the MS1MV2 presented here is a semi-automatically refined version of MS1M. To the best of our knowledge, we are the first to employ ethnicity-specific annotators for large-scale face image annotation, as boundary cases (e.g. hard samples and noisy samples) are very hard to distinguish if the annotator is not familiar with the identity. During training, we use the efficient face verification datasets (e.g. LFW, CFP-FP, AgeDB-30) to check the improvement under different settings. Besides the most widely used LFW and YTF datasets, we also report the performance of ArcFace on the recent large-pose and large-age datasets (e.g. CPLFW and CALFW). We also extensively test the proposed ArcFace on large-scale image datasets (e.g. MegaFace, IJB-B, IJB-C and Trillion-Pairs) and the video dataset iQIYI-VID.

Face datasets for training and testing. “(P)” and “(G)” refer to the probe and gallery set, respectively.

Experimental Settings:

For data preprocessing, we utilise five facial points to generate the normalised face crops. For the embedding network, we adopt the widely used CNN architectures ResNet50 and ResNet100. After the last convolutional layer, we explore the BN-Dropout-FC-BN structure to get the final 512-D embedding feature. In this paper, we use ([training dataset, network structure, loss]) to facilitate understanding of the experimental settings.

We set the feature scale $s$ to 64 and choose the angular margin $m$ of ArcFace at 0.5. We implement the models with the MXNet framework, with a batch size of 512, and train them on four NVIDIA Tesla P40 (24GB) GPUs. On CASIA, the learning rate starts from 0.1 and is divided by 10 at 20K and 28K iterations; the training process finishes at 32K iterations. On MS1MV2, we divide the learning rate at 100K and 160K iterations and finish at 180K iterations. We set momentum to 0.9 and weight decay to 5e-4. During testing, we only keep the feature embedding network without the fully connected layer (160MB for ResNet50 and 250MB for ResNet100) and extract the 512-D features for each normalised face (8.9 ms/face for ResNet50 and 15.4 ms/face for ResNet100). To obtain the embedding features for templates (e.g. IJB-B and IJB-C) or videos (e.g. YTF and iQIYI-VID), we simply calculate the feature centre of all images from the template or all frames from the video. Note that, for strict evaluation, overlapping identities between the training and test sets are removed, and we only use a single crop for all testing.
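A minimal sketch of the CASIA schedule above, written in PyTorch for illustration rather than the MXNet framework actually used; `model` and `train_loader` are assumed placeholders.

```python
import torch

def train_casia(model, train_loader):
    """Sketch of the CASIA schedule: SGD with momentum 0.9, weight decay 5e-4,
    lr 0.1 divided by 10 at 20K and 28K iterations, stopping at 32K.
    `model(images, labels)` is assumed to return the ArcFace loss;
    `train_loader` yields batches of size 512."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[20_000, 28_000], gamma=0.1)
    for step, (images, labels) in enumerate(train_loader, start=1):
        loss = model(images, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()             # the milestones are per-iteration here
        if step == 32_000:
            break
```

For MS1MV2, the milestones would instead be 100K/160K with 180K total iterations.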


Ablation Study on Loss Function

As shown in the table below, we first explore the angular margin setting of ArcFace on the CASIA dataset with ResNet50. The best margin observed in our experiments is 0.5. Using the combined margin framework proposed above, it is easier to set the margins of SphereFace and CosFace, and we find they perform best when set to 1.35 and 0.35, respectively. Our implementations of SphereFace and CosFace lead to excellent performance without any difficulty in convergence. The proposed ArcFace achieves the highest verification accuracy on all three test sets. In addition, we performed extensive experiments with the combined margin framework and observed some of the best performance for CM1 (1, 0.3, 0.2) and CM2 (0.9, 0.4, 0.15). The combined margin framework performs better than individual SphereFace and CosFace, but is upper-bounded by the performance of ArcFace.

Besides the comparison with margin-based methods, we conduct a further comparison between ArcFace and other losses that aim at enforcing intra-class compactness and inter-class discrepancy. As the baseline we choose the softmax loss, and we observe a performance drop on CFP-FP and AgeDB-30 after weight and feature normalisation. By combining the softmax loss with the intra-class loss, the performance improves on CFP-FP and AgeDB-30. However, combining the softmax loss with the inter-class loss only slightly improves the accuracy. The fact that Triplet-loss outperforms Norm-Softmax indicates the importance of a margin for improving performance. However, employing the margin penalty within triplet samples is less effective than inserting the margin between samples and centres, as in ArcFace. Finally, we incorporate the Intra-loss, Inter-loss and Triplet-loss into ArcFace but observe no improvement, which leads us to believe that ArcFace is already enforcing intra-class compactness, inter-class discrepancy and classification margin.

To better understand the superiority of ArcFace, we give detailed angle statistics on the training data (CASIA) and the test data (LFW) under different losses in Table 3. We find that:
(1) $W_j$ is nearly synchronised with the embedding feature centre for ArcFace (14.29°), but there is an obvious deviation (44.26°) between $W_j$ and the embedding feature centre for Norm-Softmax. Therefore, the angles between the $W_j$ cannot absolutely represent the inter-class discrepancy on the training data; instead, the embedding feature centres calculated by the trained network are more representative (a sketch of this weight-vs-centre angle statistic is given after the figure below).
(2) The Intra-Loss can effectively compress the intra-class variations, but it also brings in smaller inter-class angles.
(3) The Inter-Loss can slightly increase the inter-class discrepancy on both $W$ (directly) and the embedding network (indirectly), but it also raises the intra-class angles.
(4) ArcFace already has very good intra-class compactness and inter-class discrepancy.
(5) Triplet-Loss has similar intra-class compactness but inferior inter-class discrepancy compared to ArcFace. In addition, ArcFace has a more distinct margin than Triplet-Loss on the test set, as shown in the figure below.

Angle distributions of all positive pairs and random negative pairs (~0.5M) from LFW. Red area indicates positive pairs while blue indicates negative pairs. All angles are represented in degrees. ([CASIA, ResNet50, loss*]).
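Item (1) above compares the angle between each classifier weight $W_j$ and the corresponding empirical feature centre. Here is a minimal sketch of how such a statistic could be computed; the function name and averaging choices are ours.

```python
import torch
import torch.nn.functional as F

def weight_centre_angles(weight, embeddings, labels):
    """Angle (degrees) between each classifier weight W_j and the empirical
    centre of the normalised embedding features assigned to class j.
    Assumes every class has at least one sample in `embeddings`."""
    n, d = weight.shape
    centres = torch.zeros(n, d, device=weight.device)
    centres.index_add_(0, labels, F.normalize(embeddings))   # sum per class
    cos = (F.normalize(centres) * F.normalize(weight)).sum(dim=1)
    return torch.rad2deg(torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7)))
```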

Evaluation Results

Results on LFW, YTF, CALFW and CPLFW: The LFW and YTF datasets are the most widely used benchmarks for unconstrained face verification on images and videos. As shown in the table below, ArcFace trained on MS1MV2 with ResNet100 beats the baselines (e.g. SphereFace and CosFace) by a significant margin on both LFW and YTF, which shows that the additive angular margin penalty can notably enhance the discriminative power of deeply learned features and demonstrates the effectiveness of ArcFace.

Verification performance (%) of different methods on LFW and YTF.

Besides the LFW and YTF datasets, we also report the performance of ArcFace on the recently introduced CPLFW and CALFW datasets, which show larger pose and age variations with the same identities from LFW. As shown in the table below, among all of the open-sourced face recognition models, the ArcFace model is evaluated as the top-ranked face recognition model, with a clear margin over its counterparts.

Verification performance (%) of open-sourced face recognition models on LFW, CALFW and CPLFW.

As illustrated in the figure below, we show the angle distributions of both positive and negative pairs on LFW, CFP-FP, AgeDB-30, YTF, CPLFW and CALFW, predicted by the ArcFace model trained on MS1MV2 with ResNet100. We can clearly find that the intra-class variance introduced by pose and age gaps significantly increases the angles between positive pairs, which enlarges the best threshold for face verification and generates more confusion regions in the histogram.

Angle distributions of both positive and negative pairs on LFW, CFP-FP , AgeDB-30, YTF, CPLFW and CALFW. Red area indicates positive pairs while blue indicates negative pairs. All angles are represented in degree. ([MS1MV2, ResNet100, ArcFace])

Results on MegaFace.
The MegaFace dataset includes 1M images of 690K distinct individuals as the gallery set and 100K photos of 530 unique individuals from FaceScrub as the probe set. On MegaFace, there are two testing scenarios (identification and verification) under two protocols (large or small training set). The training set is defined as large if it contains more than 0.5M images. For a fair comparison, we train ArcFace on CASIA and MS1MV2 under the small and large protocols, respectively. As shown in the table below, ArcFace trained on CASIA achieves the best single-model identification and verification performance, not only surpassing the strong baselines (e.g. SphereFace and CosFace) but also outperforming other published methods.

Face identification and verification evaluation of different methods on MegaFace Challenge 1 using FaceScrub as the probe set. "Id" refers to the rank-1 face identification accuracy with 1M distractors, and "Ver" refers to the face verification TAR at 1e-6 FAR. "R" refers to data refinement on both probe set and 1M distractors. ArcFace obtains state-of-the-art performance under both small and large protocols.

Since we observed an obvious performance gap between identification and verification, we performed a thorough manual check of the whole MegaFace dataset and found many face images with wrong labels, which significantly affects the test performance. Therefore, we manually refined the whole MegaFace dataset and report the correct performance of ArcFace on MegaFace. On the refined MegaFace, ArcFace still clearly outperforms CosFace and achieves the best performance on both verification and identification.

Under the large protocol, ArcFace surpasses FaceNet by a clear margin and obtains comparable results on identification and better results on verification compared to CosFace. Since CosFace employs private training data, we retrain CosFace on our MS1MV2 dataset with ResNet100. Under this fair comparison, ArcFace shows its superiority and forms an overwhelming advantage over CosFace under both identification and verification scenarios, as shown in the figure below.

CMC and ROC curves of different models on MegaFace. Results are evaluated on both original and refined MegaFace dataset.

Results on IJB-B and IJB-C: The IJB-B dataset contains 1,845 subjects with 21.8K still images and 55K frames from 7,011 videos. In total, there are 12,115 templates with 10,270 genuine matches and 8M impostor matches. The IJB-C dataset is a further extension of IJB-B, having 3,531 subjects with 31.3K still images and 117.5K frames from 11,779 videos. In total, there are 23,124 templates with 19,557 genuine matches and 15,639K impostor matches.

On the IJB-B and IJB-C datasets, we employ the VGG2 dataset as the training data and ResNet50 as the embedding network to train ArcFace, for a fair comparison with the most recent methods. In the table below, we compare the TAR (@FAR=1e-4) of ArcFace with the previous state-of-the-art models. ArcFace obviously boosts the performance on both IJB-B and IJB-C (by about 3~5%, a significant reduction of the error). Drawing support from more training data (MS1MV2) and a deeper neural network (ResNet100), ArcFace further improves the TAR (@FAR=1e-4) to 94.2% on IJB-B and 95.6% on IJB-C. In the figure below, we show the full ROC curves of the proposed ArcFace on IJB-B and IJB-C; ArcFace achieves impressive performance even at FAR=1e-6 and sets a new baseline.

ROC curves of 1:1 verification protocol on the IJB-B and IJB-C dataset.
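The TAR@FAR numbers above can be computed from pairwise similarity scores; below is a minimal NumPy sketch under our own naming (the official benchmark protocols have details we omit, such as tie handling).

```python
import numpy as np

def tar_at_far(genuine, imposter, far=1e-4):
    """TAR at a fixed FAR: pick the similarity threshold that accepts a
    fraction `far` of imposter pairs, then report the fraction of genuine
    pairs accepted at that threshold."""
    imposter = np.sort(np.asarray(imposter))[::-1]     # descending similarity
    k = max(int(far * len(imposter)), 1)               # imposters allowed through
    threshold = imposter[k - 1]
    return float(np.mean(np.asarray(genuine) >= threshold))
```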

Results on Trillion-Pairs: The Trillion-Pairs dataset provides 1.58M images from Flickr as the gallery set and 274K images of 5.7K LFW identities as the probe set. Every pair between the gallery set and the probe set is used for evaluation. In the table below, we compare the performance of ArcFace trained on different datasets. The proposed MS1MV2 dataset obviously boosts the performance compared to CASIA, and even slightly outperforms the DeepGlint-Face dataset, which has double the number of identities. When combining all identities from MS1MV2 and the Asian celebrities from DeepGlint, ArcFace achieves the best identification performance of 84.840% (@FPR=1e-3) and comparable verification performance compared to the most recent submission (CIGIT IRSEC) on the leaderboard.

Identification and verification results (%) on the Trillion-Pairs dataset. ([Dataset*, ResNet100, ArcFace])

Results on iQIYI-VID: The iQIYI-VID challenge contains 565,372 video clips (training set 219,677, validation set 172,860 and test set 172,835) of 4,934 identities from iQIYI variety shows, films and television dramas. The length of each video ranges from 1 to 30 seconds. This dataset supplies multi-modal cues, including face, cloth, voice, gait and subtitles, for character identification. The iQIYI-VID dataset employs MAP@100 as the evaluation metric. MAP (Mean Average Precision) refers to the overall average accuracy rate: for each person ID in the training set (used as the query), it is the mean of the average precision of the corresponding videos retrieved from the test set.
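As a rough sketch of this metric under one common convention (the challenge's official scorer may differ, e.g. in the AP denominator or tie-breaking):

```python
import numpy as np

def average_precision_at_k(ranked_relevance, k=100):
    """AP@k for one identity query: `ranked_relevance` is a 0/1 vector over
    the retrieved videos, ordered by decreasing similarity."""
    rel = np.asarray(ranked_relevance[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(rel) / (np.arange(len(rel)) + 1.0)
    return float((precision_at_hits * rel).sum() / rel.sum())

# MAP@100 is the mean of AP@100 over all person-ID queries.
```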

As shown in the table below, ArcFace trained on MS1MV2 and the Asian dataset with ResNet100 sets a high baseline (MAP = 79.80%). Based on the embedding feature of each training video, we train an additional three-layer fully connected network (MLP) with a classification loss to obtain a customised feature descriptor on the iQIYI-VID dataset. The MLP learned on the iQIYI-VID training set significantly boosts the MAP by 6.60%. Drawing support from model ensembles and context features from off-the-shelf object and scene classifiers, our final result surpasses the runner-up by a clear margin (0.99%).

MAP of our method on the iQIYI-VID test set. “MLP” refers to a three-layer fully connected network trained on the iQIYI-VID training data.

Conclusions

In this paper, we proposed an Additive Angular Margin loss function which can effectively enhance the discriminative power of feature embeddings learned via DCNNs for face recognition. In the most comprehensive experiments reported in the literature, we demonstrate that our method consistently outperforms the state of the art.


Resources

ArcFace-Paddle-GitHub
ArcFace-MXNet-GitHub
ArcFace-Pytorch-GitHub


About the Author

Name: 郭权浩 (Guo Quanhao)
School: University of Electronic Science and Technology of China, Master's class of 2020
Research interests: Computer Vision
Homepage: Deep Hao's homepage
If you find any mistakes, please leave a comment so I can correct them. Many thanks!
More paper reproduction posts will follow. Feel free to leave comments to discuss, learn and grow together!
