CVPR 2023 face anti-spoofing: Instance-Aware Domain Generalization for Face Anti-Spoofing (study notes)

Paper link: https://arxiv.org/pdf/2304.05640.pdf

Code link: https://github.com/qianyuzqy/IADG (code not yet released at the time of writing)

Research motivation

  1. Previous domain generalization (DG) based face anti-spoofing methods usually rely on domain labels to align the distribution of each source domain and learn domain-invariant feature representations. However, human-annotated domain labels are coarse-grained and subjective, and cannot accurately reflect the real domain distributions;
  2. Such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that the learned feature representations are insensitive to domain-specific styles.

Research innovation

To address the above shortcomings, the authors propose a DG-based face anti-spoofing method from a novel instance-aware perspective, called Instance-Aware Domain Generalization (IADG). It explores domain-insensitive features and aligns them at the fine-grained instance level without using any domain labels, which improves the model's generalization to unseen scenes. Specifically, the authors first introduce Asymmetric Instance Adaptive Whitening (AIAW), which improves the generalization ability of features by adaptively whitening the domain-sensitive feature correlations within each instance. Unlike directly learning domain-agnostic features, AIAW weakens feature correlations from the perspective of higher-order statistics at the fine-grained instance level. In addition, the authors propose the Dynamic Kernel Generator (DKG) and Categorical Style Assembly (CSA); these two modules help AIAW learn domain-insensitive features.

Related work

FAS-related works are not covered here; this section mainly records the background on feature covariance and instance whitening.

"Texture synthesis using convolutional neural networks" and "Image style transfer using convolutional neural networks" point out that feature correlation (such as covariance matrix) stores the domain-specific styles of the image. Whitening transformation, such as "Image-to-image translation via group-wise deep whitening-and-coloring transformation", "Universal style transfer via feature transforms", "Switchable whitening for deep representation learning", can remove features The correlation and allow each feature to have unit variance. Based on the above theoretical basis, a large number of studies have proved that whitening can effectively remove domain-specific styles in the fields of image translation, style conversion, domain adaptation, and semantic segmentation. Therefore, instance whitening can improve the generalization ability of features, but it has not been fully explored in DG FAS. Inspired by these works, and considering the asymmetry between real and fake faces, the authors propose the AIAW method to improve the generalization ability of the FAS model.

Methodology

The framework figure in the paper shows the overall pipeline of IADG, which aligns features at the instance level by weakening their sensitivity to instance-specific styles. The core modules of IADG are DKG, CSA, and AIAW; the three modules are introduced separately below.

DKG

Considering the differences between samples from different source domains, it is difficult to extract instance-adaptive features with static filters alone. The authors therefore design DKG to automatically generate instance-adaptive filters, which help the static filters learn comprehensive instance-aware features.

DKG consists of a static convolution branch and a dynamic kernel branch: the first branch has fixed parameters, while the parameters of the second depend on each instance (a model is called static or dynamic depending on whether its parameters change with each sample). Let X^{i} and F^{i} denote the input and output features of the i-th sample in the DKG module. During training, both branches are optimized. Specifically, X^{i} is first split into two parts along the channel dimension, denoted \hat{X}^{i} and \tilde{X}^{i} respectively. On the static convolution branch, \tilde{X}^{i} is fed into a static convolution. On the dynamic kernel branch, \hat{X}^{i} is fed into a global average pooling layer and a convolution block to generate an instance-adaptive (dynamic) kernel W^{i}, which is then used to extract instance-specific features from \hat{X}^{i} (similar in spirit to applying an input-dependent mask). The outputs of the two branches are computed as follows:
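The original formula images are not preserved here; a plausible reconstruction based on the description above (notation mine, not necessarily the paper's) is:

\tilde{F}^{i} = f_{s}(\tilde{X}^{i}), \qquad W^{i} = f_{d}(\mathrm{GAP}(\hat{X}^{i})), \qquad \hat{F}^{i} = W^{i} \otimes \hat{X}^{i}

where f_{s} is the static convolution, f_{d} is the convolution block that generates the dynamic kernel, \mathrm{GAP} is global average pooling, and \otimes denotes convolution with the generated per-instance kernel.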

The output of the DKG module is then computed as follows:
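Reconstructing likewise, the final output is presumably the channel-wise concatenation of the two branch outputs:

F^{i} = [\hat{F}^{i}, \tilde{F}^{i}]

To make the mechanism concrete, below is a minimal PyTorch sketch of such a module. It is a reconstruction from the description only (the class name, the 3x3 kernel size, and the kernel-generation details are my assumptions), not the authors' unreleased code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DKG(nn.Module):
    """Sketch of a Dynamic Kernel Generator: a static conv branch plus a
    branch whose depthwise 3x3 kernel is generated per instance via GAP.
    Assumes an even number of input channels."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.half = half
        # Static branch: ordinary convolution with fixed (learned) weights.
        self.static_conv = nn.Conv2d(half, half, kernel_size=3, padding=1)
        # Dynamic branch: GAP, then a conv block that emits one depthwise
        # 3x3 kernel (9 weights per channel) for each sample.
        self.kernel_gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                   # global average pooling
            nn.Conv2d(half, half * 9, kernel_size=1),  # 3x3 kernel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_dyn, x_sta = torch.split(x, self.half, dim=1)
        out_sta = self.static_conv(x_sta)

        # One depthwise kernel per sample; apply all of them in a single
        # grouped conv by folding the batch into the channel dimension.
        b, c, h, w = x_dyn.shape
        kernels = self.kernel_gen(x_dyn).reshape(b * c, 1, 3, 3)
        out_dyn = F.conv2d(
            x_dyn.reshape(1, b * c, h, w), kernels, padding=1, groups=b * c
        ).reshape(b, c, h, w)

        # Concatenate the instance-adaptive and static features.
        return torch.cat([out_dyn, out_sta], dim=1)
```

The grouped-convolution trick applies a different generated kernel to each sample of the batch in one call, which is the usual way to implement per-instance dynamic filters.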

CSA

This module is used to generate samples of different styles. Specifically, the authors use farthest point sampling (FPS) to iteratively select L base styles for each category from all samples, so that the base styles obtained by FPS cover the entire style space as far as possible. The base styles are dynamically updated every epoch. For the base styles of each category, their mean and variance are computed; \mu_{base}^{r} and \mu_{base}^{s} denote the means of the base styles of real faces and spoof faces, respectively.
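As a concrete illustration of the FPS step, here is a small sketch (my own; it assumes each sample's style is summarized as a vector, e.g. concatenated channel-wise means and standard deviations, and that the function is run once per category):

```python
import torch


def farthest_point_sampling(styles: torch.Tensor, num_base: int) -> torch.Tensor:
    """Iteratively pick `num_base` style vectors that are maximally spread out.

    styles: (N, D) per-sample style vectors of one category.
    Returns the indices of the selected base styles.
    """
    n = styles.size(0)
    selected = torch.zeros(num_base, dtype=torch.long)
    # Distance from every point to the nearest already-selected point.
    min_dist = torch.full((n,), float("inf"))
    selected[0] = torch.randint(n, (1,)).item()  # arbitrary starting point
    for i in range(1, num_base):
        dist = torch.norm(styles - styles[selected[i - 1]], dim=1)
        min_dist = torch.minimum(min_dist, dist)
        selected[i] = torch.argmax(min_dist)  # farthest from the current set
    return selected
```

Because each new pick maximizes the distance to the already-selected set, the L chosen styles spread across the style space rather than clustering near the dominant domains.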

Considering that recombining the content of a real face with a spoof-face style may corrupt the liveness cues of the real face, the authors treat the two classes differently during feature augmentation: style augmentation is performed only when the content feature and the style feature share the same class label. For each category c, the authors sample combination weights W^{c}=[w_{1},...,w_{L}] from a Dirichlet distribution B([\alpha_{1},...,\alpha_{L}]); the assembled style is then computed as follows:
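The formula image is missing from the extracted post. Based on the description, a plausible reconstruction of the assembled style statistics (my notation) is a convex combination of the L base styles of class c:

\mu_{aug}^{c} = \sum_{l=1}^{L} w_{l}\,\mu_{base,l}^{c}, \qquad \sigma_{aug}^{c} = \sum_{l=1}^{L} w_{l}\,\sigma_{base,l}^{c}

Since Dirichlet weights sum to 1, the assembled style stays inside the convex hull of that class's base styles.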

For the content feature F_{org} of each instance, a newly assembled base style of the same category is used for style assembly, which avoids label changes and makes the stylized samples more realistic.
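The assembly operation itself is not spelled out in the surviving text. Assuming the usual AdaIN-style formulation used by style-mixing methods, it would normalize the content feature with its own statistics and re-scale it with the assembled style:

F_{aug} = \sigma_{aug}^{c} \cdot \frac{F_{org} - \mu(F_{org})}{\sigma(F_{org})} + \mu_{aug}^{c}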

AIAW

To align each sample at a finer granularity, the correlation between feature channels is used as an explicit constraint for instance-adaptive generalization. Since whitening has been shown to be effective at removing specific styles, it can potentially improve the generalization of DG FAS features. However, directly applying existing instance whitening would simultaneously remove domain-invariant features that are discriminative for FAS classification, leading to suboptimal performance. The authors therefore design a novel instance whitening loss for FAS, which selectively suppresses the domain-sensitive covariance entries while preserving the insensitive ones. Specifically, considering the nature of the FAS task, the authors introduce real-fake asymmetry into instance whitening: real-face features should be more compact, while spoof-face features may be dispersed in the feature space. Accordingly, different selection ratios are used to suppress the sensitive covariance of real and spoof faces during whitening.

AIAW is computed in the following steps:

1. Feed the feature map of each sample into an instance normalization (IN) layer to obtain the corresponding standardized feature F;

2. Compute the covariance matrix of the feature F (a reconstruction of the formula is given after this list);

3. Derive the selective mask for the covariance matrix (see below);

4. Apply the selective mask to obtain the instance whitening loss of AIAW (see below).
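The three formula images above were lost in extraction. Based on the steps described and common instance-whitening formulations, a plausible reconstruction (my notation; the exact sensitivity measure and ratios are assumptions) is as follows. The covariance of the IN-normalized feature F \in \mathbb{R}^{C \times HW} is

\Sigma = \frac{1}{HW} F F^{\top} \in \mathbb{R}^{C \times C}

For the selective mask M, the sensitivity of each covariance entry is measured by how much it varies between the original and style-augmented views, e.g. V_{ij} = \mathrm{Var}(\Sigma_{org,ij}, \Sigma_{aug,ij}); the top-k most sensitive entries are selected, with different ratios k_{r} (real) and k_{s} (spoof) realizing the asymmetry. The loss then suppresses only the selected entries on both views:

L_{AIAW} = \| M \odot \Sigma_{org} \|_{1} + \| M \odot \Sigma_{aug} \|_{1}

A minimal PyTorch sketch of this computation, under the same assumptions (the absolute difference between the two views stands in for the two-point variance):

```python
import torch


def instance_covariance(f: torch.Tensor) -> torch.Tensor:
    """Per-instance covariance of an (assumed IN-normalized) feature map.

    f: (B, C, H, W) -> returns (B, C, C).
    """
    b, c, h, w = f.shape
    flat = f.reshape(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (h * w)


def aiaw_loss(cov_org: torch.Tensor, cov_aug: torch.Tensor, ratio: float) -> torch.Tensor:
    """Selective whitening sketch: suppress the top-`ratio` most sensitive
    covariance entries on both views (not the authors' code)."""
    sensitivity = (cov_org - cov_aug).abs()  # proxy for the two-view variance
    k = max(1, int(ratio * sensitivity[0].numel()))
    thresh = sensitivity.flatten(1).topk(k, dim=1).values[:, -1]
    mask = (sensitivity >= thresh.view(-1, 1, 1)).float()
    return (mask * cov_org.abs()).mean() + (mask * cov_aug.abs()).mean()
```

Different `ratio` values would be passed for real and spoof samples, reflecting the compact-real versus dispersed-spoof asymmetry described above.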

Overall training and optimization

To generalize well in unseen domains, a binary classification loss is used to supervise the features F_{org}^{i} and F_{aug}^{i}, ensuring that the feature extractor captures task-relevant features. The formula is as follows:
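The loss image is missing; the standard form, applied to both branches, would be a cross-entropy over the real/spoof labels (Cls denotes the classification head, y^{i} the binary label; notation mine):

L_{Cls} = \sum_{i} \mathrm{CE}\big(\mathrm{Cls}(F_{org}^{i}),\, y^{i}\big) + \mathrm{CE}\big(\mathrm{Cls}(F_{aug}^{i}),\, y^{i}\big)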

In addition, the authors use a depth estimator Dep to estimate the depth map of real faces (the target depth map of a spoof face is an all-zero map). The loss for this part is computed as follows:
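Again a reconstruction: with D^{i} the pseudo ground-truth depth map (all zeros for spoof faces), a mean-squared depth regression loss over both branches would be

L_{Dep} = \sum_{i} \big\| \mathrm{Dep}(F_{org}^{i}) - D^{i} \big\|_{2}^{2} + \big\| \mathrm{Dep}(F_{aug}^{i}) - D^{i} \big\|_{2}^{2}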

The overall training loss is as follows:
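The total-loss image is also missing; given the three terms above, it is presumably a weighted sum of the form

L_{total} = L_{Cls} + \lambda_{1} L_{Dep} + \lambda_{2} L_{AIAW}

where \lambda_{1} and \lambda_{2} are trade-off weights (their values are not recoverable from these notes).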

During the training phase, the original branch and the augmented branch are optimized simultaneously. During the testing phase, only the original branch is used for inference.

Experimental results

From the results, IADG performs well under the different test protocols, and the ablation studies show that each module contributes to the model's performance. Looking forward to the authors' open-sourced code.

Source: https://blog.csdn.net/qq_38964360/article/details/130264982