【CVPR 2023】Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition (CCF-A, a top-tier computer vision conference)

Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition
(A leading international result in the affective computing series)

Abstract
Dynamic Facial Expression Recognition (DFER) is a rapidly developing field that focuses on recognizing facial expressions in videos. Previous research has treated non-target frames as noisy frames, but we propose that DFER should instead be treated as a weakly supervised problem. We also identify an imbalance between short-term and long-term temporal relationships in DFER. We therefore introduce the Multi-3D Dynamic Facial Expression Learning (M3DFEL) framework, which uses Multi-Instance Learning (MIL) to handle inexact labels. M3DFEL generates 3D instances to model the strong short-term temporal relationships and uses 3DCNNs for feature extraction. The Dynamic Long-term Instance Aggregation Module (DLIAM) then learns the long-term temporal relationships and dynamically aggregates the instances. Our experiments on the DFEW and FERV39K datasets show that M3DFEL outperforms existing state-of-the-art approaches with a vanilla R3D18 backbone. The source code is available at https://github.com/faceeyes/M3DFEL.
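
The multi-instance view described in the abstract can be illustrated with a small sketch: a video is cut into short clips (the 3D instances), each clip gets a feature vector, and the clip features are combined into one video-level (bag) feature that carries the single video label. The code below is only a rough illustration under stated assumptions, not the authors' implementation. The function names, the clip length of 4 frames, the toy 2-dimensional features, and the fixed query vector are all hypothetical, and the softmax attention pooling is a generic MIL stand-in rather than the DLIAM itself. In the real framework the features would come from a 3DCNN such as R3D18.

```python
import math


def split_into_instances(num_frames, instance_len):
    """Group frame indices into consecutive, non-overlapping clips (3D instances)."""
    if num_frames % instance_len != 0:
        raise ValueError("num_frames must be divisible by instance_len")
    return [
        list(range(start, start + instance_len))
        for start in range(0, num_frames, instance_len)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instance_feats, query):
    """Aggregate instance features into one bag feature.

    Each instance is scored by its dot product with `query` (a hypothetical
    stand-in for learned parameters); the bag feature is the softmax-weighted
    sum of the instance features.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in instance_feats]
    weights = softmax(scores)
    dim = len(instance_feats[0])
    bag = [
        sum(w * feat[d] for w, feat in zip(weights, instance_feats))
        for d in range(dim)
    ]
    return bag, weights


# A 16-frame video cut into four 4-frame instances (assumed sizes).
instances = split_into_instances(num_frames=16, instance_len=4)

# Toy per-instance features; the third clip plays the role of the "target" clip.
feats = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.1, 0.2]]
query = [1.0, 1.0]

bag_feature, weights = attention_pool(feats, query)
print(instances)
print(weights)
print(bag_feature)
```

In this toy run the third clip receives the largest weight, which mirrors the weakly supervised reading of the problem: only the video-level label is known, and the aggregation step decides which clips matter for it.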

Abstract (translated from the Chinese)
Dynamic Facial Expression Recognition (DFER) is a rapidly growing field focused on recognizing facial expressions in videos. Previous studies treat non-target frames as noisy frames, but we propose that DFER should instead be treated as a weakly supervised problem. We also find an imbalance between the short-term and long-term temporal relationships in DFER. We therefore introduce the Multi-3D Dynamic Facial Expression Learning (M3DFEL) framework, which uses Multi-Instance Learning (MIL) to handle inexact labels. M3DFEL generates 3D instances to model the strong short-term temporal relationships and uses 3DCNNs for feature extraction. The Dynamic Long-term Instance Aggregation Module (DLIAM) then learns the long-term temporal relationships and dynamically aggregates the instances. Our experiments on the DFEW and FERV39K datasets show that M3DFEL outperforms existing state-of-the-art methods with a vanilla R3D18 backbone. The source code is available at https://github.com/faceeyes/M3DFEL.

