Representation Learning and Identity Adversarial Training for Facial Behavior Understanding

15 Jul 2024 · Mang Ning, Albert Ali Salah, Itir Onal Ertugrul

Facial Action Unit (AU) detection has gained significant research attention because AUs carry complex expression information. In this paper, we unpack two fundamental factors in AU detection: data and subject identity regularization. Motivated by recent advances in foundation models, we highlight the importance of data and collect a diverse dataset, Face9M, comprising 9 million facial images from multiple public resources. Pretraining a masked autoencoder on Face9M yields strong performance in AU detection and facial expression tasks. We then show that subject identity in AU datasets gives the model a learning shortcut and leads to sub-optimal solutions for AU prediction. To tackle this generic issue in AU tasks, we propose Identity Adversarial Training (IAT) and demonstrate that strong IAT regularization is necessary to learn identity-invariant features. Furthermore, we elucidate the design space of IAT and empirically show that IAT circumvents identity shortcut learning and results in a better solution. Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective. Remarkably, the proposed FMAE-IAT approach achieves new state-of-the-art F1 scores on the BP4D (67.1%), BP4D+ (66.8%), and DISFA (70.1%) databases, significantly outperforming previous work. We release the code and model at https://github.com/forever208/FMAE-IAT, the first open-sourced facial model pretrained on 9 million diverse images.
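The abstract does not say how the adversarial signal in IAT is applied. One common way to train identity-invariant features is a gradient reversal layer placed in front of an identity classifier, and the sketch below illustrates only that general idea under this assumption. The function names and the scale `lam` are hypothetical and are not taken from the released code.

```python
# Minimal sketch of gradient reversal, ASSUMED here as the adversarial
# mechanism; the abstract does not confirm this detail.
# All names (reverse_gradient, backbone_gradient, lam) are hypothetical.

def reverse_gradient(grad, lam):
    """Backward rule of a gradient reversal layer: flip the sign and scale by lam."""
    return [-lam * g for g in grad]


def backbone_gradient(grad_au, grad_id, lam):
    """Gradient that reaches the shared features.

    grad_au: gradient of the AU detection loss w.r.t. the features
    grad_id: gradient of the identity classification loss w.r.t. the features
    lam:     strength of the adversarial regularization (larger = stronger)
    """
    reversed_id = reverse_gradient(grad_id, lam)
    # The backbone descends on the AU loss while ascending on the identity loss.
    return [a + r for a, r in zip(grad_au, reversed_id)]


if __name__ == "__main__":
    print(backbone_gradient([1.0, 2.0], [0.5, -1.0], lam=2.0))
```

In this sketch the identity head itself would still be updated with the unreversed gradient, so it keeps trying to recognize the subject while the shared features are pushed to make that harder. A larger `lam` corresponds to the stronger regularization that the abstract argues is necessary.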


Results from the Paper


 Ranked #1 on Facial Action Unit Detection on DISFA (using extra training data)

Task                                 Dataset    Model     Metric Name           Metric Value  Global Rank
Facial Expression Recognition (FER)  AffectNet  FMAE      Accuracy (8 emotion)  65.00         #2
Facial Action Unit Detection         BP4D       FMAE      Average F1            66.6          #2
Facial Action Unit Detection         BP4D       FMAE-IAT  Average F1            67.1          #1
Facial Action Unit Detection         BP4D+      FMAE      Average F1            66.2          #2
Facial Action Unit Detection         BP4D+      FMAE-IAT  Average F1            66.8          #1
Facial Action Unit Detection         DISFA      FMAE-IAT  Average F1            70.1          #1
Facial Action Unit Detection         DISFA      FMAE      Average F1            68.7          #2
Facial Expression Recognition (FER)  RAF-DB     FMAE      Overall Accuracy      93.09         #2
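The "Average F1" metric above is not defined on this page. A common convention in AU detection is to compute a binary F1 score per action unit and then take the unweighted mean, reported as a percentage. The sketch below assumes that convention; the function names and the toy labels are hypothetical, and the exact evaluation protocol (for example, cross-validation folds) is not specified here.

```python
# Sketch of an averaged per-AU F1 score, ASSUMING the common convention of
# an unweighted mean over action units. Names and toy data are hypothetical.

def f1_score(y_true, y_pred):
    """Binary F1 for one action unit; labels are 0 (absent) or 1 (present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the AU never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0


def average_f1(labels, preds):
    """Unweighted mean of per-AU F1 scores, as a percentage.

    labels, preds: dicts mapping an AU name to a list of 0/1 values.
    """
    scores = [f1_score(labels[au], preds[au]) for au in labels]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    labels = {"AU1": [1, 0, 1, 1], "AU2": [0, 0, 1, 0]}
    preds = {"AU1": [1, 0, 0, 1], "AU2": [0, 1, 1, 0]}
    print(round(average_f1(labels, preds), 1))
```

Because the mean is unweighted, rare action units count as much as frequent ones, so a gain on a rare AU can move the average noticeably.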

Methods