Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

10 May 2019  ·  Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, Yu Qiao

Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, occlusion-robust and pose-invariant FER has received relatively little attention, especially in real-world scenarios. This paper addresses the real-world pose and occlusion robust FER problem with three contributions. First, to stimulate research on FER under real-world occlusions and varied poses, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion and pose variant FER. The RAN aggregates and embeds a variable number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We validate our RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss substantially improve the performance of FER under occlusion and pose variation. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be made publicly available.
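
The aggregation and loss described in the abstract can be illustrated with a minimal sketch. It shows one way a variable number of region feature vectors might be pooled into a single fixed-length vector, and how a margin-style penalty could encourage a high attention weight on at least one cropped region. The function names, the sigmoid scoring against a single query vector, the convention that the first entry is the full face, and the margin value are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(region_features, query):
    """Pool any number of region feature vectors into one vector.

    Assumption: each region is scored by a sigmoid of its dot product
    with a (normally learned) query vector; hypothetical, not from the source.
    """
    # One scalar weight in (0, 1) per region.
    weights = [
        sigmoid(sum(f * q for f, q in zip(feat, query)))
        for feat in region_features
    ]
    total = sum(weights)
    dim = len(query)
    # Weighted average: output length depends on dim, not on region count.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_features)) / total
        for d in range(dim)
    ]
    return pooled, weights


def region_margin_loss(weights, margin=0.02):
    """Hinge-style penalty on attention weights (illustrative only).

    Assumption: weights[0] belongs to the full face and the rest to crops.
    The loss is zero once some crop outweighs the full face by `margin`.
    """
    full_face, crops = weights[0], weights[1:]
    return max(0.0, margin - (max(crops) - full_face))


# Three regions or five regions both yield a 2-dimensional output.
three = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
five = three + [[0.5, 0.5], [0.2, 0.8]]
pooled3, w3 = attention_pool(three, [0.3, -0.1])
pooled5, w5 = attention_pool(five, [0.3, -0.1])
```

Because the pooled vector is a normalized weighted sum, its length is set by the feature dimension alone, which is what makes the representation fixed-length regardless of how many regions are supplied.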

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Facial Expression Recognition (FER) | AffectNet | RAN (ResNet-18+) | Accuracy (7 emotion) | - | #24 |
| Facial Expression Recognition (FER) | AffectNet | RAN (ResNet-18+) | Accuracy (8 emotion) | 59.5 | #21 |
| Facial Expression Recognition (FER) | FERPlus | RAN (VGG-16) | Accuracy (pretrained) | 89.16 | #2 |
| Facial Expression Recognition (FER) | RAF-DB | RAN (ResNet-18) | Overall Accuracy | 86.9 | #19 |
| Facial Expression Recognition (FER) | SFEW | RAN (VGG16+ResNet18) | Accuracy | 56.4 | #1 |