Learning Deep Global Multi-scale and Local Attention Features for Facial Expression Recognition in the Wild

Facial expression recognition (FER) in the wild has received broad attention, with occlusion and pose variation being two key challenges. This paper proposes a global multi-scale and local attention network (MA-Net) for FER in the wild. Specifically, the proposed network consists of three main components: a feature pre-extractor, a multi-scale module, and a local attention module. The feature pre-extractor extracts mid-level features; the multi-scale module fuses features with different receptive fields, which reduces the susceptibility of deeper convolutions to occlusion and pose variation; and the local attention module guides the network to focus on salient local features, which mitigates the interference of occlusion and non-frontal poses on FER in the wild. Extensive experiments demonstrate that the proposed MA-Net achieves state-of-the-art results on several in-the-wild FER benchmarks: CAER-S, AffectNet-7, AffectNet-8, RAF-DB, and SFEW, with accuracies of 88.42%, 64.53%, 60.29%, 88.40%, and 59.40%, respectively.
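The two core components described above, multi-scale feature fusion and local spatial attention, can be sketched in PyTorch. This is a minimal illustrative sketch based only on the abstract, not the authors' implementation: the branch kernel sizes, channel counts, and the single-channel sigmoid attention map are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class MultiScaleModule(nn.Module):
    """Fuse features from parallel convolutions with different receptive
    fields (illustrative sketch; kernel sizes 3/5/7 are an assumption)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.branch7 = nn.Conv2d(in_ch, out_ch, kernel_size=7, padding=3)
        # 1x1 convolution fuses the concatenated multi-scale branches.
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [self.branch3(x), self.branch5(x), self.branch7(x)], dim=1
        )
        return self.fuse(feats)


class LocalAttentionModule(nn.Module):
    """Spatial attention that reweights feature maps so salient local
    regions (e.g. unoccluded facial parts) dominate (illustrative sketch)."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(in_ch, 1, kernel_size=1),  # one attention map per image
            nn.Sigmoid(),                        # weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.attn(x)  # broadcast the map across channels


if __name__ == "__main__":
    # Mid-level features from a hypothetical pre-extractor backbone.
    x = torch.randn(2, 64, 14, 14)
    y = LocalAttentionModule(64)(MultiScaleModule(64, 64)(x))
    print(y.shape)  # torch.Size([2, 64, 14, 14])
```

In the paper's design the pre-extracted mid-level features would pass through both modules before classification; here the two modules are simply chained to show that they preserve the spatial feature-map shape.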


Datasets


Task                                | Dataset   | Model  | Metric Name          | Metric Value | Global Rank
Facial Expression Recognition (FER) | AffectNet | MA-Net | Accuracy (7 emotion) | 64.53        | # 19
Facial Expression Recognition (FER) | AffectNet | MA-Net | Accuracy (8 emotion) | 60.29        | # 19
Facial Expression Recognition (FER) | RAF-DB    | MA-Net | Overall Accuracy     | 88.36        | # 14
