Deep Region and Multi-Label Learning for Facial Action Unit Detection

CVPR 2016 · Kaili Zhao, Wen-Sheng Chu, Honggang Zhang

Region learning (RL) and multi-label learning (ML) have recently attracted increasing attention in the field of facial Action Unit (AU) detection. Knowing that AUs are active on sparse facial regions, RL aims to identify these regions for better specificity. On the other hand, strong statistical evidence of AU correlations suggests that ML is a natural way to model the detection task. In this paper, we propose Deep Region and Multi-label Learning (DRML), a unified deep network that addresses these two problems simultaneously. One crucial aspect of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face. Our region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across an entire image). Unlike previous studies that solve RL and ML alternately, DRML addresses both problems by construction, allowing the two seemingly unrelated problems to interact more directly. The complete network is end-to-end trainable and automatically learns representations robust to variations inherent within a local region. Experiments on the BP4D and DISFA benchmarks show that DRML achieves the highest average F1-score and AUC within and across datasets in comparison with alternative methods.
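To make the contrast between kernel-sharing schemes concrete, here is a minimal NumPy sketch of the idea behind a region layer: the input map is split into a grid of regions, and each region is filtered with its own kernel, so weights are shared within a region but not across regions. This is an illustration under stated assumptions, not the authors' implementation: the names `conv2d_same` and `region_layer`, the single-channel input, the grid size, and the zero padding at region borders are all hypothetical choices, and the abstract does not specify what else the layer contains.

```python
import numpy as np


def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding.

    Assumes an odd-sized kernel so the output has the same shape as x.
    """
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def region_layer(x, kernels):
    """Apply one kernel per grid region (hypothetical sketch).

    x:       (H, W) input map.
    kernels: (G, G, kh, kw) array, one kernel for each of the G x G regions.
    Weights are shared inside a region but differ between regions.
    """
    g = kernels.shape[0]
    h, w = x.shape
    assert h % g == 0 and w % g == 0, "map must divide evenly into regions"
    rh, rw = h // g, w // g
    out = np.empty((h, w), dtype=float)
    for r in range(g):
        for c in range(g):
            rows = slice(r * rh, (r + 1) * rh)
            cols = slice(c * rw, (c + 1) * rw)
            # Each region is filtered independently with its own kernel.
            out[rows, cols] = conv2d_same(x[rows, cols], kernels[r, c])
    return out
```

With `G = 1` this reduces to an ordinary convolution over the whole map, and with one region per pixel it behaves like a locally connected layer, which is the sense in which the region layer sits between the two designs.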


Results from the Paper

Task                          Dataset  Model  Metric Name  Metric Value  Global Rank
Facial Action Unit Detection  BP4D     DRML   Average F1   48.3          # 8
Facial Action Unit Detection  BP4D     DRML   Average AUC  56.0          # 3
Facial Action Unit Detection  DISFA    DRML   Average F1   26.7          # 5
Facial Action Unit Detection  DISFA    DRML   Average AUC  52.3          # 3
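For readers unfamiliar with the metric, the sketch below shows one common way an average F1 over several AUs can be computed: a binary F1 score per AU, followed by an unweighted mean. It assumes binary ground-truth and binary predictions per frame and per AU; the function names are hypothetical, and the exact evaluation protocol behind the numbers above (thresholds, data splits) is not stated on this page.

```python
import numpy as np


def f1_per_label(y_true, y_pred):
    """Binary F1 for each label (column) of two (frames, labels) arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when a label never occurs
    # in either the ground truth or the predictions.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def average_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    return float(np.mean(f1_per_label(y_true, y_pred)))
```

For example, with two AUs where the first scores an F1 of 0.5 and the second 1.0, `average_f1` returns 0.75.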

