Multi-task head pose estimation in-the-wild

22 Dec 2020 · Roberto Valle, José Miguel Buenaposada, Luis Baumela

We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the pose task. Furthermore, placing the pose task at the bottleneck layer, at the end of the encoder, and placing the tasks that depend on spatial information, such as visibility and alignment, in the final decoder layer also improves the final performance. In our experiments, the proposed model outperforms the state of the art in the face pose and visibility tasks. By including a final landmark regression step, it also produces face alignment results on par with the state of the art.
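The abstract fixes where each task attaches to the network, not the network's exact configuration. The shape-level sketch below traces feature-map sizes through a hypothetical encoder-decoder so that the placement is concrete: the pose head reads the bottleneck, while the landmark heatmaps and visibility scores come from the final decoder layer, which recovers full resolution through the lateral skips. No weights are computed; `Shape` is a stand-in for a tensor. The 256×256 input, four stride-2 stages, channel widths, concatenation as the skip-merge operation, 68 landmarks, 1x1 convolution heads and per-landmark visibility vector are all illustrative assumptions, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Shape-only stand-in for a feature map: (channels, height, width)."""
    c: int
    h: int
    w: int


def encoder(x, stages, base_channels):
    """Hypothetical encoder: each stage = residual block (size kept) + stride-2 downsampling.
    Returns the bottleneck shape and the per-stage features used as lateral skips."""
    skips = []
    ch = base_channels
    for _ in range(stages):
        x = Shape(ch, x.h, x.w)                    # residual block: spatial size unchanged
        skips.append(x)                            # lateral skip taken before downsampling
        x = Shape(ch * 2, x.h // 2, x.w // 2)      # stride-2 downsampling doubles channels
        ch *= 2
    return x, skips


def decoder(x, skips):
    """Mirror of the encoder: upsample x2, merge the same-resolution skip, residual block."""
    for skip in reversed(skips):
        up = Shape(skip.c, x.h * 2, x.w * 2)       # upsampling to the skip's resolution
        assert (up.h, up.w) == (skip.h, skip.w), "skip and decoder maps must align"
        merged = Shape(up.c + skip.c, up.h, up.w)  # concatenation (assumed merge operation)
        x = Shape(skip.c, merged.h, merged.w)      # residual block reduces channels again
    return x


def multitask_shapes(height=256, width=256, stages=4, base_channels=64, num_landmarks=68):
    if height % 2**stages or width % 2**stages:
        raise ValueError("input size must be divisible by 2**stages")
    image = Shape(3, height, width)                          # RGB input
    bottleneck, skips = encoder(image, stages, base_channels)
    # Pose head at the bottleneck: global pooling + fully connected -> yaw, pitch, roll.
    pose = Shape(3, 1, 1)
    final = decoder(bottleneck, skips)
    # Spatial heads at the final decoder layer (assumed 1x1 convolutions).
    heatmaps = Shape(num_landmarks, final.h, final.w)        # one map per landmark
    visibility = Shape(num_landmarks, 1, 1)                  # one score per landmark
    return {"bottleneck": bottleneck, "pose": pose,
            "heatmaps": heatmaps, "visibility": visibility}


if __name__ == "__main__":
    for name, shape in multitask_shapes().items():
        print(f"{name:>10}: {shape}")
```

With these assumed settings the pose head sees a 16×16 bottleneck summarizing the whole face, while the alignment and visibility heads work on 256×256 maps. This matches the abstract's distinction between the global pose task and the tasks that depend on spatial information.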


Results from the Paper


Ranked #1 on Face Alignment on COFW (metric: Recall at 80% precision, Landmarks Visibility)

| Task                 | Dataset     | Model                      | Metric Name                                    | Metric Value | Global Rank |
|----------------------|-------------|----------------------------|------------------------------------------------|--------------|-------------|
| Pose Estimation      | 300W (Full) | MNN                        | MAE mean (°)                                   | 1.56         | #2          |
| Head Pose Estimation | AFLW        | MNN                        | MAE                                            | 3.22         | #1          |
| Head Pose Estimation | AFLW2000    | MNN                        | MAE                                            | 3.83         | #8          |
| Face Alignment       | AFLW2000    | MNN+ORB (Reannotated)      | Error rate                                     | 2.58         | #1          |
| Face Alignment       | AFLW2000-3D | MNN+OR (reannotated)       | Mean NME                                       | 2.58%        | #1          |
| Head Pose Estimation | BIWI        | MNN                        | MAE (trained with other data)                  | 3.66         | #5          |
| Face Alignment       | COFW        | MNN (Inter-pupil Norm)     | NME (inter-pupil)                              | 5.65%        | #8          |
| Face Alignment       | COFW        | MNN+OR (Inter-pupils Norm) | NME (inter-pupil)                              | 5.04%        | #3          |
| Face Alignment       | COFW        |                            | Recall at 80% precision (Landmarks Visibility) | 72.12        | #1          |
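The table names three families of metrics without defining them. As a hedged sketch of the conventions these names usually refer to (an assumption: each benchmark fixes its own protocol, such as which landmarks define the pupils or which visibility class counts as positive), they could be computed as follows. The function names are hypothetical. `pose_mae` pools the errors of all three angles; with one triplet per image, this equals the average of the per-angle MAEs, which is how "MAE mean" is commonly read.

```python
import math


def pose_mae(pred, gt):
    """Mean absolute error over (yaw, pitch, roll) triplets, in the angles' units (degrees)."""
    errors = [abs(p - g) for p_angles, g_angles in zip(pred, gt)
              for p, g in zip(p_angles, g_angles)]
    return sum(errors) / len(errors)


def nme_inter_pupil(pred, gt, left_pupil, right_pupil):
    """NME for one face, in percent: mean point-to-point landmark error
    divided by the ground-truth inter-pupil distance."""
    norm = math.dist(left_pupil, right_pupil)
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return 100.0 * sum(errors) / (len(errors) * norm)


def recall_at_precision(scores, labels, min_precision=0.8):
    """Best recall (percent) over score thresholds whose precision is >= min_precision.
    scores: confidence that a landmark belongs to the positive class; labels: 1 or 0."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Only evaluate a threshold once all items sharing this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return 100.0 * best


if __name__ == "__main__":
    print(pose_mae([(10.0, 0.0, 5.0)], [(12.0, -1.0, 5.0)]))                     # 1.0
    print(nme_inter_pupil([(0, 0), (10, 0)], [(3, 4), (10, 0)], (0, 0), (50, 0)))  # 5.0
    print(recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0]))
```

Taking the maximum recall over every threshold that reaches the precision target is one common reading of "recall at 80% precision". A benchmark may instead fix the operating threshold in advance, so reported numbers are only comparable under the same protocol.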
