Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

Heatmap regression methods have dominated the face alignment field in recent years, yet they ignore the inherent relation between different landmarks. In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning this inherent relation. The SLPT generates a representation of each individual landmark from a local patch and aggregates these representations via an adaptive inherent relation based on the attention mechanism. The subpixel coordinate of each landmark is then predicted independently from the aggregated feature. Moreover, a coarse-to-fine framework is introduced to work in conjunction with the SLPT, enabling the initial landmarks to gradually converge to the target facial landmarks using fine-grained features from dynamically resized local patches. Extensive experiments on three popular benchmarks, namely WFLW, 300W and COFW, demonstrate that by learning the inherent relation between facial landmarks, the proposed method achieves state-of-the-art performance with much lower computational complexity. The code is available at the project website.
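The pipeline described in the abstract (per-landmark tokens from local patches, attention-based aggregation of landmark relations, independent subpixel prediction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch pooling, the single attention head, the random placeholder weights, and the function name `slpt_sketch` are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slpt_sketch(feature_map, landmarks, patch_size=7, d=32, rng=None):
    """Toy sketch of the SLPT idea: one token per landmark cropped from a
    local patch, aggregation by self-attention over landmarks, then an
    independent subpixel-offset head.  All weights are random placeholders."""
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, C = feature_map.shape
    N = len(landmarks)
    half = patch_size // 2
    # 1) representation of each landmark from its local patch (mean-pooled here)
    tokens = np.zeros((N, C))
    for i, (x, y) in enumerate(landmarks):
        x0, x1 = max(0, int(x) - half), min(W, int(x) + half + 1)
        y0, y1 = max(0, int(y) - half), min(H, int(y) + half + 1)
        tokens[i] = feature_map[y0:y1, x0:x1].mean(axis=(0, 1))
    # 2) adaptive inherent relation: single-head self-attention across landmarks
    Wq, Wk, Wv = (rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))        # (N, N) landmark-relation weights
    agg = attn @ V                              # aggregated per-landmark features
    # 3) independent subpixel prediction per landmark (sigmoid -> [0, 1] offsets)
    Wo = rng.standard_normal((d, 2)) / np.sqrt(d)
    offsets = 1.0 / (1.0 + np.exp(-(agg @ Wo)))  # (N, 2) within-patch coordinates
    return offsets, attn
```

In the paper's coarse-to-fine framework this step would be repeated, re-cropping smaller patches around the updated landmark positions at each stage; the sketch shows a single stage only.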

CVPR 2022

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Face Alignment | 300W | SLPT | NME_inter-ocular (%, Full) | 3.17 | #16 |
| Face Alignment | 300W | SLPT | NME_inter-ocular (%, Common) | 2.75 | #15 |
| Face Alignment | 300W | SLPT | NME_inter-ocular (%, Challenge) | 4.90 | #20 |
| Face Alignment | COFW | SLPT | NME (inter-ocular) | 3.32 | #9 |
| Face Alignment | COFW | SLPT | NME (inter-pupil) | 4.79 | #2 |
| Face Alignment | COFW-68 | SLPT | NME (inter-ocular) | 4.10 | #2 |
| Face Alignment | WFLW | SLPT | NME (inter-ocular) | 4.14 | #9 |
| Face Alignment | WFLW | SLPT | AUC@10 (inter-ocular) | 59.5 | #10 |
| Face Alignment | WFLW | SLPT | FR@10 (inter-ocular) | 2.76 | #8 |
| Face Alignment | WFLW (Extra Data) | SLPT | NME (inter-ocular) | 4.14 | #5 |
| Face Alignment | WFLW (Extra Data) | SLPT | AUC@10 (inter-ocular) | 59.50 | #6 |
| Face Alignment | WFLW (Extra Data) | SLPT | FR@10 (inter-ocular) | 2.76 | #6 |
