General Facial Representation Learning in a Visual-Linguistic Manner

How can we learn a universal facial representation that boosts all face analysis tasks? This paper takes one step toward this goal. We study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner. On one hand, the framework uses a contrastive loss to learn high-level semantic meaning from image-text pairs. On the other hand, we propose exploring low-level information simultaneously, by adding masked image modeling, to further enhance the face representation. We perform pre-training on LAION-FACE, a dataset containing a large number of face image-text pairs, and evaluate the representation capability on multiple downstream tasks. We show that FaRL achieves better transfer performance than previous pre-trained models. We also verify its superiority in the low-data regime. More importantly, our model surpasses state-of-the-art methods on face analysis tasks including face parsing and face alignment.
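To make the contrastive term concrete, here is a minimal sketch of a symmetric image-text contrastive loss in the style commonly used for visual-linguistic pre-training. It is an illustration under stated assumptions, not the paper's implementation: the function names, the temperature value of 0.07, and the plain-list embeddings are all hypothetical, and the masked image modeling term is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-text pair; every other combination in the batch is a negative.
    The temperature default is an assumed value, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = cross_entropy_diag(logits)
    text_to_image = cross_entropy_diag(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)
```

With correctly paired embeddings the loss approaches zero, while a batch whose pairs are swapped yields a large loss, which is the signal that pulls matching image and text representations together.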

CVPR 2022

Results from the Paper


 Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)

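| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-ocular (%, Full) | 2.88 | #3 |
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-ocular (%, Common) | 2.50 | #1 |
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-ocular (%, Challenge) | 4.42 | #4 |
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-pupil (%, Full) | 4.05 | #4 |
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-pupil (%, Common) | 3.46 | #4 |
| Face Alignment | 300W | FaRL-B (epoch 64) | NME_inter-pupil (%, Challenge) | 6.38 | #3 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-ocular (%, Full) | 2.93 | #5 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-ocular (%, Common) | 2.56 | #5 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-ocular (%, Challenge) | 4.45 | #5 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-pupil (%, Full) | 4.11 | #7 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-pupil (%, Common) | 3.53 | #7 |
| Face Alignment | 300W | FaRL-B (epoch 16) | NME_inter-pupil (%, Challenge) | 6.42 | #5 |
| Face Alignment | AFLW-19 | FaRL-B (epoch 16) | NME_diag (%, Full) | 0.943 | #2 |
| Face Alignment | AFLW-19 | FaRL-B (epoch 16) | NME_diag (%, Frontal) | 0.821 | #2 |
| Face Alignment | AFLW-19 | FaRL-B (epoch 16) | NME_box (%, Full) | 1.334 | #2 |
| Face Alignment | AFLW-19 | FaRL-B (epoch 16) | AUC_box@0.07 (%, Full) | 81.3 | #2 |
| Face Parsing | CelebAMask-HQ | FaRL-B | Mean F1 | 89.56 | #1 |
| Face Parsing | LaPa | FaRL-B | Mean F1 | 93.88 | #1 |
| Face Alignment | WFW (Extra Data) | FaRL-B (epoch 16) | NME (inter-ocular) | 3.96 | #2 |
| Face Alignment | WFW (Extra Data) | FaRL-B (epoch 16) | AUC@10 (inter-ocular) | 61.16 | #2 |
| Face Alignment | WFW (Extra Data) | FaRL-B (epoch 16) | FR@10 (inter-ocular) | 1.76 | #2 |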
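For readers unfamiliar with the face alignment metrics reported above, the following sketch shows how NME, failure rate (FR) and AUC are commonly computed. It follows the usual definitions and is only an assumption about the evaluation protocol: the function names are hypothetical, values are returned as fractions (multiply by 100 for percentages), and the normalizing distance (inter-ocular, inter-pupil, box or diagonal) must be supplied by the caller.

```python
import math


def nme(pred_landmarks, gt_landmarks, norm_dist):
    """Normalized mean error for one face.

    Averages the Euclidean distance between predicted and ground-truth
    landmarks, then divides by a normalizing distance such as the
    inter-ocular distance.
    """
    errors = [math.dist(p, g) for p, g in zip(pred_landmarks, gt_landmarks)]
    return sum(errors) / len(errors) / norm_dist


def failure_rate(nmes, threshold=0.10):
    """Fraction of faces whose NME exceeds the threshold (FR@10 uses 0.10)."""
    return sum(e > threshold for e in nmes) / len(nmes)


def auc(nmes, threshold=0.10):
    """Area under the cumulative error distribution up to the threshold.

    For a single face with error e, the area contributed on [0, threshold]
    is max(0, threshold - e); averaging over faces and dividing by the
    threshold normalizes the result to the range [0, 1].
    """
    area = sum(max(0.0, threshold - e) for e in nmes) / len(nmes)
    return area / threshold
```

Lower NME and FR are better, while higher AUC is better, which is why the rankings in the table treat the metrics in opposite directions.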
