Self-Supervised 3D Face Reconstruction via Conditional Estimation

ICCV 2021 · Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

We present a conditional estimation (CEST) framework that learns 3D facial parameters from single-view 2D images through self-supervised training on videos. CEST follows the analysis-by-synthesis paradigm: the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from a face image and then recombined to reconstruct that image. To learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of the different parameters by taking their statistical dependency into account. Specifically, the estimation of any 3D facial parameter is conditioned not only on the given image but also on the parameters that have already been derived. Moreover, reflectance symmetry and consistency across video frames are exploited to improve the disentanglement of the facial parameters. Together with a novel strategy for incorporating this symmetry and consistency, CEST can be trained efficiently on in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.
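The conditional coupling described in the abstract can be illustrated with a small sketch. The PyTorch code below is a hypothetical illustration only: the toy encoder, the parameter dimensions, the estimation order (viewpoint, then shape, then reflectance, then illumination), and the symmetry/consistency losses are assumptions made for exposition, not the authors' exact architecture or loss functions.

```python
# Minimal sketch of CEST-style conditional estimation (PyTorch).
# Module sizes, ordering, and losses below are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalHead(nn.Module):
    """Estimates one facial parameter from image features concatenated with
    the parameters that have already been derived (the conditioning step)."""

    def __init__(self, feat_dim, cond_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, feats, *conds):
        x = torch.cat((feats, *conds), dim=-1) if conds else feats
        return self.net(x)


class CESTSketch(nn.Module):
    def __init__(self, feat_dim=512, view_dim=6, shape_dim=199,
                 refl_dim=199, light_dim=27):
        super().__init__()
        # Toy shared encoder for 64x64 RGB crops; a real model would use a CNN.
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU()
        )
        # Each head sees the image features plus all earlier estimates.
        self.view_head = ConditionalHead(feat_dim, 0, view_dim)
        self.shape_head = ConditionalHead(feat_dim, view_dim, shape_dim)
        self.refl_head = ConditionalHead(feat_dim, view_dim + shape_dim, refl_dim)
        self.light_head = ConditionalHead(
            feat_dim, view_dim + shape_dim + refl_dim, light_dim
        )

    def forward(self, images):
        f = self.encoder(images)
        view = self.view_head(f)                       # conditioned on image only
        shape = self.shape_head(f, view)               # + viewpoint
        refl = self.refl_head(f, view, shape)          # + shape
        light = self.light_head(f, view, shape, refl)  # + reflectance
        return shape, refl, view, light


def reflectance_symmetry_loss(refl_map):
    """Penalize left-right asymmetry of a (B, C, H, W) reflectance map."""
    return (refl_map - torch.flip(refl_map, dims=[-1])).abs().mean()


def reflectance_consistency_loss(refl_a, refl_b):
    """Reflectance predicted from two frames of the same video should agree."""
    return (refl_a - refl_b).abs().mean()
```

In the full framework, the estimated parameters would be passed to a differentiable renderer to reconstruct the input image, and the photometric reconstruction error, combined with symmetry and consistency terms of the kind sketched above, provides the self-supervised training signal.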


Results from the Paper


Task: 3D Face Reconstruction    Dataset: REALY    Model: CEST

Metric       Value             Global Rank
@nose        2.779 (±0.835)    #22
@mouth       1.448 (±0.406)    #6
@forehead    2.384 (±0.578)    #11
@cheek       1.456 (±0.485)    #11
all          2.017             #15
