Reconstructing the detailed geometric structure of a face from a given image
is key to many computer vision and graphics applications, such as motion
capture and reenactment. The reconstruction task is challenging as human faces
vary extensively when considering expressions, poses, textures, and intrinsic
geometries. While many approaches tackle this complexity by using additional
data to reconstruct the face of a single subject, extracting a facial surface
from a single image remains a difficult problem. As a result, single-image
methods can usually provide only a rough estimate of the facial geometry.
In contrast, we propose to leverage the power of convolutional neural networks
to produce a highly detailed face reconstruction from a single image. For this
purpose, we introduce an end-to-end CNN framework that derives the shape in a
coarse-to-fine fashion. The proposed architecture is composed of two main
blocks: a network that recovers the coarse facial geometry (CoarseNet),
followed by a CNN that refines the facial features of that geometry (FineNet).
The proposed networks are connected by a novel layer that renders a depth
image from a 3D mesh. Unlike in object recognition and detection, no suitable
datasets exist for training CNNs to perform face geometry
reconstruction. Therefore, our training regime begins with a supervised phase,
based on synthetic images, followed by an unsupervised phase that uses only
unconstrained facial images. The accuracy and robustness of the proposed model
are demonstrated by both qualitative and quantitative evaluation tests.
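
To make the role of the connecting layer concrete, the following is a minimal sketch of rendering a depth image from a 3D mesh. It is not the layer proposed here: it covers only a forward pass, assumes an orthographic projection with vertex x and y already in pixel units, and uses a plain z-buffer over triangles. The function name `render_depth` and its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def render_depth(vertices, faces, size=32):
    """Rasterize a triangle mesh into a depth image (orthographic, z-buffer).

    vertices: (V, 3) float array; x, y in pixel units, z is depth (smaller = closer).
    faces:    (F, 3) int array of vertex indices.
    Returns a (size, size) array indexed as [y, x]; uncovered pixels are np.inf.
    """
    vertices = np.asarray(vertices, dtype=float)
    depth = np.full((size, size), np.inf)
    for tri in np.asarray(faces, dtype=int):
        a, b, c = vertices[tri]
        # Doubled signed area of the projected triangle; skip degenerate ones.
        area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if abs(area) < 1e-12:
            continue
        # Bounding box of the triangle, clipped to the image.
        x0 = max(int(np.floor(min(a[0], b[0], c[0]))), 0)
        x1 = min(int(np.ceil(max(a[0], b[0], c[0]))), size - 1)
        y0 = max(int(np.floor(min(a[1], b[1], c[1]))), 0)
        y1 = min(int(np.ceil(max(a[1], b[1], c[1]))), size - 1)
        if x0 > x1 or y0 > y1:
            continue
        xs, ys = np.meshgrid(np.arange(x0, x1 + 1), np.arange(y0, y1 + 1))
        # Barycentric weights of each pixel with respect to (a, b, c).
        w0 = ((b[0] - xs) * (c[1] - ys) - (b[1] - ys) * (c[0] - xs)) / area
        w1 = ((c[0] - xs) * (a[1] - ys) - (c[1] - ys) * (a[0] - xs)) / area
        w2 = 1.0 - w0 - w1
        inside = (w0 >= -1e-9) & (w1 >= -1e-9) & (w2 >= -1e-9)
        # Interpolate depth and keep the closest surface per pixel.
        z = w0 * a[2] + w1 * b[2] + w2 * c[2]
        region = depth[y0:y1 + 1, x0:x1 + 1]  # view into depth, updated in place
        closer = inside & (z < region)
        region[closer] = z[closer]
    return depth
```

In a coarse-to-fine pipeline of the kind described above, a depth map of this form is what an image-to-image refinement network could take as input; a layer used inside end-to-end training would additionally need gradients with respect to the mesh vertices, which this sketch does not provide.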