Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network

CVPR 2018 · Daniel Merget, Matthias Rock, Gerhard Rigoll ·

While fully-convolutional neural networks are very strong at modeling local features, they fail to aggregate global context due to their constrained receptive field. Modern methods typically address the lack of global context by introducing cascades, pooling, or by fitting a statistical model. In this work, we propose a new approach that introduces global context into a fully-convolutional neural network directly. The key concept is an implicit kernel convolution within the network. The kernel convolution blurs the output of a local-context subnet, which is then refined by a global-context subnet using dilated convolutions. The kernel convolution is crucial for the convergence of the network because it smoothens the gradients and reduces overfitting. In a postprocessing step, a simple PCA-based 2D shape model is fitted to the network output in order to filter outliers. Our experiments demonstrate the effectiveness of our approach, outperforming several state-of-the-art methods in facial landmark detection.

PDF Abstract