Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

We propose a neural rendering-based system that creates head avatars from a single photograph. Our approach models a person's appearance by decomposing it into two layers. The first layer is a pose-dependent coarse image that is synthesized by a small neural network. The second layer is defined by a pose-independent texture image that contains high-frequency details. The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views. We compare our system to analogous state-of-the-art systems in terms of visual quality and speed. The experiments show significant inference speedup over previous neural head avatar models for a given visual quality. We also report on a real-time smartphone-based implementation of our system.
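
The two-layer composition described in the abstract (a coarse image plus a warped texture) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `bilinear_sample` and `compose_bilayer`, the array shapes, and the use of absolute pixel coordinates for the warp are all assumptions made for the example. In the actual system the coarse image and the warping field would be predicted by the neural network; here they are plain input arrays.

```python
import numpy as np


def bilinear_sample(texture, coords):
    """Sample `texture` (H, W, C) at float pixel positions.

    `coords` has shape (H_out, W_out, 2) and stores (x, y) per output pixel.
    Positions outside the texture are clamped to its border.
    """
    h, w = texture.shape[:2]
    x = np.clip(coords[..., 0], 0, w - 1)
    y = np.clip(coords[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = texture[y0, x0] * (1 - wx) + texture[y0, x1] * wx
    bottom = texture[y1, x0] * (1 - wx) + texture[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def compose_bilayer(coarse, texture, coords):
    """Add the warped high-frequency texture to the coarse image."""
    return coarse + bilinear_sample(texture, coords)


# Toy example: with an identity warp the output is simply coarse + texture.
h, w = 4, 4
coarse = np.full((h, w, 3), 0.5)  # stand-in for the pose-dependent layer
texture = np.random.default_rng(0).uniform(-0.1, 0.1, size=(h, w, 3))
ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
identity = np.stack([xs, ys], axis=-1).astype(float)
out = compose_bilayer(coarse, texture, identity)
```

Because the texture is pose-independent, it only needs to be computed once per avatar; per frame, only the small coarse-image network and the cheap warp-and-add step have to run, which is consistent with the speedup the abstract reports.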

ECCV 2020

Datasets


Task: Talking Head Generation. Dataset: VoxCeleb2 - 1-shot learning. All models are medium size. Global rank is given in parentheses.

Model                      LPIPS        SSIM         CSIM         Normalized Pose Error   Inference time (ms)
Few-shot Vid-to-vid        0.368 (#3)   0.419 (#3)   0.604 (#3)   46.1 (#2)               22 (#3)
First Order Motion Model   0.311 (#1)   0.553 (#1)   0.638 (#2)   47.8 (#3)               13 (#2)
Fast Bi-layer Avatars      0.358 (#2)   0.508 (#2)   0.653 (#1)   43.3 (#1)               4 (#1)
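
To make the speed comparison concrete, the short sketch below computes how many times longer each model takes per frame than Fast Bi-layer Avatars, using only the inference times reported above. The dictionary and variable names are illustrative choices, not part of any published code.

```python
# Inference times in milliseconds, copied from the results above.
inference_ms = {
    "Few-shot Vid-to-vid": 22,
    "First Order Motion Model": 13,
    "Fast Bi-layer Avatars": 4,
}

reference = inference_ms["Fast Bi-layer Avatars"]

# Ratio of each model's inference time to the bi-layer model's time.
relative_cost = {name: ms / reference for name, ms in inference_ms.items()}

for name, ratio in relative_cost.items():
    print(f"{name}: {ratio:.2f}x the inference time of Fast Bi-layer Avatars")
```

On these numbers, Few-shot Vid-to-vid takes 5.5 times and First Order Motion Model 3.25 times as long per frame as Fast Bi-layer Avatars.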
