Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

We propose a neural rendering-based system that creates head avatars from a single photograph. Our approach models a person's appearance by decomposing it into two layers. The first layer is a pose-dependent coarse image that is synthesized by a small neural network. The second layer is defined by a pose-independent texture image that contains high-frequency details. The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views. We compare our system to analogous state-of-the-art systems in terms of visual quality and speed. The experiments show significant inference speedup over previous neural head avatar models for a given visual quality. We also report on a real-time smartphone-based implementation of our system.
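The two-layer composition described above (a warped texture added to a coarse image) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bilinear_sample` and `compose_bilayer`, the single-channel images stored as nested Python lists, and the warp format (one `(x, y)` texture coordinate per output pixel) are all assumptions made for illustration. In the actual system both layers come from neural networks, which are omitted here.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D list `image` at continuous pixel coordinates (x, y).

    Coordinates are clamped to the image border (an assumed convention).
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def compose_bilayer(coarse, texture, warp):
    """Add the warped high-frequency texture to the coarse image.

    coarse:  H x W list of floats (pose-dependent layer).
    texture: Ht x Wt list of floats (pose-independent layer).
    warp:    H x W list of (x, y) texture coordinates, one per output pixel
             (hypothetical format, chosen for this sketch).
    """
    height, width = len(coarse), len(coarse[0])
    return [
        [coarse[i][j] + bilinear_sample(texture, *warp[i][j]) for j in range(width)]
        for i in range(height)
    ]


# Toy example: a flat coarse image, a 2x2 texture, and an identity warp.
coarse = [[0.5, 0.5], [0.5, 0.5]]
texture = [[0.0, 0.2], [0.4, 0.6]]
identity_warp = [[(float(j), float(i)) for j in range(2)] for i in range(2)]
output = compose_bilayer(coarse, texture, identity_warp)
```

Because the texture does not depend on pose, it only needs to be computed once per person. In this sketch the per-frame work is limited to producing `coarse` and `warp` and doing the sampling and addition, which is consistent with the abstract's claim that a small network suffices at inference time.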

ECCV 2020


Task: Talking Head Generation
Dataset: VoxCeleb2 - 1-shot learning
Each cell gives the metric value, with the global rank in parentheses.

Model                                     LPIPS        SSIM         CSIM         Normalized Pose Error   Inference time (ms)
Fast Bi-layer Avatars (medium size)       0.358 (#2)   0.508 (#2)   0.653 (#1)   43.3 (#1)               4 (#1)
First Order Motion Model (medium size)    0.311 (#1)   0.553 (#1)   0.638 (#2)   47.8 (#3)               13 (#2)
Few-shot Vid-to-vid (medium size)         0.368 (#3)   0.419 (#3)   0.604 (#3)   46.1 (#2)               22 (#3)

