This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.
Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects.
In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing.
To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations.
In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input.
Ranked #1 on 3D-Aware Image Synthesis on CelebAMask-HQ
The pixel-level smoothness forces the model to generate invariant results under adversarial perturbations.
Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
However, due to the lack of Deepfakes datasets with large variance in appearance, which can be hardly produced by recent identity swapping methods, the detection algorithm may fail in this situation.