This work presents 3DPE, a practical tool that can efficiently edit a face image following given prompts, like reference images or text descriptions, in the 3D-aware manner.
We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts.
Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.
We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i. e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e. g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96. github. io/CoDeF/.
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image.
We propose to use pretraining to boost general image-to-image translation.
Ranked #1 on Sketch-to-Image Translation on COCO-Stuff
We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.
A progressive propagation strategy with pseudo labels is also proposed to enhance DVP's performance on video propagation.
Although recent inpainting approaches have demonstrated significant improvements with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions.
With this principle, we present two conceptually simple and yet computational efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit underlying contextual information.
Ranked #10 on Pose Estimation on MPII Human Pose