We test the effectiveness of our representation on the human image harmonization task by predicting shading that is coherent with a given background image.
The appearance of dressed humans undergoes a complex geometric transformation induced not only by the static pose but also by its dynamics, i.e., many cloth configurations can correspond to a given pose depending on how the body has moved.
Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing.
Self-contacts, such as when hands touch each other or the torso or the head, are important attributes of human body language and dynamics, yet existing methods do not model or preserve these contacts.
To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
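A minimal sketch of the kind of radiance-field query such a method builds on: a generic NeRF-style MLP mapping a 3D point and view direction to density and color (hypothetical layer sizes, not the Neural Human Performer architecture, which additionally conditions on a parametric body model).

```python
import torch
import torch.nn as nn

class RadianceMLP(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.density = nn.Linear(hidden, 1)
        self.color = nn.Sequential(nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
                                   nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, xyz, view_dir):
        # xyz, view_dir: (N, 3) sample positions and viewing directions
        h = self.trunk(xyz)
        sigma = torch.relu(self.density(h))                 # volume density
        rgb = self.color(torch.cat([h, view_dir], dim=-1))  # view-dependent color
        return sigma, rgb

sigma, rgb = RadianceMLP()(torch.randn(4096, 3), torch.randn(4096, 3))
```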
We present Cascaded Primitive Fitting Networks (CPFN), which rely on an adaptive patch sampling network to assemble the detection results of global and local primitive detection networks.
A long-standing goal in computer vision is to capture, model, and realistically synthesize human behavior.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation, motion completion from partial observations, and motion synthesis from sparse key-frames.
Ranked #4 on Motion Synthesis on LaFAN1
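A rough sketch of the motion-VAE interface this implies: encode a pose (or pose sequence) into a latent code and decode it back (hypothetical module and dimension names, not the paper's actual hierarchical architecture).

```python
import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    def __init__(self, pose_dim=63, latent_dim=32, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, pose_dim)
        )

    def forward(self, pose):
        h = self.encoder(pose)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

# Usage: reconstruct a batch of per-frame poses.
recon, mu, logvar = MotionVAE()(torch.randn(8, 63))
```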
A vital task of the wider digital human effort is the creation of realistic garments on digital avatars, both in the form of characteristic fold patterns and wrinkles in static frames and in the richness of garment dynamics under the avatars' motion.
We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects.
We present a generative model to synthesize 3D shapes as sets of handles -- lightweight proxies that approximate the original 3D shape -- for applications in interactive editing, shape parsing, and building compact 3D representations.
The 3D structure points produced by our method encode the shape structure intrinsically and exhibit semantic consistency across all the shape instances with similar structures.
We show that methods trained on our dataset consistently perform well when tested on other datasets.
Ranked #6 on 3D Hand Pose Estimation on FreiHAND
Reconstructing 3D shapes from single-view images has been a long-standing research problem.
Ranked #1 on Single-View 3D Reconstruction on ShapeNetCore
Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we introduce 3DN, an end-to-end network that deforms the source model to resemble the target.
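A minimal sketch of a deformation network in this spirit: per-vertex offsets are predicted for the source mesh conditioned on global feature codes of the source and target, so connectivity is preserved (hypothetical layer sizes, not the actual 3DN architecture).

```python
import torch
import torch.nn as nn

class OffsetDecoder(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, src_vertices, src_feat, tgt_feat):
        # src_vertices: (N, 3); src_feat / tgt_feat: (feat_dim,) global shape codes
        n = src_vertices.shape[0]
        cond = torch.cat([src_feat, tgt_feat]).expand(n, -1)
        offsets = self.mlp(torch.cat([src_vertices, cond], dim=1))
        return src_vertices + offsets  # deformed vertices keep the source connectivity

deformed = OffsetDecoder()(torch.randn(2048, 3), torch.randn(1024), torch.randn(1024))
```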
Garment transfer is a challenging task that requires (i) disentangling the features of the clothing from the body pose and shape and (ii) realistic synthesis of the garment texture on the new body.
Ranked #1 on Virtual Try-on on FashionIQ (using extra training data)
Designing real and virtual garments is becoming extremely demanding with rapidly changing fashion trends and the increasing need to synthesize realistically dressed digital humans for various applications.
A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video.
The proposed end-to-end DNN learns to directly infer a set of plane parameters and corresponding plane segmentation masks from a single RGB image.
Ranked #2 on Plane Instance Segmentation on NYU Depth v2
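A minimal sketch of a prediction head for this setup: from a shared image feature map, one branch regresses K plane parameters and another produces per-pixel plane-assignment logits (hypothetical channel counts and plane budget; the actual paper uses its own backbone and losses).

```python
import torch
import torch.nn as nn

class PlaneHead(nn.Module):
    def __init__(self, in_ch=256, num_planes=10):
        super().__init__()
        self.param_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_planes * 3)
        )
        # +1 channel for the non-planar background class
        self.mask_head = nn.Conv2d(in_ch, num_planes + 1, kernel_size=1)

    def forward(self, feat):
        # feat: (B, in_ch, H, W) backbone features of the single RGB image
        params = self.param_head(feat).view(feat.shape[0], -1, 3)  # per-plane parameters
        masks = self.mask_head(feat)                               # per-pixel segmentation logits
        return params, masks

params, masks = PlaneHead()(torch.randn(1, 256, 60, 80))
```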
We propose a recurrent neural network architecture with a Forward Kinematics layer and a cycle-consistency-based adversarial training objective for unsupervised motion retargeting.
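A minimal sketch of a differentiable forward-kinematics layer: local joint rotations and bone offsets are accumulated along the parent chain to yield global joint positions (a generic FK routine under assumed conventions, not the paper's exact layer).

```python
import torch

def forward_kinematics(rotations, offsets, parents):
    # rotations: (J, 3, 3) local joint rotation matrices
    # offsets:   (J, 3) bone offsets relative to the parent joint
    # parents:   list of parent indices, parents[0] == -1 for the root
    J = len(parents)
    global_rot = [None] * J
    global_pos = [None] * J
    for j in range(J):
        if parents[j] == -1:
            global_rot[j] = rotations[j]
            global_pos[j] = offsets[j]
        else:
            p = parents[j]
            global_rot[j] = global_rot[p] @ rotations[j]
            global_pos[j] = global_pos[p] + global_rot[p] @ offsets[j]
    return torch.stack(global_pos)

# Usage: a 3-joint chain with identity rotations.
pos = forward_kinematics(torch.eye(3).repeat(3, 1, 1),
                         torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 1., 0.]]),
                         [-1, 0, 1])
```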
Human shape estimation is an important task for video editing, animation, and the fashion industry.
Ranked #2 on 3D Human Pose Estimation on Surreal (using extra training data)
To train such a network, we generate a massive dataset of synthetic faces with dense labels using renderings of a morphable face model with variations in pose, expressions, lighting, and occlusions.
The success of various applications, including robotics, digital content creation, and visualization, demands a structured and abstract representation of the 3D world from limited sensor data.
We propose an end-to-end network architecture that replicates the forward image formation process to accomplish this task.
We present a new local descriptor for 3D shapes, directly applicable to a wide range of shape analysis problems such as point correspondences, semantic segmentation, affordance prediction, and shape-to-scan matching.
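A minimal sketch of how such per-point descriptors would be used for point correspondences: nearest-neighbor matching in descriptor space (generic matching with hypothetical descriptor tensors; the descriptor network itself is not shown).

```python
import torch

def match_points(desc_a, desc_b):
    # desc_a: (Na, D), desc_b: (Nb, D) per-point descriptors of two shapes
    dists = torch.cdist(desc_a, desc_b)  # pairwise descriptor distances
    return dists.argmin(dim=1)           # for each point in A, index of its best match in B

corr = match_points(torch.randn(1000, 128), torch.randn(1200, 128))
```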
Instead of taking a 'blank slate' approach, we first explicitly infer the parts of the geometry visible both in the input and novel views and then re-cast the remaining synthesis problem as image completion.
Due to the abundance of 2D product images from the Internet, developing efficient and scalable algorithms to recover the missing depth information is central to many applications.
We present an end-to-end system for reconstructing complete watertight and textured models of moving subjects such as clothed humans and animals, using only three or four handheld sensors.