We present a framework that uses GAN-generated images to augment specific, typically underrepresented, attributes in training data for machine learning models.
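As a hedged illustration of this idea (not the paper's actual pipeline), the sketch below balances a dataset by sampling a pretrained attribute-conditional generator for an underrepresented attribute; `ConditionalGenerator`, its architecture, and all dimensions are assumptions made for the example.

```python
# Minimal sketch: synthesizing extra samples of a rare attribute with a
# (hypothetical) pretrained attribute-conditional GAN generator.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Placeholder attribute-conditional generator (illustrative only)."""
    def __init__(self, z_dim=128, n_attrs=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_attrs, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64), nn.Tanh(),
        )

    def forward(self, z, attr_onehot):
        x = self.net(torch.cat([z, attr_onehot], dim=1))
        return x.view(-1, 3, 64, 64)

def synthesize_minority_batch(gen, attr_idx, n, z_dim=128, n_attrs=10):
    """Generate n images conditioned on an underrepresented attribute."""
    z = torch.randn(n, z_dim)
    attr = torch.zeros(n, n_attrs)
    attr[:, attr_idx] = 1.0  # one-hot code for the rare attribute
    with torch.no_grad():
        return gen(z, attr)

gen = ConditionalGenerator()
fake_images = synthesize_minority_batch(gen, attr_idx=3, n=16)
print(fake_images.shape)  # torch.Size([16, 3, 64, 64])
```

The synthesized batch would then be mixed into the training set so the rare attribute is seen more often during training.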
In this work, we present a novel framework for pluralistic image completion that achieves both high quality and diversity at a much faster inference speed than prior methods.
Recently, significant progress has been made in single-view depth estimation thanks to increasingly large and diverse depth datasets.
Existing video-based human pose estimation methods extensively apply large networks to every frame of a video to localize body joints, which incurs high computational cost and hardly meets the low-latency requirements of realistic applications.
Our key idea for improving domain adaptation is to introduce a separate anchor task (such as facial landmark detection) whose annotations can be obtained at no cost or are already available for both synthetic and real datasets.
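A minimal sketch of how such an anchor task might be trained jointly with the main task, assuming a shared backbone, a main head supervised only on synthetic data, and an anchor head supervised on both domains. All module names, shapes, and loss weights are illustrative, not the authors' code.

```python
# Hedged sketch: the main task sees labels only on synthetic data, while the
# anchor task (e.g. landmark regression) is supervised on both domains,
# tying synthetic and real features together through the shared backbone.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
main_head = nn.Linear(256, 10)       # main task: labels exist only for synthetic
anchor_head = nn.Linear(256, 2 * 5)  # anchor task: e.g. 5 (x, y) landmarks

def joint_loss(syn_x, syn_y, syn_lm, real_x, real_lm, w_anchor=1.0):
    f_syn, f_real = backbone(syn_x), backbone(real_x)
    loss_main = nn.functional.cross_entropy(main_head(f_syn), syn_y)
    loss_anchor = (
        nn.functional.mse_loss(anchor_head(f_syn), syn_lm)
        + nn.functional.mse_loss(anchor_head(f_real), real_lm)
    )
    return loss_main + w_anchor * loss_anchor

loss = joint_loss(
    torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,)), torch.randn(8, 10),
    torch.randn(4, 3, 64, 64), torch.randn(4, 10),
)
```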
We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN).
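One way to read the transformable-bottleneck idea is: encode an image into a volumetric feature grid, resample that grid under a rigid 3D transform, and decode the result. The toy sketch below shows only the resampling step, using PyTorch's `affine_grid`/`grid_sample`; the volume size and the transform are arbitrary assumptions, not the authors' architecture.

```python
# Minimal sketch: applying a rigid 3D transform to a volumetric bottleneck
# by resampling the feature grid.
import torch
import torch.nn.functional as F

def transform_bottleneck(volume, theta):
    """Resample a (N, C, D, H, W) feature volume under a 3x4 affine theta."""
    grid = F.affine_grid(theta, volume.shape, align_corners=False)
    return F.grid_sample(volume, grid, align_corners=False)

# Rotate a random bottleneck volume 90 degrees about the vertical axis.
vol = torch.randn(1, 32, 16, 16, 16)
theta = torch.tensor([[[0.0, 0.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [-1.0, 0.0, 0.0, 0.0]]])
rotated = transform_bottleneck(vol, theta)
print(rotated.shape)  # torch.Size([1, 32, 16, 16, 16])
```

In the full pipeline, a decoder would render the transformed volume back into an image, so manipulating the bottleneck manipulates the output view.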
In particular, instead of only performing scene completion from each individual scan, our approach alternates between relative pose estimation and scene completion.
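The alternation might be organized as in the sketch below, where `estimate_pose` and `complete_scene` are hypothetical stand-ins for the paper's registration and completion modules; the placeholder bodies only illustrate the control flow.

```python
# Hedged sketch of the alternating scheme: (1) estimate relative poses by
# registering each scan against the current completed scene, then (2)
# re-complete the scene from the newly aligned scans, and repeat.
import numpy as np

def identity_pose():
    return np.eye(4)

def estimate_pose(scan, scene):
    # Placeholder: a real system would register the scan against the
    # current completed scene (e.g. with ICP or a learned matcher).
    return np.eye(4)

def complete_scene(scans, poses):
    # Placeholder: a real system would fuse the posed scans and fill in
    # missing geometry with a learned completion network.
    return np.concatenate(
        [(pose[:3, :3] @ s.T).T + pose[:3, 3] for s, pose in zip(scans, poses)]
    )

def alternate(scans, n_iters=5):
    """Alternate relative pose estimation and scene completion."""
    poses = [identity_pose() for _ in scans]
    scene = complete_scene(scans, poses)                  # initial completion
    for _ in range(n_iters):
        poses = [estimate_pose(s, scene) for s in scans]  # step 1: re-align
        scene = complete_scene(scans, poses)              # step 2: re-complete
    return scene, poses

scene, poses = alternate([np.random.rand(100, 3) for _ in range(3)])
```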
We present a deep learning-based volumetric capture approach for performance capture using a passive and highly sparse multi-view capture system.
We show a principled way to train this model by combining discriminator losses for both a 3D object arrangement representation and a 2D image-based representation.
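A hedged sketch of how the two discriminator losses could be combined, assuming a 3D discriminator over a flattened arrangement vector and a 2D discriminator over a rendered image; both discriminators, the input shapes, and the equal weighting are placeholders rather than the paper's exact formulation.

```python
# Illustrative sketch: the generator loss sums an adversarial term from a
# discriminator on the 3D arrangement representation and one from a
# discriminator on the corresponding 2D rendering.
import torch
import torch.nn as nn

d3d = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
d2d = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                    nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def generator_loss(arrangement, rendered_image):
    """Adversarial loss combining 3D-arrangement and 2D-image critics."""
    logits_3d = d3d(arrangement)
    logits_2d = d2d(rendered_image)
    return (bce(logits_3d, torch.ones_like(logits_3d))
            + bce(logits_2d, torch.ones_like(logits_2d)))

loss = generator_loss(torch.randn(4, 64), torch.randn(4, 3, 32, 32))
```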
Existing methods define semantic keypoints separately for each category, with a fixed number of semantic labels at fixed indices.
In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image.
We propose AutoScaler, a scale-attention network that explicitly optimizes this trade-off in visual correspondence tasks.
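A generic reconstruction of scale attention (not AutoScaler's exact architecture): compute features at several scales, predict per-location attention weights over the scales, and fuse them by a weighted sum. Channel counts, the number of scales, and the fusion scheme below are assumptions made for the example.

```python
# Hedged sketch of scale attention: per-scale features are upsampled to a
# common resolution, a 1x1 conv predicts a softmax weight per scale at each
# location, and the fused feature is the attention-weighted sum over scales.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    def __init__(self, channels=32, n_scales=3):
        super().__init__()
        self.feat = nn.Conv2d(3, channels, 3, padding=1)   # shared feature conv
        self.attn = nn.Conv2d(channels * n_scales, n_scales, 1)
        self.n_scales = n_scales

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for s in range(self.n_scales):
            xs = F.interpolate(x, scale_factor=1 / 2 ** s) if s else x
            f = self.feat(xs)                       # features at this scale
            feats.append(F.interpolate(f, size=(h, w)))
        stacked = torch.stack(feats, dim=1)         # (N, S, C, H, W)
        weights = self.attn(torch.cat(feats, dim=1))  # (N, S, H, W)
        weights = F.softmax(weights, dim=1).unsqueeze(2)
        return (stacked * weights).sum(dim=1)       # attention-fused features

out = ScaleAttention()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```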