We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner.
This alleviates the data bottleneck, which is one of the major concerns for supervised methods.
Ranked #19 on Weakly-supervised 3D Human Pose Estimation on Human3.6M
Non-Rigid Structure from Motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from an ensemble of images with 2D correspondences.
All current non-rigid structure from motion (NRSfM) algorithms are limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle.
One challenge that remains open in 3D deep learning is how to efficiently represent 3D data to feed deep networks.
A common strategy in dictionary learning to encourage generalization is to allow for linear combinations of dictionary elements.
Conventional methods of 3D object generative modeling learn volumetric predictions using deep networks with 3D convolutional operations, which are direct analogies to classical 2D ones.
In this paper, we exploit natural sentential descriptions of RGB-D scenes to improve 3D semantic parsing.
In this paper, we tackle the problem of retrieving videos using complex natural language queries.