We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner.
This alleviates the data bottleneck, which is one of the major concerns for supervised methods.
Non-Rigid Structure from Motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from an ensemble of images with 2D correspondences.
Current non-rigid structure from motion (NRSfM) algorithms are mainly limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle.
One challenge that remains open in 3D deep learning is how to efficiently represent 3D data to feed deep networks.
A common strategy in dictionary learning to encourage generalization is to allow for linear combinations of dictionary elements.
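The idea above can be sketched concretely: a new sample is expressed as a weighted sum of learned dictionary atoms, so the model generalizes beyond the atoms themselves. The atoms and weights below are hypothetical illustration data, not taken from any particular method.

```python
def linear_combination(atoms, weights):
    """Combine dictionary atoms (equal-length lists of floats) with scalar weights."""
    return [sum(w * a[i] for w, a in zip(weights, atoms))
            for i in range(len(atoms[0]))]

# Two hypothetical dictionary atoms (e.g., basis shapes).
d1 = [1.0, 0.0, 2.0]
d2 = [0.0, 1.0, 1.0]

# A new sample represented as 0.5*d1 + 2.0*d2.
combined = linear_combination([d1, d2], [0.5, 2.0])
# combined == [0.5, 2.0, 3.0]
```

In practice the weights are fit to data (often with a sparsity penalty), but the representational step is exactly this weighted sum.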
Conventional methods of 3D object generative modeling learn volumetric predictions using deep networks with 3D convolutional operations, which are direct analogies to classical 2D ones.
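To make the analogy concrete, a 3D convolution slides a small volumetric kernel over a voxel grid, just as a 2D convolution slides a filter over an image. The following is a minimal pure-Python sketch (no padding, stride 1) on nested lists, for illustration only; real systems use optimized library operators.

```python
def conv3d(vol, kernel):
    """Valid 3D convolution (stride 1, no padding) over nested lists.

    vol:    D x H x W volume of floats
    kernel: kd x kh x kw filter of floats
    """
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Inner product of the kernel with the local voxel patch.
                s = sum(vol[z + i][y + j][x + k] * kernel[i][j][k]
                        for i in range(kd)
                        for j in range(kh)
                        for k in range(kw))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# A 2x2x2 volume of ones convolved with a 2x2x2 kernel of ones
# sums all 8 voxels into a single output value.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
result = conv3d(ones, ones)
# result == [[[8.0]]]
```

The cubic growth of this inner loop with resolution is precisely why volumetric 3D convolutions are expensive relative to their 2D counterparts.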
In this paper we exploit natural sentential descriptions of RGB-D scenes in order to improve 3D semantic parsing.