Semi-Supervised Learning of Multi-Object 3D Scene Representations

28 Sep 2020  ·  Cathrin Elich, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler ·

Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of the 3D shape, pose, and texture of each object from an input RGB image. The 3D shapes are represented continuously in function space as signed distance functions (SDFs), which we efficiently pre-train from example shapes. Using differentiable rendering, we train our model in a self-supervised way to decompose scenes from RGB-D images. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose, and texture properties from a single view. In experiments, we evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate the capabilities of the generative 3D scene model.
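To make the SDF idea concrete, here is a minimal sketch in plain Python. It is not the paper's learned model: the paper trains a network that maps a latent shape code and a 3D query point to a signed distance, whereas this toy "decoder" simply interprets a one-element latent code as the radius of an analytic sphere. The function names (`sdf_sphere`, `decode_sdf`) are illustrative and do not come from the paper.

```python
import math

def sdf_sphere(point, radius):
    """Signed distance from a 3D point to an origin-centered sphere:
    negative inside, zero on the surface, positive outside."""
    x, y, z = point
    return math.sqrt(x * x + y * y + z * z) - radius

def decode_sdf(latent, points):
    """Toy 'decoder' (hypothetical, not the paper's network): interpret a
    one-element latent code as a sphere radius. The paper instead learns a
    network conditioned on a shape latent to output signed distances."""
    radius = latent[0]
    return [sdf_sphere(p, radius) for p in points]

latent = [1.0]                    # latent code encoding a unit sphere
queries = [(0.0, 0.0, 0.0),       # center: inside, distance -1.0
           (1.0, 0.0, 0.0),       # exactly on the surface: 0.0
           (2.0, 0.0, 0.0)]       # outside: +1.0
print(decode_sdf(latent, queries))  # → [-1.0, 0.0, 1.0]
```

The continuous function-space representation is what makes this shape encoding resolution-independent: the surface is the zero level set, and the SDF can be queried at arbitrary 3D points, which is what differentiable rendering exploits during training.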
