Learning Object Placement by Inpainting for Compositional Data Augmentation

We study the problem of common sense placement of visual objects in an image. This task involves multiple facets of visual recognition: instance segmentation of the scene, 3D layout estimation, and common knowledge of how objects are placed and how they move in the 3D scene. This seemingly simple task is difficult for current learning-based approaches because of the lack of labeled training data pairing foreground objects with clean background scenes. We propose a self-learning framework that automatically generates the necessary training data, without any manual labeling, by detecting, cutting, and inpainting objects from an image. We propose PlaceNet, a network that predicts a diverse distribution of common sense locations given a foreground object and a background scene. We show one practical use of our object placement network: augmenting training datasets by recomposing object-scene pairs while preserving their contextual relationships. We demonstrate improved object detection and instance segmentation performance on both the Cityscapes and KITTI datasets. We also show that the learned representation of PlaceNet has strong discriminative power in image retrieval and classification.
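
The detect-cut-inpaint data-generation step can be summarized in a few lines. The sketch below is illustrative only: `detect_instances` and `inpaint` are hypothetical placeholder functions (the paper does not prescribe specific detector or inpainting models here), and mask selection is simplified to picking a single instance.

```python
# Minimal sketch of self-supervised training-pair generation:
# cut a detected object out of the image and inpaint the hole,
# yielding a (foreground, clean background, location) triple
# without any manual labels.
import numpy as np

def make_training_pair(image, detect_instances, inpaint):
    masks = detect_instances(image)          # hypothetical: list of boolean HxW instance masks
    if not masks:
        return None
    mask = masks[0]                          # pick one object (e.g. the largest)
    ys, xs = np.where(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    foreground = image[y0:y1, x0:x1].copy()  # cut-out object patch
    fg_mask = mask[y0:y1, x0:x1]             # its alpha/instance mask
    background = inpaint(image, mask)        # hypothetical: fill the hole to get a clean scene
    location = ((y0 + y1) // 2, (x0 + x1) // 2)  # ground-truth placement for supervision
    return foreground, fg_mask, background, location
```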
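
For intuition about what PlaceNet consumes and produces, here is a hedged interface sketch, assuming PyTorch. The encoder sizes, the latent code `z` as the source of diversity, and the (x, y, scale) output parameterization are assumptions for illustration; consult the paper for the actual architecture and losses.

```python
# Interface sketch (not the paper's architecture): given a foreground
# object crop and a background scene, predict a normalized placement.
import torch
import torch.nn as nn

class PlaceNet(nn.Module):
    def __init__(self, feat_dim=256, z_dim=32):
        super().__init__()
        # Tiny stand-in encoders for the object and the scene.
        self.fg_enc = nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.bg_enc = nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Head maps joint features plus a noise code to (x, y, scale) in [0, 1].
        self.head = nn.Sequential(nn.Linear(2 * feat_dim + z_dim, 128),
                                  nn.ReLU(), nn.Linear(128, 3), nn.Sigmoid())

    def forward(self, fg, bg, z):
        h = torch.cat([self.fg_enc(fg), self.bg_enc(bg), z], dim=1)
        return self.head(h)  # normalized placement for this (object, scene, z)
```

Sampling several codes `z` for the same object-scene pair is what would yield the diverse distribution of placements the abstract describes; the predicted placement can then be used to paste the object back into the inpainted scene for recomposition-based augmentation.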
