Conventional 3D generative adversarial models are not efficient at generating multi-object scenes; they tend to generate either a single object or fuzzy results for multiple objects.
Frame semantic representations have been useful in several applications, ranging from text-to-scene generation to question answering and social network analysis.
In this paper, we propose a neural message-passing approach to augment an input 3D indoor scene with new objects that match their surroundings.
In this paper, we address the text-to-scene image generation problem.
To tackle this issue, we consider learning scene generation in a local context and correspondingly design a local class-specific generative network guided by semantic maps. The network separately constructs and learns sub-generators, each concentrating on the generation of a different class, and is thereby able to provide more scene details.
The X-Fork architecture has a single discriminator and a single generator.
For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.
The visual world we sense, interpret, and interact with every day is a complex composition of interleaved physical entities.
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning.