Object discovery is the task of identifying previously unseen objects.
Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms.
The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence.
Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.
This paper tests the hypothesis that modeling a scene in terms of entities and their local interactions, as opposed to modeling the scene globally, provides a significant benefit in generalizing to physical tasks in a combinatorial space the learner has not encountered before.
A range of methods with suitable inductive biases exist to learn interpretable object-centric representations of images without supervision.
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning.
Patch-level image representation is important for object classification and detection, since it is robust to spatial transformations, scale variations, and cluttered backgrounds.
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.