GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement

20 Apr 2021 · Martin Engelcke, Oiwi Parker Jones, Ingmar Posner

Advances in object-centric generative models (OCGMs) have culminated in the development of a broad range of methods for unsupervised object segmentation and interpretable object-centric scene generation. These methods, however, are limited to simulated and real-world datasets with limited visual complexity... Moreover, object representations are often inferred using RNNs, which do not scale well to large images, or iterative refinement, which avoids imposing an unnatural ordering on objects in an image but requires the a priori initialisation of a fixed number of object representations. In contrast to established paradigms, this work proposes an embedding-based approach in which embeddings of pixels are clustered in a differentiable fashion using a stochastic, non-parametric stick-breaking process. Similar to iterative refinement, this clustering procedure also leads to randomly ordered object representations, but without the need to initialise a fixed number of clusters a priori. This is used to develop a new model, GENESIS-V2, which can infer a variable number of object representations without using RNNs or iterative refinement. We show that GENESIS-V2 outperforms previous methods for unsupervised image segmentation and object-centric scene generation on established synthetic datasets as well as more complex real-world datasets.
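
The clustering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal, non-differentiable illustration in plain Python of a stochastic stick-breaking procedure over pixel embeddings. The function name `stick_breaking_masks`, the Gaussian kernel, the `sigma` bandwidth, the seed-sampling rule, and the early-stopping threshold are all assumptions made for this example.

```python
import math
import random


def stick_breaking_masks(embeddings, max_clusters, sigma=1.0, rng=None):
    """Softly cluster pixel embeddings with a stick-breaking procedure.

    embeddings: list of per-pixel feature vectors (lists of floats).
    Returns a list of masks; each mask holds one weight per pixel, and
    for every pixel the weights across all masks sum to 1.
    """
    rng = rng or random.Random()
    n = len(embeddings)
    scope = [1.0] * n  # share of each pixel that is still unexplained
    masks = []
    for _ in range(max_clusters - 1):
        # Assumption: stop early once almost nothing is left to explain,
        # so the number of clusters can vary from image to image.
        if sum(scope) < 1e-6:
            break
        # Assumption: pick a random seed pixel, favouring unexplained ones.
        seed = rng.choices(range(n), weights=scope, k=1)[0]
        centre = embeddings[seed]
        # Assumption: Gaussian kernel similarity to the seed embedding.
        alpha = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(e, centre)) / sigma)
            for e in embeddings
        ]
        # Break off a piece of the remaining "stick" for this cluster.
        masks.append([s * a for s, a in zip(scope, alpha)])
        scope = [s * (1.0 - a) for s, a in zip(scope, alpha)]
    # Whatever remains unexplained forms the final cluster.
    masks.append(scope)
    return masks


if __name__ == "__main__":
    pixels = [[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]]
    for mask in stick_breaking_masks(pixels, max_clusters=4, rng=random.Random(0)):
        print([round(w, 3) for w in mask])
```

Because each seed is drawn at random, the order of the resulting masks is random as well, and no fixed set of cluster representations has to be initialised up front. A trainable version would express the same steps with differentiable tensor operations.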

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Unsupervised Object Segmentation | ObjectsRoom | GENESIS-V2 | ARI-FG | 0.84 | #1 |
| Unsupervised Object Segmentation | ObjectsRoom | GENESIS | ARI-FG | 0.63 | #3 |
| Unsupervised Object Segmentation | ObjectsRoom | SlotAttention | ARI-FG | 0.79 | #2 |
| Unsupervised Object Segmentation | ObjectsRoom | MONET-G | ARI-FG | 0.54 | #4 |
| Image Generation | ObjectsRoom | MONET-G | FID | 205.7 | #3 |
| Image Generation | ObjectsRoom | GENESIS | FID | 62.8 | #2 |
| Image Generation | ObjectsRoom | GENESIS-V2 | FID | 52.6 | #1 |
| Unsupervised Object Segmentation | ShapeStacks | GENESIS-V2 | ARI-FG | 0.81 | #1 |
| Image Generation | ShapeStacks | MONET-G | FID | 197.8 | #3 |
| Unsupervised Object Segmentation | ShapeStacks | GENESIS | ARI-FG | 0.70 | #3 |
| Unsupervised Object Segmentation | ShapeStacks | SlotAttention | ARI-FG | 0.76 | #2 |
| Unsupervised Object Segmentation | ShapeStacks | MONET-G | ARI-FG | 0.70 | #3 |
| Image Generation | ShapeStacks | GENESIS-V2 | FID | 112.7 | #1 |
| Image Generation | ShapeStacks | GENESIS | FID | 186.8 | #2 |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | SlotAttention | ARI | 0.03 | #4 |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | GENESIS-V2 | ARI | 0.55 | #1 |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | GENESIS | ARI | 0.04 | #3 |
| Unsupervised Object Segmentation | Shelf&Tote Training Dataset | MONET-G | ARI | 0.11 | #2 |
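
The segmentation rows report the Adjusted Rand Index (ARI), with ARI-FG denoting the variant that is commonly computed over foreground pixels only. As a rough illustration of how such a score can be obtained from flattened per-pixel label lists, the sketch below implements the standard ARI formula in plain Python. The helper names `adjusted_rand_index` and `ari_fg`, and the convention that label `0` marks background, are assumptions for this example and are not taken from the listing or from the evaluation code behind it.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two flat label assignments."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # Degenerate case: no pixel pairs to compare.
    # Contingency counts and the marginal cluster sizes.
    pair_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # Degenerate case, e.g. both sides form a single cluster.
    return (sum_pairs - expected) / (max_index - expected)


def ari_fg(true_labels, pred_labels, background=0):
    """ARI restricted to pixels whose ground-truth label is not background.

    Assumption: `background` is the ground-truth id of the background.
    """
    keep = [i for i, t in enumerate(true_labels) if t != background]
    return adjusted_rand_index(
        [true_labels[i] for i in keep],
        [pred_labels[i] for i in keep],
    )


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    guess = [7, 9, 7, 7, 9, 9]  # Background is split, objects are correct.
    print(adjusted_rand_index(truth, guess))
    print(ari_fg(truth, guess))
```

The score is invariant to how cluster ids are named or ordered, which is why it suits models whose object representations are unordered. Restricting the computation to foreground pixels means that mistakes made only on the background do not lower the score.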