Unsupervised Object Segmentation
19 papers with code • 9 benchmarks • 11 datasets
Human perception is structured around objects, which form the basis for our higher-level cognition and our impressive capacity for systematic generalization.
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning.
Moreover, object representations are often inferred using RNNs, which do not scale well to large images, or via iterative refinement, which avoids imposing an unnatural ordering on the objects in an image but requires initialising a fixed number of object representations a priori.
The ability to decompose scenes into their object components is a desirable property for autonomous agents, enabling them to reason about and act within their surroundings.
To force the generator to learn a representation in which the foreground layer corresponds to an object, we perturb the output of the generative model by randomly shifting both the foreground image and its mask relative to the background.
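The random-shift perturbation described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's exact formulation: the grayscale 2D arrays, the `max_shift` range, zero-filling of shifted-out pixels, and the alpha-compositing rule are all assumptions.

```python
import random

def composite_with_shift(fg, mask, bg, max_shift=2, rng=None):
    """Alpha-composite a foreground layer onto a background after applying
    the SAME random spatial shift to both the foreground image and its
    mask; the background stays fixed, so a mis-segmented foreground would
    visibly tear away from the scene under the shift."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    h, w = len(bg), len(bg[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel in the unshifted layer
            if 0 <= sy < h and 0 <= sx < w:
                m, f = mask[sy][sx], fg[sy][sx]
            else:
                m, f = 0.0, 0.0  # shifted-out regions reveal the background
            out[y][x] = m * f + (1.0 - m) * bg[y][x]
    return out
```

Because the mask moves together with the foreground, an all-zero mask always reproduces the background exactly, while an all-one mask with zero shift reproduces the foreground, which is the compositing behaviour the perturbation relies on.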
The recent rise of unsupervised and self-supervised learning has dramatically reduced the dependency on labeled data, providing effective image representations for transfer to downstream vision tasks.
Our model starts with two separate pathways: an appearance pathway that outputs a feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
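The two-pathway structure described above can be illustrated with a toy sketch. Here intensity quantization stands in for the learned appearance pathway and per-pixel frame differencing stands in for the motion pathway; the function names, the feature choices, and the region-mean-motion combination rule are illustrative assumptions, not the model's actual components.

```python
def appearance_pathway(image, n_bins=4):
    """Stand-in for feature-based region segmentation of a single image:
    quantize each pixel's intensity (in [0, 1]) into n_bins region labels."""
    return [[min(int(v * n_bins), n_bins - 1) for v in row] for row in image]

def motion_pathway(frame_a, frame_b):
    """Stand-in for motion features over a pair of images: per-pixel
    absolute temporal difference (a crude proxy for optical flow)."""
    return [[abs(b - a) for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]

def moving_object_mask(frame_a, frame_b, thresh=0.2):
    """Combine the pathways: mark appearance regions whose mean motion
    exceeds a threshold as moving objects."""
    seg = appearance_pathway(frame_a)
    mot = motion_pathway(frame_a, frame_b)
    sums, counts = {}, {}
    for y in range(len(seg)):
        for x in range(len(seg[0])):
            r = seg[y][x]
            sums[r] = sums.get(r, 0.0) + mot[y][x]
            counts[r] = counts.get(r, 0) + 1
    moving = {r for r in sums if sums[r] / counts[r] > thresh}
    return [[1 if seg[y][x] in moving else 0 for x in range(len(seg[0]))]
            for y in range(len(seg))]
```

Pooling motion per appearance region (rather than thresholding pixels directly) is what lets the appearance pathway's grouping sharpen the noisy motion signal into an object-level mask.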