SPECTRA: Sparse Entity-centric Transitions
Learning an agent that interacts with objects is ubiquituous in many RL tasks. In most of them the agent's actions have sparse effects : only a small subset of objects in the visual scene will be affected by the action taken. We introduce SPECTRA, a model for learning slot-structured transitions from raw visual observations that embodies this sparsity assumption. Our model is composed of a perception module that decomposes the visual scene into a set of latent objects representations (i.e. slot-structured) and a transition module that predicts the next latent set slot-wise and in a sparse way. We show that learning a perception module jointly with a sparse slot-structured transition model not only biases the model towards more entity-centric perceptual groupings but also enables intrinsic exploration strategy that aims at maximizing the number of objects changed in the agent’s trajectory.
PDF Abstract