Enquiries concerning the underlying mechanisms and the emergent properties of a biological brain have a long history of theoretical postulates and experimental findings.
Discovery and learning of an underlying spatiotemporal hierarchy in sequential data is an important topic for machine learning.
In this paper, we pursue the notion that this learnt behaviour can be a consequence of reward-free preference learning that ensures an appropriate trade-off between exploration and preference satisfaction.
In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment, even though planning often requires reasoning over multi-step timescales, where intermediate states are either unnecessary or, worse, accumulate prediction error.
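The accumulation of prediction error under recursive one-step prediction can be illustrated with a minimal sketch. This is a toy example and not the architecture considered here: it assumes scalar linear dynamics, and the names `rollout_error`, `a_true`, and `a_model` are hypothetical, chosen only for illustration.

```python
def rollout_error(a_true, a_model, x0, horizon):
    """Per-step absolute error when a one-step model is applied recursively.

    Toy assumption: the environment follows x[t+1] = a_true * x[t] and the
    agent's learnt model is x[t+1] = a_model * x[t].
    """
    x_true, x_pred = x0, x0
    errors = []
    for _ in range(horizon):
        x_true = a_true * x_true    # actual next state
        x_pred = a_model * x_pred   # prediction fed back into the model
        errors.append(abs(x_pred - x_true))
    return errors


# A model whose transition coefficient is off by 1%, rolled out for 10 steps.
errors = rollout_error(a_true=1.0, a_model=1.01, x0=1.0, horizon=10)
for step, err in enumerate(errors, start=1):
    print(f"step {step:2d}: error = {err:.4f}")
```

In this toy setting the error grows at every step because each prediction is built on the previous, already imperfect one. A model defined directly over multi-step transitions would avoid passing through those intermediate predictions.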
In a more complex Animal-AI environment, our agents (using the same neural architecture) are able to simulate future state transitions and actions (i.e., plan) to evince reward-directed navigation despite temporary suspension of visual input.
We propose a novel neural network architecture, named the Global Workspace Network (GWN), which addresses the challenge of dynamic and unspecified uncertainties in multimodal data fusion.