We approach this problem using hierarchical generative modelling, equipped with multi-level planning for autonomous task completion, that mimics the deep temporal architecture of human motor control.
Despite being recognized as neurobiologically plausible, active inference faces difficulties when used to simulate intelligent behaviour in complex environments, owing to its computational cost and the challenge of specifying an appropriate target distribution for the agent.
Bistable perception follows from observing a static, ambiguous (visual) stimulus with two possible interpretations.
These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences.
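As a purely illustrative sketch of this mechanism (the block below is hypothetical: the memory format, gating function, and dimensions are assumptions, not the architecture used in the source), selective attention over stored memories followed by a gated update of a preference vector might look like:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def update_preferences(memories, query, preferences, gate_bias=0.0):
    """Hypothetical attention-and-gating step (illustrative only).

    memories:    (n_mem, d) array of stored memory embeddings (assumed format)
    query:       (d,) embedding of the current context (assumed)
    preferences: (d,) current preference vector to be updated (assumed)
    """
    # Scaled dot-product attention weights over the memory slots
    attn = softmax(memories @ query / np.sqrt(query.size))
    # Attended read-out: a convex combination of the selected memories
    readout = attn @ memories
    # Scalar gate in (0, 1) deciding how strongly the read-out overwrites the preferences
    gate = 1.0 / (1.0 + np.exp(-(readout @ query + gate_bias)))
    return (1.0 - gate) * preferences + gate * readout

# Toy usage with random vectors
rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 8))
query = rng.normal(size=8)
preferences = np.zeros(8)
print(update_preferences(memories, query, preferences))
```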
Conversely, in the absence of ambiguity and relative risk, active inference reduces to Bayesian decision theory, i.e., expected utility maximization.
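For context, one common way to see this reduction is via the standard decomposition of expected free energy into risk and ambiguity; the notation below is generic rather than specific to the source:

$$
G(\pi,\tau) \;=\; \underbrace{D_{\mathrm{KL}}\big[\,Q(o_\tau\mid\pi)\;\|\;P(o_\tau)\,\big]}_{\text{risk}} \;+\; \underbrace{\mathbb{E}_{Q(s_\tau\mid\pi)}\big[\,\mathrm{H}\!\left[P(o_\tau\mid s_\tau)\right]\,\big]}_{\text{ambiguity}}.
$$

When the likelihood mapping is unambiguous the ambiguity term vanishes, and in the regime described above (no relative risk), minimizing $G$ amounts to maximizing the expected utility $\mathbb{E}_{Q(o_\tau\mid\pi)}[\ln P(o_\tau)]$, which is the Bayesian decision-theoretic objective.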
Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters.
In this paper, we pursue the notion that this learnt behaviour can be a consequence of reward-free preference learning that ensures an appropriate trade-off between exploration and preference satisfaction.
More precisely, we show the conditions under which active inference produces the optimal solution to the Bellman equation, a formulation that underlies several approaches to model-based reinforcement learning and control.
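For reference, the Bellman optimality equation in its standard discounted form (textbook notation, which may differ from the paper's) is

$$
V^{*}(s) \;=\; \max_{a}\Big[\, r(s,a) \;+\; \gamma \sum_{s'} P(s'\mid s,a)\, V^{*}(s') \,\Big], \qquad 0 \le \gamma < 1 .
$$

The claim, then, is that under the stated conditions the policy selected by active inference coincides with the policy solving this fixed-point equation.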
In a more complex Animal-AI environment, our agents (using the same neural architecture) are able to simulate future state transitions and actions (i.e., plan), evincing reward-directed navigation despite temporary suspension of visual input.
In this paper, we provide: 1) an accessible overview of the discrete-state formulation of active inference, highlighting behaviors that arise naturally in active inference but are typically hand-engineered in RL; 2) an explicit discrete-state comparison between active inference and RL on an OpenAI Gym baseline.
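To make the discrete-state formulation concrete, here is a minimal one-step sketch; the toy likelihood (A), transition (B), and preference (C) arrays below are made up for illustration and are not the benchmark model used in the comparison:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy discrete generative model (all sizes and values are illustrative assumptions)
n_states, n_obs, n_actions = 4, 4, 2
A = np.eye(n_obs, n_states)                              # likelihood P(o|s); identity = unambiguous
B = np.stack([np.roll(np.eye(n_states), shift, axis=0)   # transitions P(s'|s,a) as permutation matrices
              for shift in (1, -1)])
C = softmax(np.array([0.0, 0.0, 0.0, 3.0]))              # prior preferences over observations

def expected_free_energy(qs, action):
    """One-step expected free energy (risk + ambiguity) for a candidate action."""
    qs_next = B[action] @ qs                             # predicted state beliefs
    qo_next = A @ qs_next                                 # predicted observation distribution
    risk = np.sum(qo_next * (np.log(qo_next + 1e-16) - np.log(C + 1e-16)))
    ambiguity = (-np.sum(A * np.log(A + 1e-16), axis=0)) @ qs_next
    return risk + ambiguity

qs = np.ones(n_states) / n_states                        # uniform beliefs about the current state
G = np.array([expected_free_energy(qs, a) for a in range(n_actions)])
action_posterior = softmax(-G)                           # lower expected free energy -> more probable action
print(G, action_posterior)
```

In a comparison on a Gym task, a step like this would simply be wrapped in the usual observe-update-act loop, with the chosen action sampled from (or taken as the argmax of) the action posterior.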