Less Suboptimal Learning and Control in Variational POMDPs

ICLR Workshop SSL-RL 2021 · Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer

A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, calls common practices in model-based reinforcement learning into question. Withholding from the inference network some of the quantities on which the true posterior depends leads to a biased generative model and an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study it in three settings with known dynamics, which lets us compare against a near-optimal policy. We find that the impact of the conditioning gap becomes severe in systems where the state is hard to estimate.
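The "underestimates uncertainty" effect can be made concrete with a small worked example. The sketch below assumes a hypothetical scalar linear-Gaussian model, which is not one of the paper's three settings. It uses a latent z with two noisy observations x1 and x2, and the inference network is allowed to see only x1. For a fixed generative model, maximising the ELBO with q(z | x1) amounts to minimising the KL from q to the full posterior p(z | x1, x2), averaged over the withheld x2. The names r, x1 and expected_kl are illustrative choices, and the optimum is located with a plain grid search.

```python
import numpy as np

# Hypothetical scalar linear-Gaussian toy model (an assumption, not the paper's setup):
#   z ~ N(0, 1),  x1 = z + e1,  x2 = z + e2,  e1, e2 ~ N(0, r) independent.
# The inference network may condition on x1 only; x2 is the withheld quantity.
r = 1.0   # observation noise variance (illustrative value)
x1 = 0.8  # an arbitrary observed value

# Exact posteriors: prior precision 1, each observation adds precision 1/r.
var_full = 1.0 / (1.0 + 2.0 / r)      # Var[z | x1, x2], identical for every x2
var_partial = 1.0 / (1.0 + 1.0 / r)   # Var[z | x1], the correct uncertainty given x1 alone
mean_partial = var_partial * x1 / r   # E[z | x1]

# E[z | x1, x2] = var_full * (x1 + x2) / r is linear in x2, and
# x2 | x1 ~ N(mean_partial, var_partial + r), so its moments over x2 are:
mean_of_full_mean = var_full * (x1 + mean_partial) / r
var_of_full_mean = (var_full / r) ** 2 * (var_partial + r)


def expected_kl(mq, vq):
    """Closed-form E_{x2 | x1}[ KL( N(mq, vq) || p(z | x1, x2) ) ]."""
    sq_err = (mq - mean_of_full_mean) ** 2 + var_of_full_mean
    return 0.5 * (np.log(var_full / vq) + (vq + sq_err) / var_full - 1.0)


# Grid search over Gaussian candidates q(z | x1) = N(mq, vq).
means = np.linspace(-1.0, 1.0, 2001)
variances = np.linspace(0.001, 1.0, 1000)
M, V = np.meshgrid(means, variances, indexing="ij")
i, j = np.unravel_index(np.argmin(expected_kl(M, V)), M.shape)
best_mean, best_var = means[i], variances[j]

print(f"ELBO-optimal q(z|x1): mean={best_mean:.3f}, var={best_var:.3f}")
print(f"exact p(z|x1):        mean={mean_partial:.3f}, var={var_partial:.3f}")
```

With r = 1 the search returns the correct mean E[z | x1] = 0.4. However, it returns a variance of about 1/3, which is the full-posterior variance, rather than the correct 1/2. This follows from the law of total variance, Var[z | x1] = Var[z | x1, x2] + Var(E[z | x1, x2] | x1). The optimal q keeps only the first term, so it is overconfident about z given x1 alone. This sketch holds the generative model fixed, so it does not show the generative-model bias that the abstract also describes.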
