1 code implementation • 30 Jan 2023 • Alexandra Cimpean, Timothy Verstraeten, Lander Willem, Niel Hens, Ann Nowé, Pieter Libin
$m$-top exploration allows the algorithm to learn the $m$ policies with the highest expected utility, so that experts can inspect this small set of alternative strategies together with their quantified uncertainty.
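
To make the idea of an $m$-top set concrete, the sketch below ranks policies by their mean utility over a set of posterior draws and reports, for each selected policy, the fraction of draws in which it falls in the top $m$. It is an illustration only, not the authors' algorithm: the function name `top_m_summary`, the assumption that posterior utility draws are already available as an array, and the use of top-$m$ membership frequency as the uncertainty measure are all assumptions made here.

```python
import numpy as np


def top_m_summary(utility_draws, m):
    """Summarise the m policies with the highest mean sampled utility.

    utility_draws: array-like of shape (n_draws, n_policies); each row is one
    hypothetical posterior draw of the utility of every policy.
    Returns (indices, mean utilities, top-m membership frequencies).
    """
    draws = np.asarray(utility_draws, dtype=float)
    n_draws, n_policies = draws.shape
    if not 1 <= m <= n_policies:
        raise ValueError("m must be between 1 and the number of policies")

    # Rank policies by their mean utility across all draws (highest first).
    means = draws.mean(axis=0)
    top_m = np.argsort(-means, kind="stable")[:m]

    # For each draw, find which policies are in that draw's own top m,
    # then count how often each policy appears.
    per_draw_top = np.argsort(-draws, axis=1, kind="stable")[:, :m]
    counts = np.zeros(n_policies)
    for row in per_draw_top:
        counts[row] += 1  # indices within a row are unique
    membership = counts / n_draws

    return top_m.tolist(), means[top_m].tolist(), membership[top_m].tolist()


if __name__ == "__main__":
    # Three toy draws over three policies (made-up numbers).
    toy = [[1.0, 5.0, 3.0], [3.0, 7.0, 4.0], [4.0, 6.0, 2.0]]
    print(top_m_summary(toy, m=2))
```

A membership frequency close to 1 suggests the policy is reliably among the best $m$, while a lower value flags a policy whose place in the set is less certain and may merit closer inspection.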