1 code implementation • 18 Dec 2023 • Merlijn Krale, Thiago D. Simão, Jana Tumova, Nils Jansen
Partial observability and uncertainty are common problems in sequential decision-making that particularly impede the use of formal models such as Markov decision processes (MDPs).
no code implementations • 18 Dec 2023 • Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen
However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to settings with many agents.
no code implementations • 26 Jul 2023 • Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses.
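The guide-to-student regularization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the KL-divergence penalty, the linear decay schedule, and the names `kl_divergence`, `guide_weight`, and `regularized_loss` are all assumptions made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions (lists of probabilities)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def guide_weight(step, total_steps, initial=1.0):
    """Assumed linear schedule: full guide influence at step 0, none at total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial * (1.0 - frac)


def regularized_loss(task_loss, student_probs, guide_probs, step, total_steps):
    """Student's task loss plus a decaying penalty for deviating from the guide."""
    penalty = kl_divergence(student_probs, guide_probs)
    return task_loss + guide_weight(step, total_steps) * penalty
```

Early in training the penalty keeps the student close to the guide; once the weight reaches zero, only the task loss remains.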
1 code implementation • 13 May 2023 • Patrick Wienhöft, Marnix Suilen, Thiago D. Simão, Clemens Dubslaff, Christel Baier, Nils Jansen
In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated.
1 code implementation • 14 Mar 2023 • Merlijn Krale, Thiago D. Simão, Nils Jansen
In these models, actions consist of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe.
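The two-component action structure can be illustrated with a small sketch. The environment `ToyChain`, the `Action` class, and the `measure_cost` parameter are hypothetical and exist only for this example; in particular, charging a cost for measuring is an assumption, not something stated in the excerpt above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Action:
    control: int   # affects the environment (here: move -1 or +1)
    measure: bool  # affects what the agent can observe


class ToyChain:
    """Hypothetical deterministic chain with states 0..n_states-1."""

    def __init__(self, n_states: int = 5, measure_cost: float = 0.1):
        self.n_states = n_states
        self.measure_cost = measure_cost  # assumed cost of observing the state
        self.state = 0

    def step(self, action: Action) -> Tuple[Optional[int], float]:
        # The control component changes the state, clipped to the chain.
        self.state = min(max(self.state + action.control, 0), self.n_states - 1)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        # The measurement component decides whether the state is revealed.
        if action.measure:
            return self.state, reward - self.measure_cost
        return None, reward
```

With `measure=False` the agent receives `None` as its observation and must rely on its own belief about the state.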
no code implementations • 10 Mar 2023 • Thom Badings, Thiago D. Simão, Marnix Suilen, Nils Jansen
In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty.
no code implementations • 12 Jan 2023 • Thiago D. Simão, Marnix Suilen, Nils Jansen
In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies.
1 code implementation • 31 May 2022 • Marnix Suilen, Thiago D. Simão, David Parker, Nils Jansen
Markov decision processes (MDPs) are formal models commonly used in sequential decision-making.
no code implementations • 11 Sep 2019 • Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes
Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance of the trained policy's performance.
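The idea of reproducing the baseline policy in uncertain state-action pairs can be sketched for a single state as follows. This is a simplified illustration under stated assumptions: uncertainty is modeled as a visit count below a threshold `n_wedge`, and the function name `spibb_greedy_step` and its arguments are hypothetical rather than taken from any released code.

```python
def spibb_greedy_step(q, pi_b, counts, n_wedge):
    """One policy-improvement step for a single state.

    q       -- estimated action values for this state
    pi_b    -- baseline policy probabilities for this state
    counts  -- number of dataset samples per action in this state
    n_wedge -- assumed count threshold below which a pair is 'uncertain'
    """
    uncertain = [n < n_wedge for n in counts]
    # Uncertain actions keep exactly their baseline probability.
    pi = [p if u else 0.0 for p, u in zip(pi_b, uncertain)]
    trusted = [a for a, u in enumerate(uncertain) if not u]
    if trusted:
        # The remaining mass goes to the best well-estimated action.
        best = max(trusted, key=lambda a: q[a])
        pi[best] += sum(pi_b[a] for a in trusted)
    return pi
```

For example, with `q = [1.0, 5.0, 2.0, 3.0]`, a uniform baseline, `counts = [10, 1, 10, 10]`, and `n_wedge = 5`, the rarely sampled second action keeps its baseline probability of 0.25 despite its high value estimate, and the remaining 0.75 goes to the fourth action.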