no code implementations • 20 Sep 2023 • Pierre Liotet
In this dissertation, we study delays in the agent's observation of the environment's state or in the execution of the agent's actions.
no code implementations • 11 May 2022 • Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli
When the agent's observations or interactions are delayed, classic reinforcement learning tools usually fail.
no code implementations • 13 Dec 2021 • Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli
This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias.
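The reuse of past data through importance sampling mentioned above can be illustrated with a minimal sketch. It is not the authors' algorithm: it covers only the plain importance-sampling estimate of an expected return, and it leaves out the hyper-policy training and the bias control the excerpt refers to. The one-dimensional Gaussian distributions, the `toy_return` function, and all names below are assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(samples, returns, behavior, target):
    """Estimate the expected return under `target` from data drawn under `behavior`.

    `behavior` and `target` are (mean, std) pairs. Both are hypothetical
    stand-ins for a past and a candidate parameter distribution.
    """
    total = 0.0
    for x, g in zip(samples, returns):
        # Reweight each past sample by how likely it is under the target
        # distribution relative to the distribution that generated it.
        weight = gaussian_pdf(x, *target) / gaussian_pdf(x, *behavior)
        total += weight * g
    return total / len(samples)


def toy_return(x):
    """Hypothetical return: highest when the sampled parameter is near 1.0."""
    return -(x - 1.0) ** 2


random.seed(0)
behavior = (0.0, 1.0)  # distribution that produced the past data (assumed)
target = (0.5, 1.0)    # distribution we want to evaluate (assumed)

samples = [random.gauss(*behavior) for _ in range(20000)]
returns = [toy_return(x) for x in samples]

estimate = importance_sampling_estimate(samples, returns, behavior, target)
print(f"importance-sampling estimate: {estimate:.3f}")
```

In this toy setting the exact expected return under the target distribution is -(1 + 0.5^2) = -1.25, so the printed estimate should land close to that value without drawing any new samples from the target.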