no code implementations • 4 Jun 2023 • Miguel Suau, Matthijs T. J. Spaan, Frans A. Oliehoek
In this paper, we provide a mathematical characterization of this phenomenon, which we refer to as policy confounding, and show, through a series of examples, when and how it occurs in practice.
1 code implementation • 1 Jul 2022 • Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek
Because reinforcement learning has high sample complexity, simulation is, as of today, critical for its successful application.
no code implementations • 3 Feb 2022 • Miguel Suau, Jinke He, Matthijs T. J. Spaan, Frans A. Oliehoek
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL).
1 code implementation • 27 Jan 2022 • Jinke He, Miguel Suau, Hendrik Baier, Michael Kaisers, Frans A. Oliehoek
To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively decides which simulator to use for every simulation, based on a statistic that measures the accuracy of the approximate simulator.
no code implementations • 11 Nov 2021 • Miguel Suau, Alexandros Agapitos, David Lynch, Derek Farrell, Mingqi Zhou, Aleksandar Milenovic
The explosion in mobile data traffic, together with ever-increasing expectations for higher quality of service, calls for the development of AI algorithms for wireless network optimization.
1 code implementation • NeurIPS 2020 • Jinke He, Miguel Suau, Frans A. Oliehoek
In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables most relevant to the observation and reward of the planning agent, and that captures the incoming influence from the rest of the environment using machine learning methods.
1 code implementation • 18 Nov 2019 • Miguel Suau, Jinke He, Elena Congeduti, Rolf A. N. Starre, Aleksander Czechowski, Frans A. Oliehoek
Due to its perceptual limitations, an agent may have too little information about the state of the environment to act optimally.