no code implementations • 16 Jun 2023 • Qingshuang Sun, Denis Steckelmacher, Yuan YAO, Ann Nowé, Raphaël Avalos
Communication plays a vital role in multi-agent systems, fostering collaboration and coordination.
no code implementations • 30 Jan 2023 • Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé
Instances of similar or almost-identical industrial machines or tools are often deployed at once, or in quick succession.
1 code implementation • 10 Jun 2021 • Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé
To ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment.
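As a rough illustration of such a refinement step, the sketch below (the toy environment, reward, and all names are hypothetical, not the paper's algorithm) treats rule thresholds as parameters and keeps only the local perturbations that improve the average return:

```python
import numpy as np

rng = np.random.default_rng(0)

def rule_policy(obs, thresholds):
    """Toy decision list: fire the first rule whose threshold is exceeded."""
    for action, t in enumerate(thresholds, start=1):
        if obs > t:
            return action
    return 0

def evaluate(thresholds, episodes=50, horizon=30):
    """Average return of the rule policy on a toy 1-D environment."""
    total = 0.0
    for _ in range(episodes):
        for _ in range(horizon):
            obs = rng.uniform(-1, 1)
            a = rule_policy(obs, thresholds)
            total += 1.0 if (a == 1) == (obs > 0.5) else 0.0  # toy reward
    return total / episodes

def refine(thresholds, iters=100, sigma=0.05):
    """Local search: keep a perturbation only when the return improves."""
    best, best_score = np.array(thresholds, dtype=float), evaluate(thresholds)
    for _ in range(iters):
        cand = best + sigma * rng.standard_normal(best.shape)
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best

print(refine([0.0]))  # drifts toward the informative threshold 0.5
```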
no code implementations • 1 Jan 2021 • Gregory Bonaert, Youri Coppens, Denis Steckelmacher, Ann Nowé
Our key contribution to improving explainability is the introduction of goal-based explanations, a new explanation mechanism in which the agent produces goals and attempts to reach them one by one while maximizing the collected reward.
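The control flow this describes might look like the following sketch, where every API (`env.step`, `env.reached`, the two policies) is a hypothetical stand-in rather than the paper's interface:

```python
def run_episode(env, high_level, low_level, max_goal_steps=50):
    """Sketch of the goal-based control flow: the agent produces a goal,
    pursues it until reached or timed out, then produces the next one;
    the goal sequence is the explanation of the episode."""
    obs, done, total_reward = env.reset(), False, 0.0
    explanation = []                       # the goals explain the behaviour
    while not done:
        goal = high_level(obs)             # agent produces a goal...
        explanation.append(goal)
        for _ in range(max_goal_steps):    # ...and tries to reach it
            obs, reward, done = env.step(low_level(obs, goal))
            total_reward += reward         # reward is still maximized
            if done or env.reached(obs, goal):
                break
    return explanation, total_reward
```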
no code implementations • 18 Jul 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé
After training in the lab, the robot should be able to get by without the expensive equipment that used to be available to it, and yet still be guaranteed to perform well in the field.
1 code implementation • 11 Mar 2019 • Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé
We argue that actor-critic algorithms are limited by their need for an on-policy critic.
no code implementations • 7 Feb 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
In this paper, we propose an elegant solution, the Actor-Advisor architecture, in which a Policy Gradient actor learns from unbiased Monte-Carlo returns, while being shaped (or advised) by the Softmax policy arising from an off-policy critic.
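In distribution terms, this shaping can be pictured as an element-wise product of the actor's policy with the critic's Softmax policy, renormalized. The function below is an illustrative reading of that mixing, not the paper's actual code:

```python
import numpy as np

def mixed_policy(actor_probs, q_values, temperature=1.0):
    """Shape the actor's policy with the Softmax policy arising from an
    off-policy critic: multiply the two distributions element-wise and
    renormalize (a sketch of the advising described above)."""
    q = np.asarray(q_values) / temperature
    advisor = np.exp(q - q.max())
    advisor /= advisor.sum()              # Softmax over the critic's Q-values
    mixed = np.asarray(actor_probs) * advisor
    return mixed / mixed.sum()

# Example: the critic's advice sharpens a near-uniform actor
print(mixed_policy([0.4, 0.3, 0.3], q_values=[1.0, 0.0, -1.0]))
```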
3 code implementations • 20 Sep 2018 • Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher
In the dynamic weights setting, the relative importance changes over time, and specialized algorithms that deal with such change, such as the tabular Reinforcement Learning (RL) algorithm of Natarajan and Tadepalli (2005), are required.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
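A minimal illustration of the dynamic-weights setting, assuming linear scalarization of vector-valued Q-estimates (an assumed, common choice): once the Q-vectors are learned, a change in the weight vector can change the greedy action without any relearning.

```python
import numpy as np

def greedy_action(q_vectors, weights):
    """q_vectors: (n_actions, n_objectives) multi-objective Q-values.
    The weight vector encodes the current relative importance of the
    objectives; when it changes over time, the greedy action changes
    without relearning the Q-vectors."""
    scalarized = np.asarray(q_vectors) @ np.asarray(weights)
    return int(np.argmax(scalarized))

q = [[1.0, 0.0],   # action 0: good for objective 0
     [0.0, 1.0]]   # action 1: good for objective 1
print(greedy_action(q, [0.9, 0.1]))  # -> 0
print(greedy_action(q, [0.1, 0.9]))  # -> 1 after the weights change
```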
no code implementations • 13 Aug 2018 • Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé
Our technique, Directed Policy Gradient (DPG), allows a teacher or backup policy to override the agent before it acts undesirably, while allowing the agent to leverage human advice or directives to learn faster.
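One simplified way to picture the override described above is a pre-execution veto; the `Teacher` API below is a hypothetical stand-in, not the paper's mechanism, which works through the policy itself:

```python
class Teacher:
    """Hypothetical backup policy that vetoes undesirable actions."""
    def is_undesirable(self, obs, action):
        return action == 2                 # toy rule: action 2 is unsafe
    def backup_action(self, obs):
        return 0                           # safe default

def act_with_override(obs, agent_action, teacher):
    """The teacher inspects the agent's chosen action before it is
    executed and substitutes a backup action when needed."""
    if teacher.is_undesirable(obs, agent_action):
        return teacher.backup_action(obs)
    return agent_action

print(act_with_override(obs=None, agent_action=2, teacher=Teacher()))  # -> 0
```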
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.
no code implementations • 17 Dec 2015 • Denis Steckelmacher, Peter Vrancx
This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit, and MUT1, a recurrent architecture evolved from a pool of several thousand candidate architectures.
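For concreteness, a recurrent Q-network of the kind this setup requires might look as follows (a PyTorch sketch with assumed sizes, not the paper's exact configuration; swapping `nn.GRU` for `nn.LSTM` gives the LSTM variant, and MUT1 has no off-the-shelf PyTorch module):

```python
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Recurrent Q-network for partially observable tasks: the hidden
    state summarizes the observation history, standing in for the
    unobserved environment state."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim)
        out, hn = self.gru(obs_seq, h0)
        return self.head(out), hn          # Q-values at every time step

# Fitted Q iteration target on a batch of sequences (sketch):
# q_target = reward + gamma * (1 - done) * max_a Q_frozen(next_seq)[..., a]
net = RecurrentQNetwork(obs_dim=4, n_actions=2)
q, _ = net(torch.zeros(8, 10, 4))          # 8 sequences of 10 steps
print(q.shape)                             # torch.Size([8, 10, 2])
```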