no code implementations • 6 Mar 2023 • Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers
A probability distribution that models the belief over the true state can serve as a sufficient statistic of the history, but computing it requires access to a model of the environment and is also intractable.
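Such a belief is usually maintained with a Bayes filter. The sketch below assumes a small tabular POMDP with a known transition array `T` and observation array `O`; these names are hypothetical and the paper's own method is not shown. It makes the dependence on the environment model explicit. The dense prediction step also costs on the order of the square of the number of states per update, which is one reason exact belief tracking becomes impractical in large state spaces.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Discrete Bayes-filter update of a belief state.

    belief: shape (S,), current probability of each hidden state
    T: shape (A, S, S), T[a, s, s2] = P(s2 | s, a)  (assumed known model)
    O: shape (A, S, Z), O[a, s2, z] = P(z | s2, a)  (assumed known model)
    """
    predicted = belief @ T[action]                    # predict: P(s2 | b, a)
    weighted = predicted * O[action][:, observation]  # correct: weight by P(z | s2, a)
    total = weighted.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return weighted / total                           # renormalise to a distribution

# Two hidden states and a single action that leaves the state unchanged.
T = np.array([np.eye(2)])
O = np.array([[[0.9, 0.1],    # state 0 emits observation 0 with probability 0.9
               [0.2, 0.8]]])  # state 1 emits observation 0 with probability 0.2
b = belief_update(np.array([0.5, 0.5]), action=0, observation=0, T=T, O=O)
```

Observing `0` shifts the belief toward state 0, to roughly `[0.82, 0.18]`.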
1 code implementation • 30 Jan 2023 • Alexandra Cimpean, Timothy Verstraeten, Lander Willem, Niel Hens, Ann Nowé, Pieter Libin
$m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility, enabling experts to inspect this small set of alternative strategies, along with their quantified uncertainty.
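The sketch below covers only the reporting step described here. Given sampled utilities for each candidate policy, it returns the $m$ policies with the highest posterior mean, each with an approximate 95% credible interval. It assumes a flat prior and Gaussian noise with known standard deviation `noise_sd`. The exploration strategy that decides which policies to evaluate is not reproduced, and `top_m_summary` is a hypothetical name.

```python
import math
from statistics import mean

def top_m_summary(rewards_per_policy, m, noise_sd=1.0):
    """Return the m policies with the highest posterior mean utility.

    Assumes a flat prior and Gaussian observation noise with known standard
    deviation, so each policy's posterior is N(sample mean, noise_sd**2 / n).
    """
    summary = []
    for name, rewards in rewards_per_policy.items():
        mu = mean(rewards)
        sd = noise_sd / math.sqrt(len(rewards))
        interval = (mu - 1.96 * sd, mu + 1.96 * sd)  # approx. 95% credible interval
        summary.append((name, mu, interval))
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary[:m]

samples = {"a": [1.0, 1.2], "b": [0.2, 0.4], "c": [2.0, 2.2]}
best = top_m_summary(samples, m=2)
```

The interval width shrinks as a policy is evaluated more often. This is what lets an expert judge how firmly each of the $m$ candidates is supported.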
no code implementations • 30 Jan 2023 • Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé
Many instances of similar or almost-identical industrial machines or tools are often deployed at once, or in quick succession.
1 code implementation • 18 Jan 2023 • Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva
Finally, we introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by the partial solutions computed by our method throughout learning.
2 code implementations • Benelux Conference on Artificial Intelligence BNAIC/BeNeLearn 2022 • Lucas N. Alegre, Florian Felten, El-Ghazali Talbi, Grégoire Danoy, Ann Nowé, Ana L. C. Bazzan, Bruno C. da Silva
We introduce MO-Gym, an extensible library containing a diverse set of multi-objective reinforcement learning environments.
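MO-Gym's actual interface is not reproduced here. The sketch below only illustrates what distinguishes a multi-objective environment: it returns a reward vector with one entry per objective instead of a scalar. The `TwoObjectiveChain` class, its dynamics and its `step` signature are all hypothetical assumptions made for this sketch.

```python
import numpy as np

class TwoObjectiveChain:
    """Hypothetical toy environment: walk right along a chain to reach a
    treasure (objective 0) while paying a time penalty each step (objective 1)."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves one cell to the right; action 0 stays in place.
        if action == 1:
            self.pos = min(self.pos + 1, self.length - 1)
        done = self.pos == self.length - 1
        treasure = 10.0 if done else 0.0
        reward = np.array([treasure, -1.0])  # one entry per objective
        return self.pos, reward, done

env = TwoObjectiveChain(length=5)
env.reset()
total, done = np.zeros(2), False
while not done:
    _, reward, done = env.step(1)  # always move right
    total += reward
```

The return is itself a vector, here `[10, -4]`. Comparing policies therefore requires either a utility function or a Pareto-based criterion.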
no code implementations • 8 Jul 2022 • Glenn Ceusters, Luis Ramirez Camargo, Rüdiger Franke, Ann Nowé, Maarten Messagie
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems.
no code implementations • 11 Apr 2022 • Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin
As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models.
no code implementations • 11 Apr 2022 • Mathieu Reymond, Eugenio Bargiacchi, Ann Nowé
In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process.
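Pareto-efficient solutions are those whose value vectors are not dominated by any other solution. The sketch below is a minimal quadratic-time filter. It assumes every objective is to be maximized and that solutions are given as plain tuples of values.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

front = pareto_front([(1, 5), (2, 4), (2, 2), (0, 6), (1, 4)])
```

Here `(2, 2)` and `(1, 4)` are dropped because `(2, 4)` dominates them. Learning a policy for every remaining point is what makes covering the full front expensive.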
no code implementations • 23 Dec 2021 • Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers
Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX.
1 code implementation • 17 Dec 2021 • Florent Delgrange, Ann Nowé, Guillermo A. Pérez
Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees.
1 code implementation • 17 Nov 2021 • Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu
We consider preference communication in two-player multi-objective normal-form games.
1 code implementation • 25 Jun 2021 • Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé
Many real-world problems can be formulated as decision-making problems in which one must repeatedly choose an appropriate option from a set of alternatives.
1 code implementation • 10 Jun 2021 • Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé
Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment.
no code implementations • 20 Apr 2021 • Glenn Ceusters, Román Cantú Rodríguez, Alberte Bouso García, Rüdiger Franke, Geert Deconinck, Lieve Helsen, Ann Nowé, Maarten Messagie, Luis Ramirez Camargo
Model predictive control (MPC) is an optimal control technique that keeps the total operating cost of multi-energy systems at a minimum while fulfilling all system constraints.
1 code implementation • 17 Mar 2021 • Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives.
1 code implementation • 14 Nov 2020 • Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé
We contribute novel actor-critic and policy gradient formulations that allow reinforcement learning of mixed strategies in this setting. We also contribute extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy on the opponent's anticipated learning step).
1 code implementation • 30 Mar 2020 • Pieter Libin, Arno Moonens, Timothy Verstraeten, Fabian Perez-Sanjines, Niel Hens, Philippe Lemey, Ann Nowé
For this reason, we investigate a deep reinforcement learning approach to automatically learn prevention strategies in the context of pandemic influenza.
no code implementations • 17 Jan 2020 • Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé
In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions.
no code implementations • 15 Jan 2020 • Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes.
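Cooperative Prioritized Sweeping targets multi-agent MDPs and is not reproduced here. For orientation, the sketch below shows classic single-agent tabular prioritized sweeping with a deterministic learned model. State-action pairs whose value estimates would change most are updated first, and their predecessors are then re-queued. The class name and default hyperparameters are assumptions made for illustration.

```python
import heapq
import itertools
from collections import defaultdict

class PrioritizedSweeping:
    """Single-agent tabular prioritized sweeping with a deterministic model."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, theta=1e-4, n_planning=20):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.theta, self.n_planning = theta, n_planning
        self.Q = defaultdict(float)           # Q[(state, action)], default 0
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.predecessors = defaultdict(set)  # next_state -> {(state, action)}
        self.queue = []                       # max-heap via negated priorities
        self.counter = itertools.count()      # tie-breaker so states need not be comparable

    def _td_error(self, s, a):
        r, s2 = self.model[(s, a)]
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return target - self.Q[(s, a)]

    def _push(self, s, a):
        priority = abs(self._td_error(s, a))
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, next(self.counter), s, a))

    def observe(self, s, a, r, s2):
        # Learn the (deterministic) model, then plan from the priority queue.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        self._push(s, a)
        for _ in range(self.n_planning):
            if not self.queue:
                break
            _, _, ps, pa = heapq.heappop(self.queue)
            self.Q[(ps, pa)] += self.alpha * self._td_error(ps, pa)
            # The value of ps may have changed, so re-prioritise its predecessors.
            for pred_s, pred_a in self.predecessors[ps]:
                self._push(pred_s, pred_a)

agent = PrioritizedSweeping(actions=["go"], gamma=0.9)
agent.observe(0, "go", 0.0, 1)
agent.observe(1, "go", 1.0, 2)  # state 2 is terminal and never used as a source
```

After the second transition, the reward propagates back to state 0 within the same planning loop, without waiting for state 0 to be revisited.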
1 code implementation • 22 Nov 2019 • Timothy Verstraeten, Pieter JK Libin, Ann Nowé
In many settings, such as wind farms, multiple machines are deployed to perform the same task; such a group of machines is called a fleet.
1 code implementation • 22 Nov 2019 • Timothy Verstraeten, Eugenio Bargiacchi, Pieter JK Libin, Jan Helsen, Diederik M. Roijers, Ann Nowé
In this task, wind turbines must coordinate their alignments with respect to the incoming wind vector in order to optimize power production.
no code implementations • 30 Sep 2019 • Felipe Gomez Marulanda, Pieter Libin, Timothy Verstraeten, Ann Nowé
In general, our approach outperforms PointNet on every family of 3D geometries on which the models were tested.
no code implementations • 6 Sep 2019 • Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé
We develop a new taxonomy that classifies multi-objective multi-agent decision-making settings based on their reward structures and on which utility functions are applied, and how.
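In the multi-objective literature, "how utility functions are applied" can refer to whether the utility is applied to the expected return vector or to each return before taking the expectation. These are the scalarized expected returns (SER) and expected scalarized returns (ESR) criteria, respectively. The worked example below assumes a hypothetical product utility and two equally likely outcomes. With a nonlinear utility like this one, the two criteria disagree.

```python
import numpy as np

def utility(v):
    # Hypothetical nonlinear utility: product of the two objective values.
    return v[0] * v[1]

outcomes = np.array([[4.0, 0.0], [0.0, 4.0]])  # possible return vectors
probs = np.array([0.5, 0.5])                   # their probabilities

ser = utility(probs @ outcomes)                             # u(E[R]) = u([2, 2])
esr = sum(p * utility(v) for p, v in zip(probs, outcomes))  # E[u(R)]
```

Under SER this lottery is worth 4. Under ESR it is worth 0, because every individual outcome has zero utility.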
no code implementations • 18 Jul 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé
After training in the lab, the robot should be able to get by without the expensive equipment that used to be available to it, and yet still be guaranteed to perform well on the field.
1 code implementation • 11 Mar 2019 • Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé
We argue that actor-critic algorithms are limited by their need for an on-policy critic.
no code implementations • 7 Feb 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
In this paper, we propose an elegant solution, the Actor-Advisor architecture, in which a Policy Gradient actor learns from unbiased Monte-Carlo returns, while being shaped (or advised) by the Softmax policy arising from an off-policy critic.
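The sketch below shows one way an actor can be advised by a critic's Softmax policy. It assumes the policy-shaping rule of multiplying the two action distributions and renormalizing. The paper's exact combination rule and temperature handling may differ, and `advised_policy` is a hypothetical name.

```python
import numpy as np

def softmax(x, temperature=1.0):
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def advised_policy(actor_probs, critic_q, temperature=1.0):
    """Shape the actor's action distribution with the critic's Softmax policy."""
    advice = softmax(critic_q, temperature)
    mixed = np.asarray(actor_probs, dtype=float) * advice
    return mixed / mixed.sum()

# An undecided actor adopts the critic's preference for action 1.
mixed = advised_policy([0.5, 0.5], [0.0, np.log(3.0)])
```

Because the actor's own probabilities remain a factor, a confident actor can still outweigh weak advice. The actor itself is still trained on Monte-Carlo returns.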
3 code implementations • 20 Sep 2018 • Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher
In the dynamic weights setting, the relative importance of the objectives changes over time, and specialized algorithms that deal with such change, such as the tabular Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are required.
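The one-step example below illustrates why changing weights matter. It assumes a linear utility $w \cdot r$ and three actions with hypothetical expected reward vectors; the preferred action switches as the weight vector shifts. In sequential problems the optimal policy itself changes with the weights, which is why specialized algorithms are needed there.

```python
import numpy as np

# Hypothetical expected reward vectors for three actions (two objectives).
action_values = np.array([
    [1.0, 0.0],
    [0.6, 0.6],
    [0.0, 1.0],
])

def greedy_action(weights):
    # Linear scalarization: the utility of each action is w . r
    return int(np.argmax(action_values @ np.asarray(weights)))

choices = [greedy_action(w) for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0])]
```

As the weights move from the first objective to the second, the greedy choice moves from action 0 through the compromise action 1 to action 2.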
no code implementations • 13 Aug 2018 • Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé
Our technique, Directed Policy Gradient (DPG), allows a teacher or backup policy to override the agent before it acts undesirably, while allowing the agent to leverage human advice or directives to learn faster.
no code implementations • ICML 2018 • Eugenio Bargiacchi, Timothy Verstraeten, Diederik Roijers, Ann Nowé, Hado Hasselt
Learning to coordinate between multiple agents is an important challenge in many reinforcement learning problems.
1 code implementation • 21 Feb 2018 • Luisa M. Zintgraf, Diederik M. Roijers, Sjoerd Linders, Catholijn M. Jonker, Ann Nowé
We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering.
no code implementations • 16 Nov 2017 • Pieter Libin, Timothy Verstraeten, Diederik M. Roijers, Jelena Grujic, Kristof Theys, Philippe Lemey, Ann Nowé
We evaluate these algorithms in a realistic experimental setting and demonstrate that it is possible to identify the optimal strategy using only a limited number of model evaluations, i.e., 2 to 3 times faster than the uniform sampling method, the predominant technique used for epidemiological decision making in the literature.
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.
no code implementations • 28 Feb 2017 • Roxana Rădulescu, Peter Vrancx, Ann Nowé
Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains.
no code implementations • 16 Dec 2015 • Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé
Since the introduction of the stable marriage problem (SMP) by Gale and Shapley (1962), several variants and extensions have been investigated.
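For reference, the classic Gale-Shapley deferred-acceptance algorithm for the original SMP is sketched below. It is not the encoding studied in this work. It assumes complete, strict preference lists and two sides of equal size; the function and argument names are illustrative.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Deferred acceptance; returns a stable matching as {acceptor: proposer}.

    Both arguments map each person to a complete preference list,
    most preferred first.
    """
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next acceptor to propose to
    engaged = {}                                  # acceptor -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                        # a was free: accept tentatively
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                        # a prefers p: jilt the current partner
            free.append(current)
        else:
            free.append(p)                        # a rejects p: p proposes again later
    return engaged

matching = gale_shapley(
    {"m1": ["w1", "w2"], "m2": ["w1", "w2"]},
    {"w1": ["m2", "m1"], "w2": ["m1", "m2"]},
)
```

The result pairs `w1` with `m2` and `w2` with `m1`. With the proposing side fixed, deferred acceptance returns the proposer-optimal stable matching. The variants mentioned above relax assumptions such as complete or strict preference lists.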
no code implementations • 28 Feb 2013 • Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé
Our encoding can easily be extended and adapted to the needs of specific applications.