no code implementations • 26 Jul 2024 • Raphael Avalos, Eugenio Bargiacchi, Ann Nowé, Diederik M. Roijers, Frans A. Oliehoek
In key real-world problems, full state information is sometimes available, but only at a high cost, such as activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observability.
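The costly-measurement setting can be made concrete with a tiny example: at every step the agent decides whether to pay for an exact observation of the state or to act on a noisy one. A minimal sketch; the chain environment, cost, and names below are illustrative and not taken from the paper.

import random

MEASURE_COST = 0.1  # hypothetical price of the precise sensor

def noisy_obs(state, n_states=5, noise=0.3):
    """Return the true state with probability 1 - noise, else a random one."""
    if random.random() < noise:
        return random.randrange(n_states)
    return state

def step(state, action, measure, n_states=5):
    """Move left (-1) or right (+1) on a small chain; optionally pay to measure."""
    next_state = max(0, min(n_states - 1, state + action))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    if measure:
        reward -= MEASURE_COST
        obs = next_state               # exact state, paid for
    else:
        obs = noisy_obs(next_state, n_states)
    return next_state, obs, reward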
1 code implementation • 23 Jul 2024 • Florian Felten, Umut Ucak, Hicham Azmani, Gao Peng, Willem Röpke, Hendrik Baier, Patrick Mannion, Diederik M. Roijers, Jordan K. Terry, El-Ghazali Talbi, Grégoire Danoy, Ann Nowé, Roxana Rădulescu
Many challenging tasks such as managing traffic systems, electricity grids, or supply chains involve complex decision-making processes that must balance multiple conflicting objectives and coordinate the actions of various independent decision-makers (DMs).
no code implementations • 10 Jun 2024 • Jesse van Remmerden, Maurice Kenter, Diederik M. Roijers, Charalampos Andriotis, Yingqian Zhang, Zaharah Bukhsh
We evaluated MO-DCMAC using two utility functions, both of which take the probability of collapse and cost as input.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
no code implementations • 11 Feb 2024 • Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu
A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences; a minimal dominance-filter sketch follows below.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
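A Pareto front contains exactly the policies whose value vectors are dominated by no other. A minimal dominance filter in numpy, for intuition only; this is not the learning algorithm of the paper.

import numpy as np

def pareto_front(values):
    """Return the rows of `values` (policies x objectives, maximization)
    that are not Pareto-dominated by any other row."""
    keep = []
    for i, v in enumerate(values):
        dominated = any(
            np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(values) if j != i
        )
        if not dominated:
            keep.append(i)
    return values[keep]

# Three candidate policies over two objectives; the third is dominated.
print(pareto_front(np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]])))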
no code implementations • 5 Feb 2024 • Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers
Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards; a small worked example follows below.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
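The paradigm is easy to state in code: the utility function u maps a vector-valued return to a scalar, and different choices of u induce different optimal policies. A sketch with made-up objectives and utility functions.

import numpy as np

vector_return = np.array([10.0, 4.0])  # e.g. (throughput, comfort) of one policy

def linear_utility(r, w=np.array([0.7, 0.3])):
    """Linear utility: optimizing it reduces to single-objective RL."""
    return float(w @ r)

def nonlinear_utility(r):
    """Nonlinear utility, e.g. diminishing returns on the first objective."""
    return float(np.sqrt(r[0]) + 0.3 * r[1])

print(linear_utility(vector_return), nonlinear_utility(vector_return))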
no code implementations • 19 Nov 2023 • Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek, Pradeep K. Murukannaiah
We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms.
1 code implementation • 9 May 2023 • Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker.
no code implementations • 6 Mar 2023 • Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers
A probability distribution that models the belief over what the true state is can serve as a sufficient statistic of the history, but its computation requires access to the model of the environment and is often intractable.
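For reference, the belief update alluded to here is the discrete Bayes filter, which indeed needs the transition model T and observation model O. A minimal sketch with illustrative two-state matrices.

import numpy as np

def belief_update(b, a, o, T, O):
    """Exact belief update for a discrete POMDP.
    b: belief over states, shape (S,)
    T: T[a, s, s'] = P(s' | s, a)
    O: O[a, s', o] = P(o | s', a)
    """
    predicted = b @ T[a]                # predictive distribution over s'
    posterior = predicted * O[a, :, o]  # weight by observation likelihood
    return posterior / posterior.sum()

T = np.array([[[0.9, 0.1], [0.2, 0.8]]])  # one action, two states
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])  # two observations
print(belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O))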
3 code implementations • 18 Jan 2023 • Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva
Finally, we introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by the partial solutions computed by our method throughout learning.
no code implementations • 23 Nov 2022 • Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion
Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning, +1
1 code implementation • 8 Nov 2022 • Cláudia Fonseca Pinhão, Chris Eijgenstein, Iva Gornishka, Shayla Jansen, Diederik M. Roijers, Daan Bloembergen
Obstacles on the sidewalk often block the path, limiting passage and resulting in frustration and wasted time, especially for citizens and visitors who use assistive devices (wheelchairs, walkers, strollers, canes, etc.).
no code implementations • 1 Jul 2022 • Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion
In such settings a set of optimal policies must be computed.
no code implementations • 11 Apr 2022 • Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin
As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models.
no code implementations • 23 Dec 2021 • Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers
Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures.
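For context, the factorizations referred to here express the joint action value as a simple combination of per-agent utilities, as in VDN, where the joint value is a sum; with an additive factorization each agent can act greedily on its own utility. A minimal tabular sketch of that idea, not of the architecture this paper proposes.

import numpy as np

n_states, n_actions = 3, 2
Q1 = np.random.rand(n_states, n_actions)  # per-agent utility tables
Q2 = np.random.rand(n_states, n_actions)

def joint_greedy(s):
    """With Q_tot = Q1 + Q2, independent argmaxes are jointly greedy."""
    a1 = int(Q1[s].argmax())
    a2 = int(Q2[s].argmax())
    return (a1, a2), Q1[s, a1] + Q2[s, a2]

print(joint_greedy(0))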
1 code implementation • 17 Nov 2021 • Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu
We consider preference communication in two-player multi-objective normal-form games.
no code implementations • 2 Jun 2021 • Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion
In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised.
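Concretely, this is the expected-scalarized-returns (ESR) criterion, which differs from the scalarized-expected-returns (SER) criterion whenever the utility function u is nonlinear:

% ESR: expected utility of the vector return of a single policy execution
\pi^{*}_{\mathrm{ESR}} = \arg\max_{\pi} \, \mathbb{E}\!\left[ u\!\left( \sum_{t} \gamma^{t} \mathbf{r}_{t} \right) \,\middle|\, \pi \right]

% SER: utility of the expected vector return, i.e. of the average over executions
\pi^{*}_{\mathrm{SER}} = \arg\max_{\pi} \, u\!\left( \mathbb{E}\!\left[ \sum_{t} \gamma^{t} \mathbf{r}_{t} \,\middle|\, \pi \right] \right)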
1 code implementation • 17 Mar 2021 • Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives.
no code implementations • 1 Feb 2021 • Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy.
1 code implementation • 19 Jan 2021 • Timothy Verstraeten, Pieter-Jan Daems, Eugenio Bargiacchi, Diederik M. Roijers, Pieter J. K. Libin, Jan Helsen
This is a non-trivial optimization problem, as complex dependencies exist between the wind turbines.
1 code implementation • 14 Nov 2020 • Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé
We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step).
no code implementations • 4 May 2020 • Gongjin Lan, Jakub M. Tomczak, Diederik M. Roijers, A. E. Eiben
Evolutionary Algorithms (EAs), on the other hand, rely on search heuristics that typically do not depend on all previous data and can be executed in constant time.
no code implementations • 21 Jan 2020 • Gongjin Lan, Matteo De Carlo, Fuda van Diggelen, Jakub M. Tomczak, Diederik M. Roijers, A. E. Eiben
We generalize the well-studied problem of gait learning in modular robots along two dimensions.
no code implementations • 17 Jan 2020 • Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé
In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions.
no code implementations • 15 Jan 2020 • Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes; a sketch of the underlying prioritized-sweeping idea follows below.
Tasks: Model-based Reinforcement Learning, Multi-agent Reinforcement Learning, +3
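Cooperative Prioritized Sweeping extends the classic prioritized-sweeping idea: replay model-based updates in order of how much they are expected to change the value estimates. A minimal single-agent core for intuition only; the factored multi-agent version in the paper is substantially more involved.

import heapq

def sweep(Q, model, predecessors, s, a, r, s2,
          alpha=0.5, gamma=0.95, theta=0.01, n_sweeps=5):
    """One real transition (s, a, r, s2) followed by prioritized planning.
    model[(s, a)] = (r, s') caches observed outcomes; predecessors[s'] holds
    the (state, action) pairs known to lead to s'. Q is a 2-D numpy array.
    """
    model[(s, a)] = (r, s2)
    predecessors.setdefault(s2, set()).add((s, a))
    pq = [(-abs(r + gamma * Q[s2].max() - Q[s, a]), s, a)]
    for _ in range(n_sweeps):
        if not pq:
            break
        neg_p, s_, a_ = heapq.heappop(pq)
        if -neg_p < theta:
            break
        r_, s2_ = model[(s_, a_)]
        Q[s_, a_] += alpha * (r_ + gamma * Q[s2_].max() - Q[s_, a_])
        for sp, ap in predecessors.get(s_, ()):  # predecessors may be stale now
            rp, _ = model[(sp, ap)]
            p = abs(rp + gamma * Q[s_].max() - Q[sp, ap])
            if p > theta:
                heapq.heappush(pq, (-p, sp, ap))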
1 code implementation • 22 Nov 2019 • Timothy Verstraeten, Eugenio Bargiacchi, Pieter JK Libin, Jan Helsen, Diederik M. Roijers, Ann Nowé
In this task, wind turbines must coordinate their alignments with respect to the incoming wind vector in order to optimize power production.
1 code implementation • 6 Sep 2019 • Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé
We develop a new taxonomy that classifies multi-objective multi-agent decision-making settings on the basis of their reward structures, and of which utility functions are applied and how.
1 code implementation • 11 Mar 2019 • Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé
We argue that actor-critic algorithms are limited by their need for an on-policy critic.
no code implementations • 7 Feb 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
In this paper, we propose an elegant solution, the Actor-Advisor architecture, in which a Policy Gradient actor learns from unbiased Monte-Carlo returns, while being shaped (or advised) by the Softmax policy arising from an off-policy critic.
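A common policy-shaping scheme mixes the two policies by element-wise multiplication followed by renormalization, so actions favoured by both actor and advisor gain probability. A minimal numpy sketch; the temperature and the exact mixing rule here are illustrative, see the paper for the precise scheme.

import numpy as np

def softmax(x, temperature=1.0):
    z = np.exp((x - x.max()) / temperature)
    return z / z.sum()

def advised_policy(actor_probs, critic_q, temperature=1.0):
    """Shape the actor's policy with the Softmax policy of the critic."""
    mixed = actor_probs * softmax(critic_q, temperature)
    return mixed / mixed.sum()

print(advised_policy(np.array([0.6, 0.3, 0.1]), np.array([1.0, 2.0, 0.5])))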
3 code implementations • 20 Sep 2018 • Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher
In the dynamic weights setting, the relative importance of the objectives changes over time, and specialized algorithms that deal with such change are required, such as the tabular Reinforcement Learning (RL) algorithm of Natarajan and Tadepalli (2005); a tabular sketch of the setting follows below.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
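In tabular form, Q-values are learned as vectors over objectives and scalarized with the current weight vector at action-selection time, so the greedy policy tracks weight changes without relearning. This is illustrative only; the paper conditions a deep network on the weights.

import numpy as np

n_states, n_actions, n_objectives = 4, 2, 2
Q = np.zeros((n_states, n_actions, n_objectives))  # vector-valued Q-table

def greedy_action(s, w):
    """Scalarize the vector Q-values with the current weights w."""
    return int((Q[s] @ w).argmax())

def update(s, a, r_vec, s2, w, alpha=0.1, gamma=0.95):
    a2 = greedy_action(s2, w)
    Q[s, a] += alpha * (r_vec + gamma * Q[s2, a2] - Q[s, a])

update(0, 0, np.array([1.0, 0.0]), 1, np.array([0.9, 0.1]))
update(0, 1, np.array([0.0, 1.0]), 1, np.array([0.9, 0.1]))
print(greedy_action(0, np.array([0.9, 0.1])),   # -> 0
      greedy_action(0, np.array([0.1, 0.9])))   # -> 1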
no code implementations • 13 Aug 2018 • Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé
Our technique, Directed Policy Gradient (DPG), allows a teacher or backup policy to override the agent before it acts undesirably, while allowing the agent to leverage human advice or directives to learn faster.
1 code implementation • 21 Feb 2018 • Luisa M. Zintgraf, Diederik M. Roijers, Sjoerd Linders, Catholijn M. Jonker, Ann Nowé
We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering.
no code implementations • 16 Nov 2017 • Pieter Libin, Timothy Verstraeten, Diederik M. Roijers, Jelena Grujic, Kristof Theys, Philippe Lemey, Ann Nowé
We evaluate these algorithms in a realistic experimental setting and demonstrate that it is possible to identify the optimal strategy using only a limited number of model evaluations, i.e., 2 to 3 times faster than the uniform sampling method, the predominant technique used for epidemiological decision making in the literature.
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.
2 code implementations • 9 Oct 2016 • Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning
no code implementations • 22 Jun 2016 • Auke J. Wiggers, Frans A. Oliehoek, Diederik M. Roijers
Zero-sum stochastic games provide a rich model for competitive decision making.
no code implementations • 29 Nov 2015 • Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, Mathijs M. de Weerdt
In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value.