no code implementations • 24 Apr 2024 • Sarah Keren, Chaimaa Essayeh, Stefano V. Albrecht, Thomas Morstyn
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges.
no code implementations • 22 Apr 2024 • Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey
We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics.
1 code implementation • 22 Apr 2024 • Mhairi Dunion, Stefano V. Albrecht
To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that achieves zero-shot generalisation to any single camera from the training set.
no code implementations • 8 Feb 2024 • Anton Kuznietsov, Balint Gyevnar, Cheng Wang, Steven Peters, Stefano V. Albrecht
One way to mitigate this challenge is to utilize explainable AI (XAI) techniques.
no code implementations • 5 Feb 2024 • Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
ICED generates levels using a variational autoencoder trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
no code implementations • 16 Jan 2024 • Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith
Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation.
1 code implementation • 7 Dec 2023 • Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning.
no code implementations • 9 Oct 2023 • Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Amos Storkey
Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm well matched to a real-world RL deployment process.
no code implementations • 5 Oct 2023 • Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training.
1 code implementation • 11 Jul 2023 • Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes.
1 code implementation • 9 May 2023 • Adam Michalski, Filippos Christianos, Stefano V. Albrecht
The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II.
no code implementations • 18 Apr 2023 • Alain Andres, Lukas Schäfer, Esther Villar-Rodriguez, Stefano V. Albrecht, Javier Del Ser
Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve sample efficiency in procedurally generated environments.
1 code implementation • 23 Feb 2023 • Callum Rhys Tilbury, Filippos Christianos, Stefano V. Albrecht
This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete.
1 code implementation • 21 Feb 2023 • Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
We present CEMA (Causal Explanations in Multi-Agent systems), a framework for creating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems to build more trustworthy autonomous agents.
no code implementations • 9 Feb 2023 • Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space, and non-stationarity caused by mutually adapting agents.
no code implementations • 7 Feb 2023 • Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni
In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to seamlessly extend value-based MARL algorithms with ensembles of value functions.
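The general idea of using an ensemble of value functions to guide exploration can be illustrated with a minimal sketch: disagreement across ensemble members signals uncertainty, which an agent can treat as an exploration bonus. All names below are illustrative assumptions, not the EMAX algorithm itself.

```python
# Hedged sketch: an ensemble of Q-value tables for a single agent.
# The ensemble mean estimates value; the ensemble standard deviation
# acts as an uncertainty bonus for action selection (UCB-style).
from statistics import mean, stdev

def select_action(ensemble, state, beta=1.0):
    """Pick the action maximising mean Q plus an uncertainty bonus."""
    per_member = [member[state] for member in ensemble]  # action-value lists
    n_actions = len(per_member[0])
    scores = []
    for a in range(n_actions):
        qs = [qvals[a] for qvals in per_member]
        scores.append(mean(qs) + beta * stdev(qs))
    return max(range(n_actions), key=lambda a: scores[a])

# Two ensemble members, one state, three actions. Action 1 has the same
# mean as action 0 but high disagreement, so a positive bonus favours it.
ensemble = [
    {"s0": [1.0, 0.5, 0.2]},
    {"s0": [1.0, 1.5, 0.2]},
]
```

With `beta=0` the agent acts greedily on the ensemble mean; increasing `beta` steers it toward actions the ensemble disagrees on.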
no code implementations • 22 Dec 2022 • Aleksandar Krnjaic, Raul D. Steleac, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Andrew Wing Keung To, Kuan-Ho Lao, Murat Cubuktepe, Matthew Haley, Peter Börsting, Stefano V. Albrecht
We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse.
no code implementations • 26 Oct 2022 • Filippos Christianos, Peter Karkus, Boris Ivanovic, Stefano V. Albrecht, Marco Pavone
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles.
1 code implementation • 11 Oct 2022 • Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht
These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork.
1 code implementation • 28 Sep 2022 • Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria.
3 code implementations • 2 Aug 2022 • Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning.
no code implementations • 28 Jul 2022 • Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, Stefano V. Albrecht
Early approaches address the AHT challenge by training the learner with a diverse set of handcrafted teammate policies, usually designed based on an expert's domain knowledge about the policies the learner may encounter.
no code implementations • 19 Jul 2022 • Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
We propose the novel few-shot teamwork (FST) problem, where skilled agents trained in a team to complete one task are combined with skilled agents from different tasks, and together must learn to adapt to an unseen but related task.
1 code implementation • 12 Jul 2022 • Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training.
1 code implementation • 5 Jul 2022 • Lukas Schäfer, Filippos Christianos, Amos Storkey, Stefano V. Albrecht
We show that a team of agents is able to adapt to novel tasks when provided with task embeddings.
1 code implementation • 28 Jun 2022 • Cillian Brewitt, Massimiliano Tamborski, Cheng Wang, Stefano V. Albrecht
Goal recognition (GR) involves inferring the goals of other vehicles, such as a certain junction exit, which can enable more accurate prediction of their future behaviour.
1 code implementation • 22 Jun 2022 • Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined.
no code implementations • 16 Feb 2022 • Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht
Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination.
1 code implementation • 29 Nov 2021 • Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy.
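The on-policy/off-policy distinction can be made concrete with a standard tool: ordinary importance sampling, which reweights returns collected under a behaviour policy to estimate the value of a different target policy. This is a generic textbook sketch, not the method proposed in the paper.

```python
# Hedged sketch of ordinary importance sampling for off-policy evaluation.
# Each trajectory is a list of (state, action, reward) triples; pi and b
# are callables giving action probabilities for target/behaviour policies.

def importance_sampling_estimate(trajectories, pi, b):
    total = 0.0
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for state, action, reward in traj:
            weight *= pi(state, action) / b(state, action)  # likelihood ratio
            ret += reward
        total += weight * ret
    return total / len(trajectories)

# Toy example: two actions, behaviour policy uniform, target prefers a=0.
pi = lambda s, a: 0.9 if a == 0 else 0.1
b = lambda s, a: 0.5
trajs = [[("s", 0, 1.0)], [("s", 1, 0.0)]]
```

On the toy data the estimate recovers the target policy's true value, 0.9, from data gathered by the uniform behaviour policy.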
2 code implementations • 11 Oct 2021 • Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens.
1 code implementation • ICML Workshop URL 2021 • Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters.
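One of the simplest forms of intrinsic reward is a count-based novelty bonus: rarely visited states yield a larger reward, which decays as visits accumulate. The sketch below illustrates that generic idea only; it is not the paper's method, and the class name is an assumption.

```python
# Hedged sketch of a count-based intrinsic reward: bonus = scale / sqrt(N(s)),
# where N(s) is the number of times state s has been visited so far.
from collections import defaultdict
from math import sqrt

class CountBonus:
    def __init__(self, scale=1.0):
        self.counts = defaultdict(int)
        self.scale = scale

    def intrinsic_reward(self, state):
        self.counts[state] += 1
        return self.scale / sqrt(self.counts[state])
```

This decaying bonus is exactly the kind of non-stationary reward shaping the excerpt refers to: the effective reward for a state changes over training as its count grows.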
1 code implementation • 7 Jun 2021 • Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith
Researchers are using deep learning models to explore the emergence of language in various language games, where agents interact and develop an emergent language to solve tasks.
2 code implementations • 10 Mar 2021 • Cillian Brewitt, Balint Gyevnar, Stefano V. Albrecht
As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified.
1 code implementation • 15 Feb 2021 • Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, Stefano V. Albrecht
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents.
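The baseline form of parameter sharing can be sketched in a few lines: all agents query one shared policy network, disambiguated by appending a one-hot agent index to each observation. This is a generic illustration of the technique the excerpt refers to, with hypothetical function names, not the paper's proposed method.

```python
# Hedged sketch of naive parameter sharing in MARL: a single policy is
# used by every agent; the agent's identity is encoded into its input.

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def shared_policy_input(obs, agent_id, n_agents):
    """Observation vector fed to the single shared policy network."""
    return list(obs) + one_hot(agent_id, n_agents)
```

Because every agent's experience updates the same parameters, this scales to many agents, at the cost of forcing all agents toward similar behaviour unless identity information (as above) lets the network specialise.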
no code implementations • 1 Nov 2020 • Henry Pulver, Francisco Eiras, Ludovico Carozza, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
In this paper, we present PILOT -- a planning framework that comprises an imitation neural network followed by an efficient optimiser that actively rectifies the network's plan, guaranteeing fulfilment of safety and comfort requirements.
1 code implementation • 18 Jul 2020 • Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Authentication and key agreement are decided based on the agents' observed behaviors during the interaction.
1 code implementation • 18 Jun 2020 • Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with teammates without prior coordination mechanisms, including joint training.
1 code implementation • NeurIPS 2021 • Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution.
8 code implementations • 14 Jun 2020 • Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult.
3 code implementations • NeurIPS 2020 • Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards.
2 code implementations • 6 Feb 2020 • Stefano V. Albrecht, Cillian Brewitt, John Wilhelm, Francisco Eiras, Mihai Dobre, Subramanian Ramamoorthy
The ability to predict the intentions and driving trajectories of other vehicles is a key problem for autonomous driving.
no code implementations • 29 Jan 2020 • Georgios Papoudakis, Stefano V. Albrecht
Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system.
no code implementations • 30 Sep 2019 • Maciej Wiatrak, Stefano V. Albrecht, Andrew Nystrom
Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data.
no code implementations • 23 Jul 2019 • Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents.
no code implementations • 22 Jul 2019 • Stefano V. Albrecht, Subramanian Ramamoorthy
In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems.
no code implementations • 15 Jul 2019 • Stefano V. Albrecht, Subramanian Ramamoorthy
In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen.
no code implementations • 10 Jul 2019 • Stefano V. Albrecht, Subramanian Ramamoorthy
Belief filtering in DBNs is the task of inferring the belief state (i.e., the probability distribution over process states) based on incomplete and uncertain observations.
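The belief-filtering task described above follows the standard predict-then-correct recursion: propagate the belief through the transition model, then reweight by the observation likelihood and renormalise. The sketch below is a generic exact Bayes filter over a discrete state space, not the paper's passivity-based algorithm.

```python
# Hedged sketch of one step of exact discrete Bayes filtering.
# belief: dict state -> prob; transition: dict s -> (dict s' -> prob);
# likelihood: dict s' -> (dict observation -> prob).

def bayes_filter_step(belief, transition, likelihood, observation):
    # Predict: push the belief through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s2, t in transition[s].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * t
    # Correct: weight by the observation likelihood, then renormalise.
    posterior = {s2: p * likelihood[s2][observation]
                 for s2, p in predicted.items()}
    z = sum(posterior.values())
    return {s2: p / z for s2, p in posterior.items()}
```

Exact filtering like this becomes intractable as the state space grows, which is the motivation for approximate factored filters in DBNs.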
no code implementations • 10 Jul 2019 • Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents.
no code implementations • 2 Jul 2019 • Stefano V. Albrecht, Subramanian Ramamoorthy
The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour.
no code implementations • 11 Jun 2019 • Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, Stefano V. Albrecht
Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains.
no code implementations • 10 Jan 2018 • Craig Innes, Alex Lascarides, Stefano V. Albrecht, Subramanian Ramamoorthy, Benjamin Rosman
Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised (its possible states and actions and their causal structure) is known in advance and does not change during learning.
no code implementations • 23 Sep 2017 • Stefano V. Albrecht, Peter Stone
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents.
no code implementations • 28 Jul 2015 • Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents.
no code implementations • 3 Jun 2015 • Stefano V. Albrecht, Subramanian Ramamoorthy
Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control.
no code implementations • 30 Jan 2014 • Stefano V. Albrecht, Subramanian Ramamoorthy
Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.