This project leverages advances in multi-agent reinforcement learning (MARL) to improve the efficiency and flexibility of order-picking systems for commercial warehouses.
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles.
These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork.
Equilibrium selection in multi-agent games refers to the problem of selecting a Pareto-optimal equilibrium.
3 code implementations • 2 Aug 2022 • Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning.
Early approaches address the AHT challenge by training the learner with a diverse set of handcrafted teammate policies, usually designed based on an expert's domain knowledge about the policies the learner may encounter.
We propose the novel few-shot teamwork (FST) problem, where skilled agents trained in a team to complete one task are combined with skilled agents from different tasks, and together must learn to adapt to an unseen but related task.
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training.
We show that a team of agents is able to adapt to novel tasks when provided with task embeddings.
A recent goal recognition method for autonomous vehicles, GRIT, has been shown to be fast, accurate, interpretable and verifiable.
Also, we find that levels in HKSL's hierarchy can learn to specialize in long- or short-term consequences of agent actions, thereby providing the downstream control policy with more informative representations.
Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination.
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy.
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens.
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters.
Researchers are using deep learning models to explore the emergence of language in various language games, where agents interact and develop an emergent language to solve tasks.
As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified.
Autonomous Driving Robotics Multiagent Systems
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents.
In this paper, we present PILOT -- a planning framework that comprises an imitation neural network followed by an efficient optimiser that actively rectifies the network's plan, guaranteeing fulfilment of safety and comfort requirements.
Authentication and key agreement are decided based on the agents' observed behaviors during the interaction.
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with teammates without prior coordination mechanisms, including joint training.
Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution.
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult.
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards.
The ability to predict the intentions and driving trajectories of other vehicles is a key problem for autonomous driving.
Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data.
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents.
In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems.
In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen.
To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents.
Belief filtering in DBNs is the task of inferring the belief state (i. e. the probability distribution over process states) based on incomplete and uncertain observations.
The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour.
Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains.
Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised---its possible states and actions and their causal structure---is known in advance and does not change during learning.
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents.
The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents.
Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control.
Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.