no code implementations • 7 Jun 2023 • Anuj Mahajan, Amy Zhang
We focus on bisimulation metrics, which provide a powerful means for abstracting task-relevant components of the observation and learning a succinct representation space for training the agent using reinforcement learning.
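As a minimal sketch of the idea (not the paper's implementation): a bisimulation-style distance declares two states close when they yield similar rewards and similar distributions over next latent states. Assuming Gaussian latent dynamics, the Wasserstein-2 term has a closed form; the function name and arguments below are illustrative only.

```python
import numpy as np

def bisim_distance(r_i, r_j, mu_i, mu_j, sigma_i, sigma_j, gamma=0.99):
    """Bisimulation-style distance between two states: immediate reward
    difference plus a discounted W2 distance between the (assumed
    diagonal-Gaussian) next-latent-state distributions."""
    reward_diff = abs(r_i - r_j)
    # Closed-form W2 distance between diagonal Gaussians.
    w2 = np.sqrt(np.sum((mu_i - mu_j) ** 2) + np.sum((sigma_i - sigma_j) ** 2))
    return reward_diff + gamma * w2
```

Training an encoder so that latent L2 distances match this quantity pushes task-irrelevant observation details out of the representation.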
1 code implementation • 24 Mar 2023 • Kinal Mehta, Anuj Mahajan, Pawan Kumar
We present marl-jax, a multi-agent reinforcement learning software package for training and evaluating social generalization of the agents.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+1)
no code implementations • 15 Feb 2023 • Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson
In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.
1 code implementation • NeurIPS 2023 • Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson
In this work, we conduct new analysis demonstrating that SMAC lacks the stochasticity and partial observability to require complex *closed-loop* policies.
1 code implementation • 10 Dec 2022 • Kinal Mehta, Anuj Mahajan, Pawan Kumar
A reliable critic is central to on-policy actor-critic learning.
no code implementations • 31 Jan 2022 • Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson
Specifically, we study generalization bounds under a linear dependence of the underlying dynamics on the agent capabilities, which can be seen as a generalization of Successor Features to MAS.
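For intuition on the Successor Features connection (a generic sketch, not this paper's multi-agent construction): successor features factor the value function into a task-independent feature-accumulation term psi and a task-specific weight vector w, so changing the task only swaps w. All values below are hypothetical.

```python
import numpy as np

# Successor-feature decomposition: Q(s, a) = psi(s, a) . w, where psi is the
# expected discounted sum of state features and w encodes the reward weights.
psi = np.array([0.4, 1.2, 0.1])      # hypothetical successor features for (s, a)
w_task = np.array([1.0, -0.5, 2.0])  # hypothetical task / reward weights
q_value = psi @ w_task               # task value recovered by a dot product
```

Under a linear dependence of the dynamics on agent capabilities, a similar factorization lets values transfer across teams with different capability vectors.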
no code implementations • 27 Oct 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+1)
no code implementations • 27 Oct 2021 • Pascal Van Der Vaart, Anuj Mahajan, Shimon Whiteson
A challenge in multi-agent reinforcement learning is to be able to generalize over intractable state-action spaces.
Tasks: Model-based Reinforcement Learning, Multi-agent Reinforcement Learning (+3)
1 code implementation • 27 Jul 2021 • Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong, Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard, Wojciech Marian Czarnecki
The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem.
no code implementations • 6 Jun 2021 • Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson
We present SoftDICE, which achieves state-of-the-art performance for imitation learning.
no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.
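To make the low-rank idea concrete (a toy sketch, not the Tesseract algorithm itself): for two agents, the joint Q-values over action pairs form a matrix, and a rank-1 approximation expresses it as an outer product of per-agent factors. SVD stands in here for the CP decomposition used for higher-order tensors; the sizes and random values are illustrative.

```python
import numpy as np

# Hypothetical 2-agent setting: joint Q-values form a |U1| x |U2| tensor
# (a matrix for two agents), here filled with random values.
rng = np.random.default_rng(0)
q_joint = rng.normal(size=(5, 5))

# Rank-1 approximation via SVD: the best rank-1 factorization of a matrix
# is the leading singular triplet (a 2-agent stand-in for CP decomposition).
u, s, vt = np.linalg.svd(q_joint)
q_lowrank = s[0] * np.outer(u[:, 0], vt[0])
```

The per-agent factors (`u[:, 0]` and `vt[0]`) play the role of agent-local utilities, while the shared singular value couples them, modelling the task-relevant interaction at far lower cost than storing the full joint tensor.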
no code implementations • 6 Oct 2020 • Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+3)
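The two mixing schemes mentioned above can be sketched in a few lines (simplified stand-ins, not the papers' network implementations): VDN sums per-agent utilities, while QMIX mixes them through weights constrained to be non-negative, which is sufficient for monotonicity of the joint value in each agent's utility.

```python
import numpy as np

def vdn_mix(agent_qs):
    # VDN: the joint value is the plain sum of per-agent utilities.
    return float(np.sum(agent_qs))

def qmix_mix(agent_qs, w, b):
    # QMIX (minimal single-layer sketch): monotonic mixing enforced by
    # taking the absolute value of the state-conditioned weights, so the
    # joint value is non-decreasing in every per-agent utility.
    return float(np.abs(w) @ agent_qs + b)
```

In QMIX proper, `w` and `b` are produced by hypernetworks conditioned on the global state; the absolute-value trick above is the same monotonicity device in miniature.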
2 code implementations • ICLR 2021 • Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang
Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.
4 code implementations • NeurIPS 2019 • Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson
We specifically focus on QMIX [40], the current state-of-the-art in this domain.
1 code implementation • NeurIPS 2019 • Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps.
no code implementations • 9 Jun 2017 • Anuj Mahajan, Theja Tulabandhula
In this paper we explore methods that exploit symmetries to ensure sample efficiency in reinforcement learning (RL). This problem deserves increasing attention given recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data.
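One simple way to exploit a known symmetry for sample efficiency (a generic sketch, not necessarily the method studied in the paper) is to augment the replay data with mirror-image transitions. The helper below assumes an environment invariant under negating states and actions, e.g. a left/right-symmetric control task; the function name is illustrative.

```python
import numpy as np

def augment_with_reflection(transitions):
    """Double a batch of (s, a, r, s') transitions by adding mirror images,
    assuming the dynamics and reward are invariant under negating both
    states and actions (a reflection symmetry)."""
    augmented = list(transitions)
    for s, a, r, s_next in transitions:
        # Reward is unchanged under the symmetry; states/actions are negated.
        augmented.append((-s, -a, r, -s_next))
    return augmented
```

Each real environment interaction then yields two valid training samples, halving the data needed to cover the symmetric parts of the state-action space.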
no code implementations • NeurIPS 2015 • Happy Mittal, Anuj Mahajan, Vibhav G. Gogate, Parag Singla
Lifted inference rules exploit symmetries for fast reasoning in statistical relational models.