1 code implementation • 17 Oct 2024 • Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine
Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems.
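As a rough illustration of the offline MBO setting described here (a generic sketch, not the method this paper proposes; the surrogate model, function names, and hyperparameters are all hypothetical): one fits a score model on the fixed dataset and ascends its gradient with respect to the design, which is exactly where naive approaches overestimate on out-of-distribution designs.

```python
import torch

def optimize_design(surrogate: torch.nn.Module, x0: torch.Tensor,
                    steps: int = 100, lr: float = 1e-2) -> torch.Tensor:
    """Naive offline MBO sketch: gradient-ascend a learned surrogate
    f_theta(x) over the design x. Without any conservatism, the
    optimizer drifts into regions where the surrogate is unreliable
    and its predicted scores are overestimated."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        loss = -surrogate(x).sum()  # maximize the predicted score
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```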
no code implementations • 8 Jan 2024 • Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine
This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set.
1 code implementation • 20 Apr 2023 • Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine
In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor.
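For reference, the critic objective being generalized is IQL's expectile regression on the residual Q − V; a minimal sketch of that standard loss (my paraphrase of the published IQL loss, not the paper's generalized family):

```python
import torch

def expectile_loss(q: torch.Tensor, v: torch.Tensor,
                   tau: float = 0.7) -> torch.Tensor:
    """IQL-style value loss: asymmetric least squares on u = Q - V.
    With tau > 0.5, V is pushed toward an upper expectile of Q,
    approximating an in-distribution max without ever querying
    out-of-sample actions."""
    u = q - v
    weight = torch.abs(tau - (u < 0).float())  # |tau - 1(u < 0)|
    return (weight * u.pow(2)).mean()
```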
1 code implementation • 19 Apr 2023 • Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang
The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in AI research.
1 code implementation • 11 Oct 2022 • Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster
We refer to the immediate result as Learnt Policy Optimisation (LPO).
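For context, LPO is obtained as a learned stand-in for PPO's hand-designed surrogate; the sketch below is the standard PPO clipped objective it replaces (standard PPO, not the discovered LPO function itself):

```python
import torch

def ppo_clip_objective(logp_new: torch.Tensor, logp_old: torch.Tensor,
                       adv: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate: a hand-designed function of the
    probability ratio and the advantage. LPO swaps this function for a
    meta-learned one of the same two inputs."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return torch.min(ratio * adv, clipped * adv).mean()
```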
no code implementations • 2 Aug 2022 • Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang
The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community.
1 code implementation • 30 May 2022 • Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into sequence modeling (SM) problems, in which the task is to map the agents' observation sequence to the agents' optimal action sequence.
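A minimal sketch of this sequence-modelling view, with illustrative shapes and module names rather than MAT's exact architecture: encode all agents' observations jointly, then decode actions one agent at a time, each conditioned on the actions already chosen.

```python
import torch

class AgentSequenceModel(torch.nn.Module):
    """Toy encoder-decoder over the *agent* axis (the SM view of MARL)."""
    def __init__(self, obs_dim: int, act_dim: int, d_model: int = 64):
        super().__init__()
        self.obs_embed = torch.nn.Linear(obs_dim, d_model)
        self.act_embed = torch.nn.Linear(act_dim, d_model)
        layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.head = torch.nn.Linear(2 * d_model, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> actions: (batch, n_agents, act_dim)
        enc = self.encoder(self.obs_embed(obs))
        prev = torch.zeros(obs.shape[0], self.head.out_features)
        actions = []
        for i in range(obs.shape[1]):  # decode agent by agent
            logits = self.head(torch.cat([enc[:, i], self.act_embed(prev)], dim=-1))
            prev = torch.nn.functional.one_hot(
                logits.argmax(-1), num_classes=logits.shape[-1]).float()
            actions.append(prev)
        return torch.stack(actions, dim=1)
```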
no code implementations • 10 Feb 2022 • Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang
Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games.
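The simplest instance of such a decomposition is the additive, VDN-style one (an established baseline rather than this paper's contribution; notation simplified to a shared state s). Its appeal is that the joint greedy action is just the tuple of per-agent greedy actions, so decentralized execution stays consistent with the centralized value:

```latex
Q_{\mathrm{tot}}(s, \mathbf{a}) \;=\; \sum_{i=1}^{n} Q_{i}\big(s, a^{i}\big),
\qquad
\arg\max_{\mathbf{a}} Q_{\mathrm{tot}}(s, \mathbf{a})
\;=\; \Big(\arg\max_{a^{1}} Q_{1}\big(s, a^{1}\big), \,\dots,\, \arg\max_{a^{n}} Q_{n}\big(s, a^{n}\big)\Big)
```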
1 code implementation • 7 Jan 2022 • Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster
In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides guarantees for a large class of algorithms, including TRPO and PPO.
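Schematically (my paraphrase; the paper spells out the exact conditions on the drift functional $\mathfrak{D}$, neighbourhood operator $\mathcal{N}$, and sampling distribution $\beta$), a mirror-learning update maximizes the expected advantage minus a drift penalty:

```latex
\pi_{k+1} \;=\; \arg\max_{\pi \,\in\, \mathcal{N}(\pi_k)}
\; \mathbb{E}_{s,\, a \sim \pi}\big[ A_{\pi_k}(s, a) \big]
\;-\; \mathbb{E}_{s \sim \beta_{\pi_k}}\big[ \mathfrak{D}_{\pi_k}(\pi \mid s) \big]
```

TRPO's KL constraint and PPO's clipping arise as particular choices of drift and neighbourhood, which is how both inherit the framework's guarantees.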
4 code implementations • 6 Oct 2021 • Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang
To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
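Schematically (notation mine), the constrained Markov game asks the joint policy to maximize the common return subject to per-agent expected-cost budgets:

```latex
\max_{\pi}\; J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big]
\quad \text{s.t.} \quad
J_{c}^{i}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} c_{t}^{i}\Big] \;\le\; d^{i},
\qquad i = 1, \dots, n
```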
11 code implementations • ICLR 2022 • Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang
In this paper, we extend the theory of trust region learning to MARL.
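The central tool, as I read it, is the multi-agent advantage decomposition: for any ordering of the agents, the joint advantage splits into per-agent advantages, each conditioned on the preceding agents' actions, which is what licenses sequential per-agent trust-region updates. Schematically:

```latex
A_{\pi}^{i_{1:n}}\big(s, \mathbf{a}^{i_{1:n}}\big)
\;=\; \sum_{m=1}^{n} A_{\pi}^{i_{m}}\big(s, \mathbf{a}^{i_{1:m-1}}, a^{i_{m}}\big)
```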
1 code implementation • NeurIPS 2021 • Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
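A standard variance-reduction device in this setting is a per-agent baseline that may depend on the state and the other agents' actions but not on agent $i$'s own action, which leaves the gradient unbiased while shrinking its variance; schematically (notation mine):

```latex
g^{i} \;=\; \mathbb{E}\Big[ \nabla_{\theta^{i}} \log \pi_{\theta^{i}}^{i}\big(a^{i} \mid s\big)
\,\big( Q_{\pi}(s, \mathbf{a}) - b\big(s, \mathbf{a}^{-i}\big) \big) \Big]
```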