no code implementations • 20 Nov 2023 • Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar
In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m greater than 1 agents, where all agents are aware of the system model.
1 code implementation • 13 Mar 2023 • Lakshmi Mandal, Shalabh Bhatnagar
We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm.