1 code implementation • 30 Oct 2023 • Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems.
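For readers unfamiliar with the term, the following is a minimal sketch of the standard tabular $Q$-learning update, not the method proposed in the paper above. The function name `q_learning_update`, the step size `alpha`, the discount `gamma`, and the two-state toy table are all illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning step in place and return the new Q(s, a).

    `q` is a list of lists indexed as q[state][action] (hypothetical layout).
    """
    # Off-policy target: bootstrap from the greedy (max) action in the next
    # state, regardless of which action the behavior policy will actually take.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Toy usage: 2 states, 2 actions, all values initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
new_value = q_learning_update(q, state=0, action=1, reward=1.0, next_state=1)
```

The `max` over next-state values is what makes the update off-policy: the data can come from any behavior policy while the table still estimates the greedy policy's values.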
1 code implementation • 8 Mar 2023 • Zhaoyi Zhou, Zaiwei Chen, Yiheng Lin, Adam Wierman
The algorithm is scalable since each agent uses only local information and does not need access to the global state.