no code implementations • 28 Jul 2022 • Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.
no code implementations • NeurIPS 2021 • Liad Erez, Tomer Koren
We study the framework of online learning with feedback graphs introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.
no code implementations • 20 Jul 2021 • Liad Erez, Tomer Koren
We study the framework of online learning with feedback graphs introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.