NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári
We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.