2 code implementations • 6 Dec 2018 • Elad Hazan, Sham M. Kakade, Karan Singh, Abby Van Soest
Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal, what might we hope that an agent can efficiently learn to do?