no code implementations • Tomasz Kisielewski, Damian Leśniak, Maia Pasek
Infinite length of trajectories is an almost universal assumption in the theoretical foundations of reinforcement learning.
1 code implementation • 30 Jun 2019 • Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh
We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable.