Search Results for author: Tomasz Kisielewski

Accidental exploration through value predictors

Infinite length of trajectories is an almost universal assumption in the theoretical foundations of reinforcement learning.

We show that if the reward corruption in a corrupt reward MDP (CRMDP) is sufficiently "spiky", the environment is solvable.
