The reward hypothesis posits that "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."
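In the standard formalization of this hypothesis, with discount factor $\gamma \in [0, 1]$ and per-step reward $R_{t+1}$, the quantity to be maximized is the expected return (the discounted form is one standard choice; the undiscounted cumulative sum corresponds to $\gamma = 1$):

\[ \mathbb{E}[G_t] \;=\; \mathbb{E}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \right]. \]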
However, prevailing optimization techniques are not designed for strictly incremental online updates.
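As a minimal, self-contained sketch of what "strictly incremental" means (an illustration added here, not from the source): each sample updates the estimate exactly once and is then discarded, in contrast to batch methods that store and revisit data.

def incremental_mean(stream, alpha=None):
    # Strictly incremental estimate of a running mean: no buffer, no
    # second pass over the data. A fixed step size alpha (instead of
    # the 1/n sample average) lets the estimate track non-stationarity.
    mean, n = 0.0, 0
    for x in stream:
        n += 1
        step = alpha if alpha is not None else 1.0 / n
        mean += step * (x - mean)
    return mean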
The ability to infer map variables and estimate pose is crucial to the operation of autonomous mobile robots.
We consider an autonomous exploration problem in which a range-sensing mobile robot is tasked with efficiently and accurately mapping the landmarks of an a priori unknown environment in real time; it must choose sensing actions that both curb localization uncertainty and yield high information gain about the map.
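A self-contained toy sketch of this kind of action-selection trade-off (the representation and scoring rule below are assumptions for illustration, not the paper's algorithm): the map is a vector of Bernoulli occupancy probabilities, a sensing action reveals a subset of cells with a perfect sensor and adds a known amount of pose uncertainty, and the robot picks the action maximizing information gain minus a weighted localization penalty.

import numpy as np

def bernoulli_entropy(p):
    # Entropy in bits of each Bernoulli occupancy cell.
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def choose_action(occ_probs, actions, lam=1.0):
    # actions: list of (visible_cell_indices, added_pose_variance) pairs.
    def score(action):
        visible, pose_var = action
        # A perfect sensor removes all entropy in the observed cells.
        info_gain = bernoulli_entropy(occ_probs[visible]).sum()
        return info_gain - lam * pose_var
    return max(range(len(actions)), key=lambda i: score(actions[i]))

occ = np.array([0.5, 0.9, 0.5, 0.2, 0.5])
acts = [(np.array([0, 2]), 0.1), (np.array([1, 3]), 0.05)]
print(choose_action(occ, acts))  # index of the best sensing action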
Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm.
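One common way to capture aleatoric uncertainty in RL (an assumption for illustration; the sentence above does not specify the paper's approach) is to predict a distribution over returns rather than a point estimate, here a Gaussian with a learned, state-dependent variance trained by negative log-likelihood, so that irreducible return noise shows up as large predicted variance instead of being averaged away.

import torch
import torch.nn as nn

class GaussianValueHead(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)  # heteroscedastic: variance depends on the state

    def forward(self, s):
        h = self.body(s)
        return self.mean(h), self.log_var(h)

def nll_loss(mean, log_var, target_return):
    # Gaussian negative log-likelihood up to an additive constant.
    return (0.5 * (log_var + (target_return - mean) ** 2 / log_var.exp())).mean()

# Usage sketch with toy data in place of real Monte Carlo returns.
net = GaussianValueHead(state_dim=4)
s = torch.randn(32, 4)
g = torch.randn(32, 1)
mean, log_var = net(s)
nll_loss(mean, log_var, g).backward()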