no code implementations • 11 Feb 2024 • Jeongyeol Kwon, Liu Yang, Robert Nowak, Josiah Hanna
Our main contributions are two-fold: (a) we demonstrate that the performance of reinforcement learning is strongly correlated with the prediction accuracy of future observations in partially observable environments, and (b) our approach can significantly improve the overall end-to-end approach by preventing the high-variance, noisy signals of reinforcement learning objectives from influencing representation learning.
no code implementations • 29 Jan 2023 • Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.
no code implementations • ICML 2020 • Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted by the number of times that action occurred in the batch, not by the true probability of the action under the given policy.
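The weighting problem described above can be reproduced with a minimal tabular sketch. It rests on several assumptions. The one-state, one-step episode and the function names `estimate_behavior` and `batch_td0` are hypothetical illustrations. The batch update follows the textbook "sum increments over the whole batch, then apply" scheme. The correction multiplies each update by π(a|s)/π̂(a|s), where π̂ is the count-based estimate of action frequencies in the batch. This is an illustrative importance-ratio fix and not necessarily the paper's exact algorithm.

```python
from collections import Counter, defaultdict


def estimate_behavior(batch):
    """Count-based (maximum-likelihood) estimate pi_hat(a|s) of how often each
    action appears in each state within the batch."""
    sa_counts = Counter((s, a) for s, a, _, _, _ in batch)
    s_counts = Counter(s for s, _, _, _, _ in batch)
    return {(s, a): n / s_counts[s] for (s, a), n in sa_counts.items()}


def batch_td0(batch, policy, gamma=1.0, alpha=0.05, sweeps=2000, correct=False):
    """Batch TD(0): repeatedly sweep the fixed batch, summing increments and
    applying them once per sweep. With correct=True, each increment is
    reweighted by policy(a|s) / pi_hat(a|s) (an illustrative assumption)."""
    pi_hat = estimate_behavior(batch) if correct else None
    V = defaultdict(float)
    for _ in range(sweeps):
        increments = defaultdict(float)
        for s, a, r, s_next, done in batch:
            target = r + (0.0 if done else gamma * V[s_next])
            weight = policy[(s, a)] / pi_hat[(s, a)] if correct else 1.0
            increments[s] += weight * (target - V[s])
        for s, inc in increments.items():
            V[s] += alpha * inc
    return dict(V)


# Hypothetical example: the policy picks each action with probability 0.5,
# action 0 yields reward 1 and action 1 yields reward 0, so the true value is 0.5.
policy = {("s0", 0): 0.5, ("s0", 1): 0.5}
# By chance, the batch contains action 0 three times and action 1 only once.
batch = [("s0", 0, 1.0, None, True)] * 3 + [("s0", 1, 0.0, None, True)]

print(round(batch_td0(batch, policy)["s0"], 4))                # 0.75 (batch frequencies)
print(round(batch_td0(batch, policy, correct=True)["s0"], 4))  # 0.5 (true value)
```

Plain batch TD(0) settles on 0.75 because action 0 supplies three of the four targets. The reweighted version recovers 0.5 because each action's total weight is rescaled to match its probability under the policy. The step size `alpha` has to shrink as the batch grows, since increments are summed rather than averaged.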
no code implementations • NeurIPS 2020 • Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.
no code implementations • 26 Sep 2013 • Patrice Perny, Paul Weng, Judy Goldsmith, Josiah Hanna
This paper is devoted to fair optimization in Multiobjective Markov Decision Processes (MOMDPs).