no code implementations • 14 Nov 2023 • Nicholas E. Corrado, Josiah P. Hanna
We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks, and demonstrate that (1) PROPS decreases sampling error throughout training and (2) PROPS improves the data efficiency of on-policy policy gradient algorithms.
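As a rough illustration (not the authors' released code), sampling error in a discrete-action task can be quantified as the divergence between the empirical action frequencies in the collected data and the target policy's action probabilities; the sketch below assumes tabular states and a target policy given as a probability table:

    import numpy as np

    def sampling_error(transitions, target_probs):
        """Mean KL divergence between empirical action frequencies and the
        target policy's action distribution, over visited states.

        transitions:  list of (state, action) index pairs
        target_probs: array of shape (n_states, n_actions) with pi(a|s)
        """
        counts = np.zeros_like(target_probs)
        for s, a in transitions:
            counts[s, a] += 1
        visited = counts.sum(axis=1) > 0
        empirical = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
        target = target_probs[visited]
        eps = 1e-12  # avoid log(0) for actions with zero mass
        kl = (empirical * (np.log(empirical + eps) - np.log(target + eps))).sum(axis=1)
        return kl.mean()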
1 code implementation • NeurIPS 2023 • Brahma S. Pavse, Josiah P. Hanna
Instead, in this paper, we seek to enhance the data efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE.
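A schematic sketch of this two-stage pipeline, assuming a pre-trained encoder phi, a discrete action space, and a linear function class for FQE (all illustrative choices, not the paper's implementation):

    import numpy as np

    def fqe_on_encoded_data(dataset, phi, pi_e, n_actions, gamma=0.99, n_iters=50):
        """Fitted Q Evaluation on an encoder-transformed dataset (sketch).

        dataset: arrays (states, actions, rewards, next_states)
        phi:     learned encoder mapping raw states to feature vectors
        pi_e:    evaluation policy, pi_e(s) -> action probabilities
        """
        s, a, r, s2 = dataset
        x = np.array([phi(si) for si in s])    # encode once, reuse every iteration
        x2 = np.array([phi(si) for si in s2])
        w = np.zeros((n_actions, x.shape[1]))  # linear Q(s, a) = w[a] @ phi(s)
        for _ in range(n_iters):
            # Bellman target under the evaluation policy
            q_next = x2 @ w.T                              # (n, n_actions)
            probs = np.array([pi_e(si) for si in s2])
            target = r + gamma * (probs * q_next).sum(axis=1)
            # regress each action's Q-values onto the targets
            for act in range(n_actions):
                mask = a == act
                if mask.any():
                    w[act], *_ = np.linalg.lstsq(x[mask], target[mask], rcond=None)
        return w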
no code implementations • 27 Oct 2023 • Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna
In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data.
1 code implementation • 26 Oct 2023 • Nicholas E. Corrado, Josiah P. Hanna
Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency.
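As a hypothetical example of such domain-driven augmentation, a task with a left/right mirror symmetry lets every stored transition be reflected into a second valid transition at no extra cost (the symmetry here is assumed, not taken from the paper):

    import numpy as np

    def augment_with_reflection(batch):
        """Double a batch of transitions using an assumed mirror symmetry.

        Assumes a task where negating state and action leaves dynamics and
        reward unchanged (e.g., a left/right-symmetric control problem).
        batch: dict with arrays 's', 'a', 'r', 's2'
        """
        mirrored = {
            "s": -batch["s"],
            "a": -batch["a"],
            "r": batch["r"],    # reward is invariant under the symmetry
            "s2": -batch["s2"],
        }
        return {k: np.concatenate([batch[k], mirrored[k]]) for k in batch}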
no code implementations • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
This latter objective is called stability and is especially important when the state space is unbounded, so that states can be arbitrarily far apart and the agent can drift far from the desired states.
no code implementations • 16 Dec 2022 • Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor
In our setting, we assume access to a source of data, which we split into a train set, used to learn an offline policy, and a test set, used to estimate a lower bound on the offline policy's performance via off-policy evaluation with bootstrapping.
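A minimal sketch of the test-set step, assuming per-episode off-policy value estimates have already been computed (the bootstrap procedure here is generic, not the paper's exact implementation):

    import numpy as np

    def bootstrap_lower_bound(per_episode_estimates, alpha=0.05, n_boot=2000, seed=0):
        """Bootstrap a (1 - alpha) lower bound on a policy's value.

        per_episode_estimates: one off-policy value estimate per test-set
        episode (e.g., per-episode importance-sampled returns).
        """
        rng = np.random.default_rng(seed)
        est = np.asarray(per_episode_estimates)
        n = len(est)
        means = np.array([rng.choice(est, size=n, replace=True).mean()
                          for _ in range(n_boot)])
        return np.percentile(means, 100 * alpha)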
no code implementations • 14 Dec 2022 • Brahma S. Pavse, Josiah P. Hanna
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$.
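For reference, the simplest baseline for this problem is ordinary per-episode importance sampling; a minimal sketch, assuming known action probabilities under both the evaluation and behavior policies:

    import numpy as np

    def importance_sampling_estimate(episodes, pi_e, pi_b, gamma=1.0):
        """Ordinary per-episode importance sampling estimate of pi_e's value.

        episodes:   list of trajectories [(s, a, r), ...] collected under pi_b
        pi_e, pi_b: functions (s, a) -> action probability
        """
        returns = []
        for traj in episodes:
            weight, ret, discount = 1.0, 0.0, 1.0
            for s, a, r in traj:
                weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
                ret += discount * r
                discount *= gamma
            returns.append(weight * ret)
        return float(np.mean(returns))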
1 code implementation • 20 Sep 2022 • Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna
The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline's performance.
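A schematic control loop in the spirit of objective (a), assuming a gym-style environment and a hypothetical trust_fn that decides when the learner can be trusted (illustrative only, not the paper's algorithm):

    def jirl_style_rollout(env, learner, baseline, trust_fn, max_steps=1000):
        """Let the learner act, but hand control back to the baseline when a
        (hypothetical) trust score is low, bounding regret w.r.t. the
        baseline during training."""
        obs = env.reset()
        for _ in range(max_steps):
            if trust_fn(obs):            # e.g., learner's estimates look reliable here
                action = learner(obs)
            else:
                action = baseline(obs)   # defer to the baseline in unfamiliar states
            obs, reward, done, info = env.step(action)
            if done:
                break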
1 code implementation • 12 Jul 2022 • Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training.
no code implementations • 28 May 2022 • Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella
Thus, the paper addresses the question: "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?"
no code implementations • 9 Mar 2022 • Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs).
1 code implementation • 29 Nov 2021 • Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy.
1 code implementation • ICML Workshop URL 2021 • Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters.
1 code implementation • 18 Jul 2020 • Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Authentication and key agreement are decided based on the agents' observed behaviors during the interaction.
1 code implementation • 23 Dec 2019 • James Ault, Josiah P. Hanna, Guni Sharon
Given such a safety-critical domain, the associated actuation policy is required to be interpretable in a way that can be understood and regulated by a human.
no code implementations • 18 Jun 2019 • Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone
Augmenting reinforcement learning with imitation learning is often hailed as a method for improving upon learning from scratch.
1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone
We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
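A minimal sketch of the discrete, count-based case, where the behavior policy is estimated from the same dataset that is then reweighted (an illustrative simplification, not the paper's exact estimator):

    import numpy as np

    def estimated_behavior_is(episodes, pi_e, n_states, n_actions):
        """Importance sampling with a behavior policy estimated from the
        same data (count-based, discrete case; schematic sketch)."""
        counts = np.zeros((n_states, n_actions))
        for traj in episodes:
            for s, a, _ in traj:
                counts[s, a] += 1
        # maximum-likelihood estimate of the behavior policy
        pi_b_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
        returns = []
        for traj in episodes:
            weight, ret = 1.0, 0.0
            for s, a, r in traj:
                weight *= pi_e(s, a) / pi_b_hat[s, a]
                ret += r
            returns.append(weight * ret)
        return float(np.mean(returns))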
1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum
The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.
no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum
In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
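A generic sketch of the model-based variant, where fit_model and evaluate_in_model are assumed helper functions (hypothetical, not the paper's released code):

    import numpy as np

    def model_based_lower_bound(dataset, fit_model, evaluate_in_model, pi,
                                alpha=0.05, n_boot=100, seed=0):
        """Bootstrap a lower confidence bound on pi's performance: resample
        the dataset, fit a transition model to each resample, evaluate pi in
        each learned model, and take the alpha percentile of the estimates."""
        rng = np.random.default_rng(seed)
        n = len(dataset)
        values = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)           # bootstrap resample
            model = fit_model([dataset[i] for i in idx])
            values.append(evaluate_in_model(model, pi))
        return np.percentile(values, 100 * alpha)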