Search Results for author: Josiah P. Hanna

Found 19 papers, 10 papers with code

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

no code implementations 14 Nov 2023 Nicholas E. Corrado, Josiah P. Hanna

We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks, and demonstrate that PROPS (1) decreases sampling error throughout training and (2) improves the data efficiency of on-policy policy gradient algorithms.
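Here, sampling error refers to the gap between the empirical distribution of the collected data and the target policy's own distribution. A minimal sketch of one way to quantify it for discrete actions (the function name and metric are illustrative assumptions, not taken from the paper's code):

```python
import numpy as np

def empirical_sampling_error(actions, pi_probs):
    """Sampling-error proxy: total-variation distance between the empirical
    action frequencies in a batch and the target policy's action
    probabilities (single-state illustration, not the paper's exact metric)."""
    counts = np.bincount(actions, minlength=len(pi_probs))
    empirical = counts / counts.sum()
    return 0.5 * np.abs(empirical - pi_probs).sum()

# A policy that picks action 0 with probability 0.7, but the sampled batch
# contains action 0 only 60% of the time -> nonzero sampling error.
print(empirical_sampling_error(np.array([0] * 6 + [1] * 4), np.array([0.7, 0.3])))
```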

reinforcement-learning Reinforcement Learning (RL)

State-Action Similarity-Based Representations for Off-Policy Evaluation

1 code implementation NeurIPS 2023 Brahma S. Pavse, Josiah P. Hanna

Instead, in this paper, we seek to enhance the data-efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE.
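A minimal sketch of the two-stage pipeline described above: encode the fixed dataset with a learned encoder, then run fitted Q-evaluation (FQE) on the encoded features. The random-projection "encoder", the toy dataset, and the linear FQE are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned encoder: a fixed random projection from raw
# state dim 4 down to 2 (the paper learns this from state-action similarity).
W_enc = rng.normal(size=(4, 2))
encode = lambda s: s @ W_enc

# Toy fixed dataset of (state, action, reward, next_state) transitions.
states      = rng.normal(size=(100, 4))
actions     = rng.integers(0, 2, size=100)
rewards     = rng.normal(size=100)
next_states = rng.normal(size=(100, 4))

def fqe(encode, n_iters=20, gamma=0.99):
    """FQE sketch: iterate least-squares regression of Q(s, a) onto
    r + gamma * Q(s', pi_e(s')) over the *encoded* dataset."""
    phi = np.hstack([encode(states), np.eye(2)[actions]])  # encoded features + one-hot action
    w = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        # Evaluation policy pi_e: always picks action 1 (illustrative).
        phi_next = np.hstack([encode(next_states), np.tile([0.0, 1.0], (100, 1))])
        targets = rewards + gamma * phi_next @ w
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return w

print(fqe(encode))
```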

Off-policy evaluation Representation Learning

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

no code implementations 27 Oct 2023 Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna

In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data.

Autonomous Driving D4RL +5

Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates

1 code implementation 26 Oct 2023 Nicholas E. Corrado, Josiah P. Hanna

Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency.
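One concrete instance of dynamics-invariant augmentation (an illustrative example, not necessarily the augmentation studied in the paper): in a task that is symmetric about the origin, each logged transition can be mirrored into a second transition that is still consistent with the true dynamics.

```python
import numpy as np

def mirror_augment(transitions):
    """Dynamics-invariant augmentation sketch for a 1-D, left/right symmetric
    task: negating state and action yields another transition consistent with
    the true dynamics (rewards are assumed symmetric here as well)."""
    augmented = []
    for s, a, r, s_next in transitions:
        augmented.append((s, a, r, s_next))       # original sample
        augmented.append((-s, -a, r, -s_next))    # mirrored sample, no extra env steps
    return augmented

# Toy transition: state = (position, velocity), action = scalar force.
real = [(np.array([0.3, -0.1]), 1.0, 0.0, np.array([0.29, 0.05]))]
print(mirror_augment(real))   # twice as much data from domain knowledge alone
```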

Data Augmentation reinforcement-learning +1

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

no code implementations 2 Jun 2023 Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna

This latter objective is called stability and is especially important when the state space is unbounded, such that the states can be arbitrarily far from each other and the agent can drift far away from the desired states.

Attribute reinforcement-learning +1

Safe Evaluation For Offline Learning: Are We Ready To Deploy?

no code implementations 16 Dec 2022 Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor

In our setting, we assume a source of data that we split into a train set, used to learn an offline policy, and a test set, used to estimate a lower bound on the offline policy's performance via off-policy evaluation with bootstrapping.
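A minimal sketch of the test-set step, assuming per-episode off-policy return estimates (e.g. importance-weighted returns) have already been computed; the percentile bootstrap below is one standard choice of lower bound, not necessarily the exact procedure in the paper:

```python
import numpy as np

def bootstrap_lower_bound(ope_estimates, confidence=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap lower confidence bound on policy value, computed
    from per-episode off-policy return estimates on the held-out test set."""
    rng = np.random.default_rng(seed)
    n = len(ope_estimates)
    means = [rng.choice(ope_estimates, size=n, replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, 1.0 - confidence)

# Usage: deploy the offline policy only if its lower bound beats a baseline value.
test_set_estimates = np.random.default_rng(1).normal(loc=1.0, scale=0.5, size=50)
lb = bootstrap_lower_bound(test_set_estimates)
print(f"deploy: {lb > 0.8}")
```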

Off-policy evaluation

Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction

no code implementations 14 Dec 2022 Brahma S. Pavse, Josiah P. Hanna

We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$.
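For context, the ordinary (non-marginalized) importance-sampling estimator that this line of work builds on is sketched below; the paper's contribution, scaling a marginalized variant to high-dimensional state spaces via state abstraction, is not shown here.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling OPE: reweight each trajectory's return by
    the product of pi_e(a|s) / pi_b(a|s) over its steps."""
    values = []
    for traj in trajectories:                      # traj: list of (s, a, r)
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += discount * r
            discount *= gamma
        values.append(weight * ret)
    return np.mean(values)

# Toy two-action example with state-independent policies.
pi_e = lambda a, s: 0.8 if a == 1 else 0.2
pi_b = lambda a, s: 0.5
trajs = [[(0, 1, 1.0), (0, 1, 0.0)], [(0, 0, 0.0), (0, 1, 1.0)]]
print(is_estimate(trajs, pi_e, pi_b))
```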

Off-policy evaluation

A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

1 code implementation 20 Sep 2022 Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna

The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline's performance.

reinforcement-learning Reinforcement Learning (RL)

Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

1 code implementation 12 Jul 2022 Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training.

Disentanglement reinforcement-learning +1

Multi-agent Databases via Independent Learning

no code implementations 28 May 2022 Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

Thus, the paper addresses the question: "Is it possible to design a database consisting of various learned components that work cooperatively to improve end-to-end query latency?"

Multi-agent Reinforcement Learning Scheduling

ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling

no code implementations 9 Mar 2022 Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak

This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs).

Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

1 code implementation 29 Nov 2021 Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna

Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy.

Offline RL reinforcement-learning +1

Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration

1 code implementation ICML Workshop URL 2021 Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht

Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters.
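The coupled setup this paper reacts to typically trains a single policy on the task reward plus a scaled intrinsic bonus; a minimal sketch of that coupling is below (the paper's decoupled architecture is not shown, and the count-based bonus is purely illustrative):

```python
def shaped_reward(extrinsic_r, intrinsic_r, beta=0.1):
    """Typical coupled setup: one policy trained on the sum of task reward and
    a scaled intrinsic exploration bonus. Because the intrinsic term changes as
    the agent explores, the effective reward is non-stationary, and results can
    hinge on the choice of beta."""
    return extrinsic_r + beta * intrinsic_r

# Example: sparse task reward plus a count-based novelty bonus.
visit_count = 4
print(shaped_reward(extrinsic_r=0.0, intrinsic_r=1.0 / (visit_count ** 0.5)))
```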

reinforcement-learning Reinforcement Learning (RL)

Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction

1 code implementation 18 Jul 2020 Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht

Authentication and key agreement are decided based on the agents' observed behaviors during the interaction.

Learning an Interpretable Traffic Signal Control Policy

1 code implementation 23 Dec 2019 James Ault, Josiah P. Hanna, Guni Sharon

Given such a safety-critical domain, the affiliated actuation policy is required to be interpretable in a way that can be understood and regulated by a human.

Q-Learning

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

1 code implementation 4 Jun 2018 Josiah P. Hanna, Scott Niekum, Peter Stone

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
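A minimal sketch of the idea: estimate the behavior policy from the logged data by maximum likelihood and plug the estimated probabilities into the importance weights. The count-based fit below is a stand-in for the estimator actually used in the paper:

```python
import numpy as np
from collections import Counter, defaultdict

def fit_behavior_policy(data):
    """Maximum-likelihood estimate of pi_b(a|s) from logged (s, a) pairs via
    empirical counts (a stand-in for the paper's behavior-policy estimator)."""
    counts = defaultdict(Counter)
    for s, a, _ in data:
        counts[s][a] += 1
    return lambda a, s: counts[s][a] / sum(counts[s].values())

def is_with_estimated_behavior(data, pi_e):
    """Per-step importance sampling using the *estimated* behavior policy."""
    pi_b_hat = fit_behavior_policy(data)
    weights = np.array([pi_e(a, s) / pi_b_hat(a, s) for s, a, _ in data])
    rewards = np.array([r for _, _, r in data])
    return np.mean(weights * rewards)

# Toy bandit-style log of (state, action, reward) tuples.
log = [(0, 1, 1.0), (0, 0, 0.0), (0, 1, 1.0), (0, 1, 0.0)]
pi_e = lambda a, s: 0.9 if a == 1 else 0.1
print(is_with_estimated_behavior(log, pi_e))
```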

Off-policy evaluation

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation ICML 2017 Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.
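That standard technique is on-policy Monte Carlo evaluation: run the policy itself and average its returns. A sketch assuming a Gymnasium-style environment API (the paper's contribution, searching for a lower-variance behavior policy to collect this data, is not shown):

```python
def monte_carlo_value(env, policy, n_episodes=100, gamma=1.0):
    """On-policy Monte Carlo evaluation: deploy the policy and average the
    returns it obtains (unbiased, but can require many episodes)."""
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, ret, discount = False, 0.0, 1.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            ret += discount * reward
            discount *= gamma
            done = terminated or truncated
        returns.append(ret)
    return sum(returns) / len(returns)

# Usage (illustrative): monte_carlo_value(gymnasium.make("CartPole-v1"), lambda obs: 0)
```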

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

no code implementations 20 Jun 2016 Josiah P. Hanna, Peter Stone, Scott Niekum

In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
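A rough sketch of the recipe, with the model-fitting and in-model evaluation steps passed in as callables; the trivial stand-ins in the usage example are illustrative only, whereas the paper fits MDP transition models and evaluates the policy within them:

```python
import numpy as np

def model_bootstrap_lower_bound(dataset, fit_model, evaluate_in_model,
                                confidence=0.95, n_boot=200, seed=0):
    """Bootstrap OPE sketch: refit a transition/reward model on each resampled
    dataset, evaluate the target policy inside that model, and return a low
    percentile of the value estimates as the lower confidence bound."""
    rng = np.random.default_rng(seed)
    n = len(dataset)
    values = []
    for _ in range(n_boot):
        resample = [dataset[i] for i in rng.integers(0, n, size=n)]
        values.append(evaluate_in_model(fit_model(resample)))
    return np.quantile(values, 1.0 - confidence)

# Illustrative stand-ins: the "model" is just the mean reward of the resample,
# "evaluated" as-is; real usage would fit an MDP model and roll out pi_e in it.
dataset = list(np.random.default_rng(1).normal(1.0, 0.5, size=40))
print(model_bootstrap_lower_bound(dataset, lambda d: float(np.mean(d)), lambda m: m))
```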

Off-policy evaluation
