no code implementations • 14 Nov 2023 • Nicholas E. Corrado, Josiah P. Hanna
We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks, and demonstrate that (1) PROPS decreases sampling error throughout training and (2) PROPS improves the data efficiency of on-policy policy gradient algorithms.
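As a rough illustration (not the authors' released code), sampling error in a discrete-action task can be quantified as the divergence between the empirical action frequencies in the collected data and the target policy's action probabilities; the sketch below assumes tabular states and a target policy given as a probability table:

    import numpy as np

    def sampling_error(transitions, target_probs):
        """Mean KL divergence between empirical action frequencies and the
        target policy's action distribution, over visited states.

        transitions:  list of (state, action) index pairs
        target_probs: array of shape (n_states, n_actions) with pi(a|s)
        """
        counts = np.zeros_like(target_probs)
        for s, a in transitions:
            counts[s, a] += 1
        visited = counts.sum(axis=1) > 0
        empirical = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
        target = target_probs[visited]
        eps = 1e-12  # avoid log(0) for actions with zero mass
        kl = (empirical * (np.log(empirical + eps) - np.log(target + eps))).sum(axis=1)
        return kl.mean()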
1 code implementation • NeurIPS 2023 • Brahma S. Pavse, Josiah P. Hanna
Instead, in this paper, we seek to enhance the data efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE.
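A schematic sketch of this two-stage pipeline, assuming a pre-trained encoder phi, a discrete action space, and a linear function class for FQE (all illustrative choices, not the paper's implementation):

    import numpy as np

    def fqe_on_encoded_data(dataset, phi, pi_e, n_actions, gamma=0.99, n_iters=50):
        """Fitted Q Evaluation on an encoder-transformed dataset (sketch).

        dataset: arrays (states, actions, rewards, next_states)
        phi:     learned encoder mapping raw states to feature vectors
        pi_e:    evaluation policy, pi_e(s) -> action probabilities
        """
        s, a, r, s2 = dataset
        x = np.array([phi(si) for si in s])    # encode once, reuse every iteration
        x2 = np.array([phi(si) for si in s2])
        w = np.zeros((n_actions, x.shape[1]))  # linear Q(s, a) = w[a] @ phi(s)
        for _ in range(n_iters):
            # Bellman target under the evaluation policy
            q_next = x2 @ w.T                              # (n, n_actions)
            probs = np.array([pi_e(si) for si in s2])
            target = r + gamma * (probs * q_next).sum(axis=1)
            # regress each action's Q-values onto the targets
            for act in range(n_actions):
                mask = a == act
                if mask.any():
                    w[act], *_ = np.linalg.lstsq(x[mask], target[mask], rcond=None)
        return w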
no code implementations • 27 Oct 2023 • Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna
In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data.
1 code implementation • 26 Oct 2023 • Nicholas E. Corrado, Josiah P. Hanna
Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency.
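As a hypothetical example of such domain-driven augmentation, a task with a left/right mirror symmetry lets every stored transition be reflected into a second valid transition at no extra cost (the symmetry here is assumed, not taken from the paper):

    import numpy as np

    def augment_with_reflection(batch):
        """Double a batch of transitions using an assumed mirror symmetry.

        Assumes a task where negating state and action leaves dynamics and
        reward unchanged (e.g., a left/right-symmetric control problem).
        batch: dict with arrays 's', 'a', 'r', 's2'
        """
        mirrored = {
            "s": -batch["s"],
            "a": -batch["a"],
            "r": batch["r"],    # reward is invariant under the symmetry
            "s2": -batch["s2"],
        }
        return {k: np.concatenate([batch[k], mirrored[k]]) for k in batch}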
no code implementations • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
This latter objective is called stability and is especially important when the state space is unbounded, so that states can be arbitrarily far apart and the agent can drift far from the desired states.
no code implementations • 16 Dec 2022 • Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor
In our setting, we assume access to a source of data, which we split into a train set, used to learn an offline policy, and a test set, used to estimate a lower bound on the offline policy's performance via off-policy evaluation with bootstrapping.
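A minimal sketch of the test-set step, assuming per-episode off-policy value estimates have already been computed (the bootstrap procedure here is generic, not the paper's exact implementation):

    import numpy as np

    def bootstrap_lower_bound(per_episode_estimates, alpha=0.05, n_boot=2000, seed=0):
        """Bootstrap a (1 - alpha) lower bound on a policy's value.

        per_episode_estimates: one off-policy value estimate per test-set
        episode (e.g., per-episode importance-sampled returns).
        """
        rng = np.random.default_rng(seed)
        est = np.asarray(per_episode_estimates)
        n = len(est)
        means = np.array([rng.choice(est, size=n, replace=True).mean()
                          for _ in range(n_boot)])
        return np.percentile(means, 100 * alpha)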
no code implementations • 14 Dec 2022 • Brahma S. Pavse, Josiah P. Hanna
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$.
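For reference, the simplest baseline for this problem is ordinary per-episode importance sampling; a minimal sketch, assuming known action probabilities under both the evaluation and behavior policies:

    import numpy as np

    def importance_sampling_estimate(episodes, pi_e, pi_b, gamma=1.0):
        """Ordinary per-episode importance sampling estimate of pi_e's value.

        episodes:   list of trajectories [(s, a, r), ...] collected under pi_b
        pi_e, pi_b: functions (s, a) -> action probability
        """
        returns = []
        for traj in episodes:
            weight, ret, discount = 1.0, 0.0, 1.0
            for s, a, r in traj:
                weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
                ret += discount * r
                discount *= gamma
            returns.append(weight * ret)
        return float(np.mean(returns))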
1 code implementation • 20 Sep 2022 • Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna
The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline's performance.
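A schematic control loop in the spirit of objective (a), assuming a gym-style environment and a hypothetical trust_fn that decides when the learner can be trusted (illustrative only, not the paper's algorithm):

    def jirl_style_rollout(env, learner, baseline, trust_fn, max_steps=1000):
        """Let the learner act, but hand control back to the baseline when a
        (hypothetical) trust score is low, bounding regret w.r.t. the
        baseline during training."""
        obs = env.reset()
        for _ in range(max_steps):
            if trust_fn(obs):            # e.g., learner's estimates look reliable here
                action = learner(obs)
            else:
                action = baseline(obs)   # defer to the baseline in unfamiliar states
            obs, reward, done, info = env.step(action)
            if done:
                break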
1 code implementation • 12 Jul 2022 • Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training.
no code implementations • 28 May 2022 • Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella
Thus, the paper addresses the question: "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?"
no code implementations • 9 Mar 2022 • Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs).
1 code implementation • 29 Nov 2021 • Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy.
1 code implementation • ICML Workshop URL 2021 • Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters.
1 code implementation • 18 Jul 2020 • Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Authentication and key agreement are decided based on the agents' observed behaviors during the interaction.
1 code implementation • 23 Dec 2019 • James Ault, Josiah P. Hanna, Guni Sharon
Given such a safety-critical domain, the associated actuation policy is required to be interpretable in a way that can be understood and regulated by a human.
no code implementations • 18 Jun 2019 • Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone
Augmenting reinforcement learning with imitation learning is often hailed as a method for improving upon learning from scratch.
1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone
We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
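A minimal sketch of the discrete, count-based case, where the behavior policy is estimated from the same dataset that is then reweighted (an illustrative simplification, not the paper's exact estimator):

    import numpy as np

    def estimated_behavior_is(episodes, pi_e, n_states, n_actions):
        """Importance sampling with a behavior policy estimated from the
        same data (count-based, discrete case; schematic sketch)."""
        counts = np.zeros((n_states, n_actions))
        for traj in episodes:
            for s, a, _ in traj:
                counts[s, a] += 1
        # maximum-likelihood estimate of the behavior policy
        pi_b_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
        returns = []
        for traj in episodes:
            weight, ret = 1.0, 0.0
            for s, a, r in traj:
                weight *= pi_e(s, a) / pi_b_hat[s, a]
                ret += r
            returns.append(weight * ret)
        return float(np.mean(returns))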
1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum
The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.
no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum
In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
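A generic sketch of the model-based variant, where fit_model and evaluate_in_model are assumed helper functions (hypothetical, not the paper's released code):

    import numpy as np

    def model_based_lower_bound(dataset, fit_model, evaluate_in_model, pi,
                                alpha=0.05, n_boot=100, seed=0):
        """Bootstrap a lower confidence bound on pi's performance: resample
        the dataset, fit a transition model to each resample, evaluate pi in
        each learned model, and take the alpha percentile of the estimates."""
        rng = np.random.default_rng(seed)
        n = len(dataset)
        values = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)           # bootstrap resample
            model = fit_model([dataset[i] for i in idx])
            values.append(evaluate_in_model(model, pi))
        return np.percentile(values, 100 * alpha)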