Search Results for author: Wesley A. Suttle

Found 7 papers, 2 papers with code

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

no code implementations20 Apr 2024 Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that uses preference-based learning to train a reward model and then uses this reward model to relabel higher-level replay buffers.

Hierarchical Reinforcement Learning, Reinforcement Learning
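
As an illustration of the relabeling step described in the abstract, here is a minimal Python sketch, assuming a reward model already trained from preference data; `learned_reward` and the buffer layout are hypothetical stand-ins, not PIPER's implementation.

```python
import numpy as np

# Hypothetical stand-in for a reward model trained on preference data
# (e.g., with a Bradley-Terry-style objective); not PIPER's actual model.
def learned_reward(state, goal):
    return float(-np.linalg.norm(state - goal))

def relabel_buffer(buffer, reward_model=learned_reward):
    """Overwrite each stored higher-level reward with the reward model's
    score, so subsequent off-policy updates train against the learned
    preference-based reward instead of the originally stored reward."""
    for transition in buffer:
        transition["reward"] = reward_model(transition["state"],
                                            transition["goal"])
    return buffer
```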

Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic

no code implementations18 Mar 2024 Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of how long a Markov chain under a fixed policy needs to approach its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods.

Policy Gradient Methods
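
For reference, the mixing time invoked here has a standard definition: for an ergodic Markov chain with transition kernel P (induced by the fixed policy) and stationary distribution mu,

```latex
t_{\mathrm{mix}}(\epsilon) \;=\; \min\bigl\{\, t \ge 1 \;:\; \max_{s}\, \lVert P^{t}(s,\cdot) - \mu \rVert_{\mathrm{TV}} \le \epsilon \,\bigr\},
\qquad t_{\mathrm{mix}} \;:=\; t_{\mathrm{mix}}(1/4).
```

Step-size and bias bounds in average-reward methods typically depend on this quantity, which is why requiring it as an oracle input is restrictive.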

Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

1 code implementation6 Mar 2024 Wesley A. Suttle, Vipul K. Sharma, Krishna C. Kosaraju, S. Sivaranjani, Ji Liu, Vijay Gupta, Brian M. Sadler

We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory.

Reinforcement Learning (RL) +1
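
One generic flavor of sampling-based safety filtering, sketched under assumptions rather than as the authors' construction: candidate actions are drawn from the policy and screened by an abstract safety certificate (`is_safe` below, e.g., a barrier-function or safe-set membership check), with a known-safe backup controller as fallback.

```python
def safe_action(sample_action, is_safe, backup_action, n_samples=32):
    """Sampling-based safety filter (illustrative sketch).

    sample_action : callable drawing a candidate action from the policy
    is_safe       : hypothetical control-theoretic certificate returning
                    True iff the action keeps the system in a safe set
    backup_action : known-safe fallback controller
    """
    for _ in range(n_samples):
        action = sample_action()
        if is_safe(action):
            return action          # first certified-safe candidate
    return backup_action()         # no safe sample found; use fallback
```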

Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

no code implementations9 Jun 2023 Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

Trajectory length is a crucial hyperparameter in reinforcement learning (RL) algorithms and a significant contributor to sample inefficiency in robotics applications.

Policy Gradient Methods, Reinforcement Learning +1
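
To make the role of this hyperparameter concrete, here is a toy adaptive schedule in Python; the entropy-driven rule and all constants are assumptions for illustration, not necessarily Ada-NAV's actual update.

```python
def adapt_trajectory_length(policy_entropy, t_min=32, t_max=512, scale=480.0):
    """Illustrative (hypothetical) schedule: grow the rollout length as
    the policy's entropy falls, i.e., collect longer trajectories once
    the policy becomes more deterministic and estimates are less noisy."""
    t = int(t_min + scale * max(0.0, 1.0 - policy_entropy))
    return max(t_min, min(t, t_max))
```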

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

no code implementations28 Jan 2023 Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.

Reinforcement Learning (RL)
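
The multi-level Monte Carlo (MLMC) device this line of work builds on can be sketched generically; `avg_grad` and the level cap are assumptions, and this shows the standard MLMC estimator shape rather than the paper's exact algorithm.

```python
import numpy as np

def mlmc_gradient(avg_grad, j_max=10, rng=None):
    """Generic MLMC gradient estimator sketch. avg_grad(n) is assumed
    to return the average of the first n stochastic gradients along one
    trajectory (so the n- and n/2-sample averages share samples). In
    expectation the estimator equals avg_grad(2**j_max) at only
    O(j_max) expected sample cost, sidestepping step-size rules tied
    to an unknown mixing rate."""
    rng = rng or np.random.default_rng()
    j = int(rng.geometric(0.5))      # P(J = j) = 2**(-j), j >= 1
    grad = avg_grad(1)
    if j <= j_max:
        # Telescoping correction, reweighted by 1 / P(J = j) = 2**j.
        grad = grad + (2 ** j) * (avg_grad(2 ** j) - avg_grad(2 ** (j - 1)))
    return grad
```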

Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search

no code implementations21 Jan 2022 Wesley A. Suttle, Alec Koppel, Ji Liu

In this work, we propose an information-directed objective for infinite-horizon reinforcement learning (RL), called the occupancy information ratio (OIR). The OIR is inspired by the information ratio objectives used in previous information-directed sampling schemes for multi-armed bandits and Markov decision processes, as well as by recent advances in general utility RL.

Multi-Armed Bandits, Reinforcement Learning (RL)
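
For context, the bandit-side object referenced by this abstract is the information ratio of information-directed sampling, which trades squared expected instantaneous regret against information gained about the optimal action:

```latex
\Gamma_t \;=\; \frac{\bigl(\mathbb{E}_t[\Delta_t]\bigr)^{2}}{I_t\bigl(A^{*};\,(A_t, Y_t)\bigr)},
```

where Delta_t is the instantaneous regret of the chosen action A_t, Y_t is the observed outcome, and I_t denotes conditional mutual information; as its name suggests, the OIR adapts this cost-per-information idea to occupancy measures in infinite-horizon RL.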
