no code implementations • 26 Jul 2024 • Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White
This paper introduces a new empirical methodology, the Cross-environment Hyperparameter Setting Benchmark, that compares RL algorithms across environments using a single hyperparameter setting, encouraging the development of algorithms that are insensitive to hyperparameters.
no code implementations • 12 Jul 2024 • Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White
One exception is Prioritized Experience Replay (PER), where sampling is done proportionally to TD errors, inspired by the success of prioritized sweeping in dynamic programming.
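Proportional prioritization can be sketched in a few lines. The class below is a minimal, hypothetical illustration (not the paper's implementation): each transition's sampling weight is its absolute TD error raised to a temperature `alpha`, plus a small constant so zero-error transitions remain sampleable.

```python
import random

class PrioritizedReplay:
    """Minimal sketch of proportional prioritized sampling,
    in the spirit of Prioritized Experience Replay (PER).
    Names and API here are illustrative assumptions."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha        # how strongly priorities skew sampling
        self.buffer = []          # stored transitions
        self.priorities = []      # one |TD error|-based priority each

    def add(self, transition, td_error):
        self.buffer.append(transition)
        # Small epsilon keeps zero-error transitions visitable.
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # random.choices draws proportionally to the given weights.
        return random.choices(self.buffer, weights=probs, k=1)[0]
```

With `alpha = 0` this degenerates to uniform replay, which is the baseline the PER mechanism is contrasted against.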
no code implementations • 4 Dec 2023 • Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White
As a result, no OPS method can be more sample efficient than OPE in the worst case.
no code implementations • 3 Apr 2023 • Andrew Patterson, Samuel Neumann, Martha White, Adam White
The objective of this document is to provide guidance on how we can use our unprecedented compute to do good science in reinforcement learning, and on how to stay alert to potential pitfalls in our empirical design.
no code implementations • 17 May 2022 • Andrew Patterson, Victor Liao, Martha White
We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
1 code implementation • 4 Feb 2022 • Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood
The policy gradient theorem (Sutton et al., 2000) prescribes the use of a cumulative discounted state distribution under the target policy to approximate the gradient.
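For reference, a standard statement of the theorem (generic notation, not taken from this entry) makes the role of that discounted state distribution explicit:

```latex
\nabla_\theta J(\theta)
  \propto \sum_s d^{\pi}(s) \sum_a \nabla_\theta \pi(a \mid s; \theta)\, q^{\pi}(s, a),
\qquad
d^{\pi}(s) = \sum_{t=0}^{\infty} \gamma^t \Pr(s_t = s \mid s_0, \pi),
```

where $d^{\pi}$ is the cumulative discounted visitation distribution under the target policy $\pi$ and $q^{\pi}$ its action-value function.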
no code implementations • 28 Apr 2021 • Andrew Patterson, Adam White, Martha White
Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.
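The linear MSPBE mentioned here is commonly written as follows (standard notation, assumed rather than quoted from the paper):

```latex
\operatorname{MSPBE}(w) = \left\lVert v_w - \Pi T^{\pi} v_w \right\rVert_D^2,
\qquad v_w = X w,
```

where $X$ is the feature matrix, $D$ weights states by the behavior distribution, $T^{\pi}$ is the Bellman operator for the target policy, and $\Pi$ projects onto the span of the features under the $D$-weighted norm.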
no code implementations • 8 Sep 2020 • Aditya Gahlawat, Arun Lakshmanan, Lin Song, Andrew Patterson, Zhuohuan Wu, Naira Hovakimyan, Evangelos Theodorou
We present $\mathcal{CL}_1$-$\mathcal{GP}$, a control framework that enables safe simultaneous learning and control for systems subject to uncertainties.
1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.
no code implementations • L4DC 2020 • Aditya Gahlawat, Pan Zhao, Andrew Patterson, Naira Hovakimyan, Evangelos Theodorou
We present L1-GP, an architecture based on L1 adaptive control and Gaussian Process Regression (GPR) for safe simultaneous control and learning.
no code implementations • 5 Feb 2020 • Andrew Patterson, Aditya Gahlawat, Naira Hovakimyan
The safety of these agents depends on their ability to predict collisions with other vehicles' future trajectories for replanning and collision avoidance.
1 code implementation • NeurIPS 2019 • Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White
We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem.
no code implementations • 4 Apr 2019 • Andrew Patterson, Arun Lakshmanan, Naira Hovakimyan
We show that the uncertainty region for obstacle positions can be expressed in terms of a combination of polynomials generated with Gaussian process regression.
3 code implementations • 13 Feb 2019 • Arun Lakshmanan, Andrew Patterson, Venanzio Cichella, Naira Hovakimyan
In motion planning problems for autonomous robots, such as self-driving cars, the robot must ensure that its planned path is not in close proximity to obstacles in the environment.
Robotics • Computational Geometry • Graphics
1 code implementation • NeurIPS 2018 • Lei Le, Andrew Patterson, Martha White
A common strategy to improve generalization has been through the use of regularizers, typically as a norm constraining the parameters.
no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White
The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems, but has remained an open algorithmic challenge for decades.
no code implementations • 18 Jul 2018 • Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White
A general-purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.
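That update can be sketched in one function. This is a generic vanilla-RNN step for illustration, not the architecture studied in the paper; the weight names are assumptions.

```python
import numpy as np

def rnn_state_update(state, observation, W_s, W_o, b):
    """One generic RNN step: the next internal state is a nonlinear
    function of the current internal state and the newest observation.
    W_s mixes the old state, W_o mixes the observation, b is a bias."""
    return np.tanh(W_s @ state + W_o @ observation + b)
```

The agent carries `state` forward between time steps, so the internal state can summarize the entire history of observations through repeated application of this update.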
no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.
no code implementations • ICLR 2018 • Matthew Schlegel, Andrew Patterson, Adam White, Martha White
We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.