1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.
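The reweighting idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the embeddings, the probe weights `w`, and the temperature `beta` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: k candidate completions, each represented by a
# d-dimensional embedding taken from the base model.
k, d = 4, 8
embeddings = rng.normal(size=(k, d))

# The learned probe is a simple linear function w on the embedding space.
w = rng.normal(size=d)

# Score each candidate and reweight with a softmax; beta controls how
# sharply the probe's preferences concentrate the sampling distribution.
beta = 2.0
scores = embeddings @ w
weights = np.exp(beta * (scores - scores.max()))
weights /= weights.sum()

# Sample a completion index according to the reweighted distribution.
choice = rng.choice(k, p=weights)
```

The point is that the base model does the generation; the linear probe only rescores the candidates, so it can be trained cheaply on top of frozen embeddings.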
1 code implementation • 13 Feb 2024 • David Brandfonbrener, Sibi Raja, Tarun Prasad, Chloe Loughridge, Jianang Yang, Simon Henniger, William E. Byrd, Robert Zinkov, Nada Amin
The base model with VMCTS is even competitive with ChatGPT4 augmented with plugins and multiple retries on these problems.
1 code implementation • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
Empirically, we find that transformers outperform generalized state space models (GSSMs) in terms of efficiency and generalization on synthetic tasks that require copying the context.
no code implementations • 5 Oct 2022 • David Brandfonbrener, Stephen Tu, Avi Singh, Stefan Welker, Chad Boodoo, Nikolai Matni, Jake Varley
We find that by adjusting the data collection process we improve the quality of both the learned value functions and policies over a variety of baseline methods for data collection.
no code implementations • 2 Jun 2022 • David Brandfonbrener, Remi Tachet des Combes, Romain Laroche
In this work, we develop deep-SPIBB, a novel method that incorporates scalable uncertainty estimates into offline reinforcement learning and extends the SPIBB family of algorithms to environments with larger state and action spaces.
1 code implementation • 2 Jun 2022 • David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna
Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).
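The core data transformation behind RCSL is relabeling each logged transition with its return-to-go and then training a policy on (state, return) pairs by ordinary supervised learning. The sketch below shows only that relabeling step on a hypothetical single-trajectory dataset; the policy network itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: one trajectory of (state, action, reward).
n_steps, state_dim, n_actions = 100, 3, 2
states = rng.normal(size=(n_steps, state_dim))
actions = rng.integers(0, n_actions, size=n_steps)
rewards = rng.normal(size=n_steps)

# Return-to-go at step t is the sum of rewards from t onward
# (a reverse cumulative sum over the trajectory).
returns_to_go = np.cumsum(rewards[::-1])[::-1]

# RCSL inputs are (state, return-to-go) pairs; targets are the logged
# actions. A policy pi(a | s, g) is then fit by supervised learning.
inputs = np.concatenate([states, returns_to_go[:, None]], axis=1)
targets = actions

# At test time, one conditions the learned policy on a high target
# return (e.g. near the best return seen in the dataset).
```

Conditioning on the return is what turns imitation into (implicit) policy improvement: asking the policy for actions consistent with high returns selects the better behavior in the data.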
1 code implementation • 31 Jan 2022 • Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto
In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
no code implementations • 2 Dec 2021 • David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna
We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning.
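The filtering step can be sketched as follows. This is a simplified illustration under assumed inputs: the value estimates `q_values` stand in for whatever value function the method fits, and the survivors would then be passed to behavior cloning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: states, discrete actions, and a value
# estimate Q(s, a) for each logged transition.
n = 1000
states = rng.normal(size=(n, 4))
actions = rng.integers(0, 3, size=n)
q_values = rng.normal(size=n)

# Quantile filtering: keep only the transitions whose value estimate
# exceeds the tau-quantile of the dataset's value estimates, then run
# behavior cloning (supervised imitation) on the survivors.
tau = 0.8
threshold = np.quantile(q_values, tau)
keep = q_values >= threshold
filtered_states, filtered_actions = states[keep], actions[keep]
```

Because the policy improvement reduces to filter-then-imitate, the operator sidesteps explicit maximization over actions, which is a common source of instability in offline RL.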
1 code implementation • NeurIPS 2021 • David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna
In addition, we hypothesize that the strong performance of the one-step algorithm is due to a combination of favorable structure in the environment and behavior policy.
1 code implementation • 15 Sep 2020 • William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho
We consider the problem of evaluating representations of data for use in solving a downstream task.
1 code implementation • 27 Jun 2020 • David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna
We show that this discrepancy is due to the action-stability of their objectives.
2 code implementations • 1 Nov 2019 • Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL).
no code implementations • ICLR 2020 • David Brandfonbrener, Joan Bruna
Then, we show how environments that are more reversible induce dynamics that are better for TD learning and prove global convergence to the true value function for well-conditioned function approximators.