Search Results for author: David Brandfonbrener

Found 13 papers, 9 papers with code

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation · 22 Feb 2024 · Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Code Generation · Language Modelling
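The snippet above describes the core Q-Probe mechanism: a linear function scores candidate completions from their embeddings, and the scores reweight which completion is sampled. A minimal sketch of that reweighting step, assuming a probe weight vector `w` has already been trained to predict reward (the function name and softmax temperature here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def qprobe_rerank(embeddings, w, beta=1.0, rng=None):
    """Score k candidate completions with a linear probe and sample one.

    embeddings: (k, d) array, one embedding per candidate completion
    w: (d,) probe weights (assumed already trained to predict reward)
    beta: inverse temperature controlling how sharply scores reweight
    """
    rng = rng or np.random.default_rng(0)
    scores = embeddings @ w                        # linear value estimate per candidate
    probs = np.exp(beta * (scores - scores.max())) # stable softmax over candidates
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs
```

With a large `beta` this approaches greedy selection of the highest-scoring candidate; with `beta = 0` it falls back to uniform sampling from the base model's candidates.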

Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search

1 code implementation · 13 Feb 2024 · David Brandfonbrener, Sibi Raja, Tarun Prasad, Chloe Loughridge, Jianang Yang, Simon Henniger, William E. Byrd, Robert Zinkov, Nada Amin

The base model with VMCTS is even competitive with ChatGPT-4 augmented with plugins and multiple retries on these problems.

Repeat After Me: Transformers are Better than State Space Models at Copying

1 code implementation · 1 Feb 2024 · Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.
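The copying experiments above use synthetic tasks in which the model must reproduce tokens from its context. A minimal sketch of how such a task can be generated (the exact format with a separator token and masked loss positions is a common setup for probing copying, and is illustrative rather than the paper's precise construction):

```python
import numpy as np

def make_copy_task(batch, length, vocab, rng=None):
    """Generate a synthetic copy task: a random token string, a separator,
    then the same string again; the model is supervised only on the copy."""
    rng = rng or np.random.default_rng(0)
    sep = vocab                                   # reserve one extra id as the separator
    x = rng.integers(0, vocab, size=(batch, length))
    inputs = np.concatenate([x, np.full((batch, 1), sep), x], axis=1)
    targets = np.full_like(inputs, -1)            # -1 marks positions ignored by the loss
    targets[:, length + 1:] = x                   # supervise only the copied half
    return inputs, targets
```

Because the answer is fully determined by the context, this task isolates a model's ability to retrieve and reproduce context tokens, which is where the paper finds transformers outperform generalized state space models.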

Visual Backtracking Teleoperation: A Data Collection Protocol for Offline Image-Based Reinforcement Learning

no code implementations · 5 Oct 2022 · David Brandfonbrener, Stephen Tu, Avi Singh, Stefan Welker, Chad Boodoo, Nikolai Matni, Jake Varley

We find that adjusting the data collection process improves the quality of both the learned value functions and the learned policies compared to a variety of baseline data collection methods.

Continuous Control · Reinforcement Learning (RL)

Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

no code implementations · 2 Jun 2022 · David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

In this work, we develop deep-SPIBB, a novel offline reinforcement learning algorithm that incorporates scalable uncertainty estimates and extends the SPIBB family of algorithms to environments with larger state and action spaces.

Reinforcement Learning (RL)

When does return-conditioned supervised learning work for offline reinforcement learning?

1 code implementation · 2 Jun 2022 · David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).

D4RL · Reinforcement Learning (RL) · +1
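In return-conditioned supervised learning, a policy is trained by supervised learning to predict actions conditioned on the state and the return-to-go, so that conditioning on a high return at test time elicits high-return behavior. A minimal sketch of the data preparation (the conditioned policy itself can be any supervised model; the function and layout below are illustrative):

```python
import numpy as np

def rcsl_dataset(trajectories, gamma=1.0):
    """Build (state + return-to-go, action) pairs for return-conditioned
    supervised learning. `trajectories` is a list of (states, actions,
    rewards) tuples of equal-length numpy arrays."""
    X, y = [], []
    for states, actions, rewards in trajectories:
        rtg = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):     # returns-to-go, computed backwards
            running = rewards[t] + gamma * running
            rtg[t] = running
        for t in range(len(states)):
            X.append(np.append(states[t], rtg[t]))  # condition on state and return
            y.append(actions[t])
    return np.array(X), np.array(y)
```

Fitting any classifier or regressor on `(X, y)` and querying it with a target return appended to the state recovers the basic RCSL recipe that the paper analyzes.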

Quantile Filtered Imitation Learning

no code implementations · 2 Dec 2021 · David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning.

D4RL · Imitation Learning
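The core of quantile filtering is to keep only the transitions whose estimated value clears a quantile threshold, and then imitate the survivors. A simplified sketch using a single global quantile threshold (the paper's operator filters relative to a per-state value distribution, and the value estimates here are assumed given):

```python
import numpy as np

def quantile_filter(states, actions, values, q=0.9):
    """Keep transitions whose estimated value meets or exceeds the q-th
    quantile of the value estimates; behavior cloning on the survivors
    then yields the improved policy."""
    threshold = np.quantile(values, q)
    keep = values >= threshold
    return states[keep], actions[keep]
```

Raising `q` makes the filter more selective, trading off dataset size against the quality of the retained actions.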

Offline RL Without Off-Policy Evaluation

1 code implementation · NeurIPS 2021 · David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

In addition, we hypothesize that the strong performance of the one-step algorithm is due to a combination of favorable structure in the environment and behavior policy.

D4RL · Offline RL · +1

Evaluating representations by the complexity of learning low-loss predictors

1 code implementation · 15 Sep 2020 · William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho

We consider the problem of evaluating representations of data for use in solving a downstream task.

Geometric Insights into the Convergence of Nonlinear TD Learning

no code implementations · ICLR 2020 · David Brandfonbrener, Joan Bruna

Then, we show how environments that are more reversible induce dynamics that are better for TD learning and prove global convergence to the true value function for well-conditioned function approximators.
