Search Results for author: Andrea Zanette

Found 17 papers, 2 papers with code

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

1 code implementation29 Feb 2024 Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization) while effectively accommodating multiple turns, long horizons, and delayed rewards.
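
A minimal, hypothetical sketch of the hierarchical idea summarized above, assuming an utterance-level (high-level) critic trained by temporal-difference updates across turns and a token-level (low-level) policy step that reuses a standard single-turn objective. All function names and components below are toy stand-ins for illustration, not the ArCHer implementation itself.

GAMMA = 0.95

def policy_generate(history):
    # Toy stand-in for an LLM policy that emits one utterance (high-level action) per turn.
    return "reply-%d" % len(history)

def environment_step(history, utterance):
    # Toy environment: reward arrives only when the dialogue ends (delayed reward).
    done = len(history) + 1 >= 4
    reward = 1.0 if done else 0.0
    return reward, done

critic = {}  # utterance-level value estimates keyed by turn index (stand-in for a learned critic network)

def td_update(turn, reward, next_turn, done, lr=0.1):
    # High-level temporal-difference update over turns (utterances), not tokens.
    target = reward + (0.0 if done else GAMMA * critic.get(next_turn, 0.0))
    critic[turn] = critic.get(turn, 0.0) + lr * (target - critic.get(turn, 0.0))

def token_level_update(utterance, advantage):
    # Placeholder for a standard single-turn policy-gradient step (e.g., PPO-style)
    # applied to the tokens of one utterance, weighted by the high-level advantage.
    pass

for episode in range(10):
    history, done, turn = [], False, 0
    while not done:
        utterance = policy_generate(history)
        reward, done = environment_step(history, utterance)
        td_update(turn, reward, turn + 1, done)
        advantage = (reward + (0.0 if done else GAMMA * critic.get(turn + 1, 0.0))
                     - critic.get(turn, 0.0))
        token_level_update(utterance, advantage)
        history.append(utterance)
        turn += 1

The point of the split is that the high-level critic sees one transition per utterance, so credit assignment over a long dialogue does not have to be carried out token by token.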

Language Modelling, Reinforcement Learning (RL)

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

no code implementations • 24 Feb 2024 • Ruiqi Zhang, Yuexiang Zhai, Andrea Zanette

Surprisingly, in this work, we demonstrate that even in such a data-starved setting it may still be possible to find a policy competitive with the optimal one.
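
As a hedged illustration of the trust-region idea (notation assumed here for illustration, not taken from the paper): with a reference policy $\pi_b$ estimated from the few available samples, one can search for

\[
\hat{\pi} \in \arg\max_{\pi \,:\, D(\pi, \pi_b) \le \delta} \ \widehat{V}(\pi) - \beta\, U(\pi),
\]

i.e., maximize a pessimistic (lower-confidence) value estimate while staying within a trust region of radius $\delta$ around $\pi_b$, so that the value estimate remains reliable despite the scarce data.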

Decision Making, Multi-Armed Bandits

When is Realizability Sufficient for Off-Policy Reinforcement Learning?

no code implementations • 10 Nov 2022 • Andrea Zanette

Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to successfully operate off-policy with function approximation, unless additional conditions are met.
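
For context, the two conditions contrasted here can be stated for a function class $\mathcal{F}$ and Bellman operator $\mathcal{T}^\pi$ as follows (standard definitions, not specific to this paper):

\[
\text{realizability:}\quad Q^\pi \in \mathcal{F}, \qquad\qquad \text{Bellman completeness:}\quad \mathcal{T}^\pi f \in \mathcal{F} \ \text{ for all } f \in \mathcal{F}.
\]

Bellman completeness is the stronger requirement, since it must hold for every element of the class rather than only for the target value function.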

Reinforcement Learning (RL)

Bellman Residual Orthogonalization for Offline Reinforcement Learning

no code implementations • 24 Mar 2022 • Andrea Zanette, Martin J. Wainwright

We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions.
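
A schematic version of this principle (symbols assumed for illustration): instead of requiring the Bellman residual to vanish pointwise, one requires it to be orthogonal to a user-chosen class of test functions $\mathcal{G}$ under the data distribution $\mu$,

\[
\mathbb{E}_{\mu}\big[\, g(s,a)\,\big( Q(s,a) - r - \gamma\, Q(s', \pi(s')) \big) \big] = 0 \qquad \text{for all } g \in \mathcal{G}.
\]

The richer the test class, the closer this weak formulation comes to enforcing the Bellman equations exactly.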

Offline RL, Off-policy evaluation +1

Design of Experiments for Stochastic Contextual Linear Bandits

no code implementations • NeurIPS 2021 • Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

In the stochastic linear contextual bandit setting, there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

no code implementations • 24 Mar 2021 • Andrea Zanette, Ching-An Cheng, Alekh Agarwal

Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts.

Reinforcement Learning (RL)

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

no code implementations • 14 Dec 2020 • Andrea Zanette

Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration.

Reinforcement Learning (RL)

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

no code implementations • NeurIPS 2020 • Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.

Learning Near Optimal Policies with Low Inherent Bellman Error

no code implementations • ICML 2020 • Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

This has two important consequences: 1) it shows that exploration is possible using only \emph{batch assumptions}, with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.
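
The inherent Bellman error mentioned here is, in its standard formulation, a measure of how far the function class $\mathcal{Q}$ is from being closed under the Bellman operator $\mathcal{T}$ (notation assumed for illustration):

\[
\mathcal{I} \;=\; \sup_{Q \in \mathcal{Q}} \; \inf_{Q' \in \mathcal{Q}} \; \big\| Q' - \mathcal{T} Q \big\|,
\]

so $\mathcal{I} = 0$ recovers Bellman completeness, and the claim above is that this misspecification term enters the guarantees only after being inflated by a $\sqrt{d_t}$ factor.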

Reinforcement Learning (RL)

Limiting Extrapolation in Linear Approximate Value Iteration

no code implementations • NeurIPS 2019 • Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.
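
The anchor-point condition stated above can be written as follows (notation assumed for illustration): for a feature map $\phi$ and anchor state-action pairs $(s_1,a_1),\dots,(s_K,a_K)$, every $(s,a)$ satisfies

\[
\phi(s,a) \;=\; \sum_{k=1}^{K} \lambda_k(s,a)\, \phi(s_k,a_k), \qquad \lambda_k(s,a) \ge 0, \qquad \sum_{k=1}^{K} \lambda_k(s,a) = 1,
\]

i.e., the features lie in the convex hull of the anchor features, which prevents approximate value iteration from extrapolating outside the region covered by the anchors.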

Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

no code implementations • NeurIPS 2019 • Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill

This paper focuses on the problem of computing an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.
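
For reference, a policy $\pi$ is $\epsilon$-optimal in a discounted MDP when its value is within $\epsilon$ of the optimal value in every state (standard definition):

\[
V^{\pi}(s) \;\ge\; V^{*}(s) - \epsilon \qquad \text{for all } s,
\]

and the generative-model assumption means the algorithm may query samples $s' \sim P(\cdot \mid s, a)$ and rewards $r(s,a)$ at any state-action pair of its choosing.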

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

no code implementations • 1 Jan 2019 • Andrea Zanette, Emma Brunskill

Strong worst-case performance bounds exist for episodic reinforcement learning, but in practice RL algorithms fortunately perform much better than such bounds would predict.

Learning Theory, Reinforcement Learning (RL)

Robust Super-Level Set Estimation using Gaussian Processes

no code implementations • 25 Nov 2018 • Andrea Zanette, Junzi Zhang, Mykel J. Kochenderfer

This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability.
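
In set notation (symbols assumed for illustration), the goal is to report a region $\hat{S}$ that is as large as possible while being contained, with probability at least $1-\delta$, in the true super-level set of the unknown function $f$ at threshold $\tau$:

\[
\hat{S} \subseteq \{\, x : f(x) \ge \tau \,\} \qquad \text{with probability at least } 1-\delta,
\]

with the Gaussian process posterior supplying the confidence bounds used to certify which points can safely be included.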

Gaussian Processes
