Search Results for author: Andrea Tirinzoni

Found 23 papers, 4 papers with code

Simple Ingredients for Offline Reinforcement Learning

no code implementations 19 Mar 2024 Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

D4RL reinforcement-learning

Towards Instance-Optimality in Online PAC Reinforcement Learning

no code implementations 31 Oct 2023 Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP.

reinforcement-learning

Active Coverage for PAC Reinforcement Learning

no code implementations 23 Jun 2023 Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one.

reinforcement-learning Reinforcement Learning (RL)

Layered State Discovery for Incremental Autonomous Exploration

no code implementations 7 Feb 2023 Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta

We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.

On the Complexity of Representation Learning in Contextual Linear Bandits

no code implementations 19 Dec 2022 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs.
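For concreteness, the linear reward model referred to above can be written as follows (standard contextual-bandit notation, stated here for illustration rather than taken from the paper):

$$ r(x, a) = \langle \theta^{*}, \phi(x, a) \rangle + \eta, \qquad \theta^{*} \in \mathbb{R}^{d}, \quad \phi : \mathcal{X} \times \mathcal{A} \to \mathbb{R}^{d}, $$

where $\phi(x, a)$ is the given embedding of the context-arm pair $(x, a)$, $\theta^{*}$ is the unknown reward vector, and $\eta$ is zero-mean noise.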

Model Selection Multi-Armed Bandits +1

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

no code implementations 10 Oct 2022 Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

no code implementations 12 Jul 2022 Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view.

reinforcement-learning Reinforcement Learning (RL)

On Elimination Strategies for Bandit Fixed-Confidence Identification

1 code implementation 22 May 2022 Andrea Tirinzoni, Rémy Degenne

Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time.
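As a rough illustration of the elimination principle described above, here is a generic successive-elimination sketch for best-arm identification; the confidence radius and stopping rule are simplified placeholders, not the strategies analyzed in the paper:

```python
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, batch=100, max_rounds=200):
    """Sample every active arm and prune those whose upper confidence bound
    falls below the best lower confidence bound, so the problem size shrinks
    over time; the last surviving arm is returned.
    (Illustrative Hoeffding-style radius; rewards assumed in [0, 1].)"""
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for _ in range(max_rounds):
        for arm in active:
            sums[arm] += sum(pull(arm) for _ in range(batch))
            counts[arm] += batch
        means = sums[active] / counts[active]
        radius = np.sqrt(np.log(4 * n_arms * counts[active] / delta) / (2 * counts[active]))
        best_lcb = np.max(means - radius)
        # eliminate arms whose UCB is below the best LCB
        active = [a for a, m, r in zip(active, means, radius) if m + r >= best_lcb]
        if len(active) == 1:
            break
    return active[int(np.argmax(sums[active] / counts[active]))]

# Example: three Bernoulli arms with means 0.3, 0.5, 0.7
# best = successive_elimination(lambda a: np.random.binomial(1, [0.3, 0.5, 0.7][a]), n_arms=3)
```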

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

no code implementations 17 Mar 2022 Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$.
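In symbols, the PAC requirement stated above is that the policy $\hat{\pi}$ returned by the algorithm satisfies (standard notation, shown here for concreteness):

$$ \mathbb{P}\left( V^{\hat{\pi}} \geq V^{*} - \epsilon \right) \geq 1 - \delta, $$

where $V^{*}$ is the optimal value and $V^{\hat{\pi}}$ the value of the returned policy; the sample complexity is the number of episodes collected before the algorithm stops and outputs $\hat{\pi}$.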

reinforcement-learning Reinforcement Learning (RL)

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

1 code implementation NeurIPS 2021 Clémence Réda, Andrea Tirinzoni, Rémy Degenne

In this work, we first derive a tractable lower bound on the sample complexity of any $\delta$-correct algorithm for the general Top-m identification problem.

Recommendation Systems

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

no code implementations 24 Jun 2021 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs).

Meta-Reinforcement Learning by Tracking Task Non-stationarity

1 code implementation 18 May 2021 Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli

At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy.

Meta Reinforcement Learning reinforcement-learning +1

Leveraging Good Representations in Linear Contextual Bandits

no code implementations 8 Apr 2021 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).

Multi-Armed Bandits

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

no code implementations NeurIPS 2020 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.

Sequential Transfer in Reinforcement Learning with a Generative Model

no code implementations ICML 2020 Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.

reinforcement-learning Reinforcement Learning (RL)

A Novel Confidence-Based Algorithm for Structured Bandits

no code implementations 23 May 2020 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms.

Gradient-Aware Model-based Policy Search

no code implementations 9 Sep 2019 Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-based Reinforcement Learning

Feature Selection via Mutual Information: New Theoretical Insights

1 code implementation 17 Jul 2019 Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.
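As a concrete (if simplified) instance of such a filter method, the sketch below greedily trades off relevancy against redundancy using scikit-learn's mutual information estimator; this is a generic mRMR-style heuristic, not the algorithm analyzed in the paper:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def greedy_mi_selection(X, y, k):
    """Greedy filter feature selection: at each step pick the feature with the
    highest estimated relevancy MI(X_j; y) minus its average redundancy with
    the features already selected (an mRMR-style sketch for regression)."""
    n_features = X.shape[1]
    relevancy = mutual_info_regression(X, y)  # MI between each feature and the target
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        scores = []
        for j in remaining:
            if selected:
                # average MI between candidate j and the already-selected features
                redundancy = mutual_info_regression(X[:, selected], X[:, j]).mean()
            else:
                redundancy = 0.0
            scores.append(relevancy[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: keep the 5 highest-scoring features of a design matrix X for target y
# top_features = greedy_mi_selection(X, y, k=5)
```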

feature selection regression

Importance Weighted Transfer of Samples in Reinforcement Learning

no code implementations ICML 2018 Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight.
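To make the weighting idea concrete, here is a minimal sketch of a batch Q-update in which transferred source transitions contribute proportionally to a given importance weight; the weights themselves (target/source density ratios) are assumed to be provided, and this illustrates the general mechanism rather than the paper's algorithm:

```python
import numpy as np

def weighted_batch_q_update(transitions, weights, q_table, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-updates over pooled (source + target) transitions,
    where each sample's contribution to the squared Bellman error is scaled by
    its importance weight (assumed given)."""
    for (s, a, r, s_next), w in zip(transitions, weights):
        target = r + gamma * np.max(q_table[s_next])             # one-step Bellman target
        q_table[s, a] += alpha * w * (target - q_table[s, a])    # weight scales the update
    return q_table

# Example with hypothetical shapes: 10 states, 4 actions
# q = np.zeros((10, 4))
# q = weighted_batch_q_update([(0, 1, 1.0, 2)], weights=[0.8], q_table=q)
```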

reinforcement-learning Reinforcement Learning (RL)
