no code implementations • 19 Mar 2024 • Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
no code implementations • 31 Oct 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP.
no code implementations • 23 Jun 2023 • Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one.
no code implementations • 7 Feb 2023 • Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta
We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.
no code implementations • 19 Dec 2022 • Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric
In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs.
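The linear reward model described above can be sketched in a few lines; this is a minimal illustration, not code from the paper, and the embedding `phi` and the values of `theta` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                          # embedding dimension (hypothetical)
theta = rng.normal(size=d)     # unknown reward vector the learner must estimate

def phi(context, arm):
    """Hypothetical embedding of a context-arm pair into R^d."""
    v = np.zeros(d)
    v[arm % d] = context
    return v

def expected_reward(context, arm):
    # Linear model: the mean reward is the inner product <theta, phi(x, a)>.
    return float(theta @ phi(context, arm))

# The learner only observes noisy rewards centered at this linear mean.
r = expected_reward(context=1.0, arm=2) + rng.normal(scale=0.1)
```

A bandit algorithm such as LinUCB estimates `theta` from these noisy observations while choosing arms.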
no code implementations • 24 Oct 2022 • Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta
We study the problem of representation learning in stochastic contextual linear bandits.
no code implementations • 10 Oct 2022 • Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric
We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.
no code implementations • 12 Jul 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann
Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view.
1 code implementation • 22 May 2022 • Andrea Tirinzoni, Rémy Degenne
Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time.
no code implementations • 17 Mar 2022 • Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$.
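The $(\epsilon, \delta)$-PAC criterion above can be made concrete with a small helper; this is an illustrative sketch only (the function name and values are hypothetical), and the probability-$1-\delta$ part refers to the algorithm's randomness, which a single check cannot capture:

```python
def is_epsilon_optimal(policy_value, optimal_value, epsilon):
    """PAC success event for one run: the returned policy's value is
    within epsilon of the optimal value. An (epsilon, delta)-PAC
    algorithm guarantees this holds with probability at least 1 - delta."""
    return policy_value >= optimal_value - epsilon

# Example: a policy with value 0.95 is 0.1-optimal when the optimum is 1.0.
ok = is_epsilon_optimal(0.95, 1.0, 0.1)
```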
1 code implementation • NeurIPS 2021 • Clémence Réda, Andrea Tirinzoni, Rémy Degenne
In this work, we first derive a tractable lower bound on the sample complexity of any $\delta$-correct algorithm for the general Top-m identification problem.
no code implementations • NeurIPS 2021 • Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
no code implementations • 24 Jun 2021 • Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric
We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs).
1 code implementation • 18 May 2021 • Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli
At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy.
no code implementations • 8 Apr 2021 • Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).
no code implementations • NeurIPS 2020 • Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric
Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.
no code implementations • ICML 2020 • Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli
We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
no code implementations • 23 May 2020 • Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli
We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms.
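One way correlation across arms can arise is through a shared latent parameter; the sketch below is a hypothetical illustration of that setting (the feature values and noise scale are made up), showing why a pull of one arm is informative about the others:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared latent parameter: all arm means depend on it, so their
# rewards are correlated rather than independent.
latent = rng.normal()
arm_features = np.array([0.5, 1.0, 1.5])   # hypothetical per-arm features
arm_means = arm_features * latent

# One noisy sample per arm; observing any arm narrows down `latent`,
# and hence the means of the unobserved arms.
samples = arm_means + rng.normal(scale=0.1, size=3)
```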
no code implementations • 9 Sep 2019 • Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.
1 code implementation • 17 Jul 2019 • Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli
Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.
no code implementations • NeurIPS 2018 • Andrea Tirinzoni, Rafael Rodriguez Sanchez, Marcello Restelli
We consider the problem of transferring value functions in reinforcement learning.
no code implementations • NeurIPS 2018 • Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart
What policy should be employed in a Markov decision process with uncertain parameters?
no code implementations • ICML 2018 • Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli
In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight.
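The importance-weighting idea above can be sketched with a weighted estimate over transferred samples; this is a minimal illustration under made-up data, not the paper's algorithm, and the weights here are random placeholders for the learned importance weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rewards from transferred source-task samples, plus an
# importance weight per sample measuring its relevance to the target task.
rewards = rng.normal(loc=1.0, scale=0.5, size=100)
weights = rng.uniform(0.0, 1.0, size=100)   # placeholder importance weights

# Each sample contributes to the target-task estimate in proportion
# to its importance weight (a weighted average here, for illustration).
weighted_estimate = np.sum(weights * rewards) / np.sum(weights)
```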