Search Results for author: Alessandro Lazaric

Found 85 papers, 14 papers with code

Simple Ingredients for Offline Reinforcement Learning

no code implementations19 Mar 2024 Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

D4RL reinforcement-learning

Reinforcement Learning with Options and State Representation

no code implementations16 Mar 2024 Ayoub Ghriss, Masashi Sugiyama, Alessandro Lazaric

This thesis explores the reinforcement learning field and builds on existing methods to produce improved ones for tackling the problem of learning in high-dimensional and complex environments.

Decision Making Hierarchical Reinforcement Learning +1

Layered State Discovery for Incremental Autonomous Exploration

no code implementations7 Feb 2023 Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta

We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation5 Jan 2023 Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control Self-Supervised Learning

On the Complexity of Representation Learning in Contextual Linear Bandits

no code implementations19 Dec 2022 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs.
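As a rough illustration of this model (a sketch with made-up dimensions and embedding, not code from the paper), the expected reward of a context-arm pair is the inner product between the unknown reward vector and the pair's embedding:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5                                   # embedding dimension (assumed)
    theta_star = rng.normal(size=d)         # unknown reward vector

    def phi(context, arm):
        # hypothetical embedding of a context-arm pair
        return np.tanh(context * (arm + 1))

    context = rng.normal(size=d)
    expected_rewards = [phi(context, arm) @ theta_star for arm in range(4)]
    best_arm = int(np.argmax(expected_rewards))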

Model Selection Multi-Armed Bandits +1

Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler

no code implementations4 Nov 2022 Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta, Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang

We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that robustly leverages lower-quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.

Active Learning

Contextual bandits with concave rewards, and an application to fair ranking

no code implementations18 Oct 2022 Virginie Do, Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier

We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context.
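As a rough illustration of the objective structure (the concave function and the greedy selection rule below are made-up stand-ins, not the authors' algorithm), the learner optimizes a concave function of the average of the reward vectors it accumulates:

    import numpy as np

    rng = np.random.default_rng(1)
    T, n_arms, n_objectives = 50, 4, 2
    # in the paper the reward vector depends on an observed stochastic context;
    # here it is simply drawn at random for each round and arm
    rewards = rng.uniform(size=(T, n_arms, n_objectives))

    def f(avg_rewards):
        # a concave objective trading off the reward dimensions (illustrative choice)
        return np.sqrt(avg_rewards).sum()

    cum = np.zeros(n_objectives)
    for t in range(T):
        # naive greedy baseline: pick the arm that most increases f of the running average
        gains = [f((cum + rewards[t, a]) / (t + 1)) for a in range(n_arms)]
        cum += rewards[t, int(np.argmax(gains))]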

Fairness Multi-Armed Bandits

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

no code implementations10 Oct 2022 Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

no code implementations4 Oct 2022 Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.

Policy Gradient Methods

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

no code implementations21 Mar 2022 Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control Contrastive Learning +1

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

no code implementations13 Dec 2021 Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds.
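A minimal sketch of that interaction protocol (the Gaussian noise, the fixed per-arm bias, and the naive "pick the top $K$ evaluations" rule are all assumptions for illustration, not the authors' method):

    import numpy as np

    rng = np.random.default_rng(2)
    n_arms, K, T = 10, 3, 100
    true_rewards = rng.uniform(size=n_arms)
    bias = rng.normal(scale=0.1, size=n_arms)      # possibly biased evaluators

    collected = 0.0
    for t in range(T):
        # noisy, independent, possibly biased evaluations revealed at the start of the round
        evaluations = true_rewards + bias + rng.normal(scale=0.2, size=n_arms)
        chosen = np.argsort(evaluations)[-K:]      # select the K arms ranked highest
        collected += true_rewards[chosen].sum()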

Differentially Private Exploration in Reinforcement Learning with Linear Representation

no code implementations2 Dec 2021 Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta

We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.

Privacy Preserving reinforcement-learning +1

Adaptive Multi-Goal Exploration

no code implementations23 Nov 2021 Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

A general sample complexity analysis of vanilla policy gradient

no code implementations23 Jul 2021 Rui Yuan, Robert M. Gower, Alessandro Lazaric

We then instantiate our theorems in different settings, where we both recover existing results and obtain improved sample complexity, e.g., $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity for convergence to the global optimum for Fisher-non-degenerate parameterized policies.

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

no code implementations24 Jun 2021 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We derive a novel asymptotic problem-dependent lower bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs).

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

no code implementations NeurIPS 2021 Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.

Leveraging Good Representations in Linear Contextual Bandits

no code implementations8 Apr 2021 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).

Multi-Armed Bandits

Reinforcement Learning with Prototypical Representations

1 code implementation22 Feb 2021 Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent -- learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations.

Continuous Control reinforcement-learning +3

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

no code implementations NeurIPS 2020 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

no code implementations NeurIPS 2020 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

no code implementations NeurIPS 2021 Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.

reinforcement-learning Reinforcement Learning (RL)

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

no code implementations10 Jul 2020 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We consider the problem of exploration-exploitation in communicating Markov Decision Processes.

A Novel Confidence-Based Algorithm for Structured Bandits

no code implementations23 May 2020 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms.

Meta-learning with Stochastic Linear Bandits

no code implementations ICML 2020 Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil

The goal is to select a learning algorithm which works well on average over a class of bandit tasks sampled from a task distribution.

Meta-Learning

Active Model Estimation in Markov Decision Processes

no code implementations6 Mar 2020 Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small-sample regime, while achieving asymptotic performance similar to that of the original algorithm.

Common Sense Reasoning Efficient Exploration

Learning Near Optimal Policies with Low Inherent Bellman Error

no code implementations ICML 2020 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

This has two important consequences: 1) it shows that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; and 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.

reinforcement-learning Reinforcement Learning (RL)

Conservative Exploration in Reinforcement Learning

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward.

reinforcement-learning Reinforcement Learning (RL)

Improved Algorithms for Conservative Exploration in Bandits

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself.

Marketing Recommendation Systems

Concentration Inequalities for Multinoulli Random Variables

no code implementations30 Jan 2020 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We investigate concentration inequalities for Dirichlet and Multinomial random variables.

No-Regret Exploration in Goal-Oriented Reinforcement Learning

no code implementations ICML 2020 Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

Atari Games reinforcement-learning +1

Limiting Extrapolation in Linear Approximate Value Iteration

no code implementations NeurIPS 2019 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

1 code implementation NeurIPS 2019 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations NeurIPS 2019 Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

reinforcement-learning Reinforcement Learning (RL)

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

1 code implementation NeurIPS 2019 Nicolas Carion, Gabriel Synnaeve, Alessandro Lazaric, Nicolas Usunier

While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training.

Multi-agent Reinforcement Learning reinforcement-learning +4

Word-order biases in deep-agent emergent communication

1 code implementation ACL 2019 Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni

We train models to communicate about paths in a simple gridworld, using miniature languages that reflect or violate various natural language trends, such as the tendency to avoid redundancy or to minimize long-distance dependencies.

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

1 code implementation13 Mar 2019 Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

Gaussian Processes

Active Exploration in Markov Decision Processes

no code implementations28 Feb 2019 Jean Tarbouriech, Alessandro Lazaric

As the noise level is initially unknown, we need to trade off the exploration of the environment to estimate the noise and the exploitation of these estimates to compute a policy maximizing the accuracy of the mean predictions.

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes

no code implementations11 Dec 2018 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses.

Efficient Exploration

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

1 code implementation NeurIPS 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).

Efficient Exploration

Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems

no code implementations ICML 2018 Marc Abeille, Alessandro Lazaric

Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning.

Thompson Sampling

Improved large-scale graph learning through ridge spectral sparsification

no code implementations ICML 2018 Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL).

Graph Learning

Distributed Adaptive Sampling for Kernel Matrix Approximation

no code implementations27 Mar 2018 Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset.
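A heavily simplified, single-pass sketch of ridge-leverage-score sampling with an RBF kernel (illustrative only, not the actual SQUEAK algorithm; the kernel choice, the regularization gamma, and the acceptance rule are assumptions):

    import numpy as np

    def rbf(X, Y, s=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * s * s))

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 3))
    gamma = 1.0
    dictionary = [0]                      # seed the dictionary with the first point
    for i in range(1, len(X)):
        D = X[dictionary]
        K_DD = rbf(D, D) + gamma * np.eye(len(D))
        k_iD = rbf(X[i:i + 1], D)
        # approximate ridge leverage score of point i w.r.t. the current dictionary
        # (rbf(x, x) = 1, so the self-similarity term is 1.0)
        tau = (1.0 - k_iD @ np.linalg.solve(K_DD, k_iD.T)).item() / gamma
        if rng.random() < min(1.0, tau):
            dictionary.append(i)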

Clustering

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

1 code implementation ICML 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.

Efficient Exploration reinforcement-learning +1

Regret Minimization in MDPs with Options without Prior Knowledge

no code implementations NeurIPS 2017 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i. e., options).

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

no code implementations NeurIPS 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

Second-order methods

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

no code implementations ICML 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

Second-order methods

Experimental results: Reinforcement Learning of POMDPs using Spectral Methods

no code implementations7 May 2017 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.

reinforcement-learning Reinforcement Learning (RL)

Thompson Sampling for Linear-Quadratic Control Problems

no code implementations27 Mar 2017 Marc Abeille, Alessandro Lazaric

Despite the empirical and theoretical success in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters and the frequency of switches in the control policy.

Thompson Sampling

Exploration-Exploitation in MDPs with Options

no code implementations25 Mar 2017 Ronan Fruit, Alessandro Lazaric

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited.

Reinforcement Learning (RL)

Active Learning for Accurate Estimation of Linear Models

no code implementations ICML 2017 Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric

We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution.

Active Learning Decision Making

Linear Thompson Sampling Revisited

no code implementations20 Nov 2016 Marc Abeille, Alessandro Lazaric

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.

Thompson Sampling

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

no code implementations13 Sep 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko

We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability.

Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies

no code implementations17 Aug 2016 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

Generally in RL, one can assume a generative model, e.g., graphical models, for the environment, and then the task for the RL agent is to learn the model parameters and find the optimal strategy based on these learnt parameters.

Decision Making Reinforcement Learning (RL)

Reinforcement Learning of POMDPs using Spectral Methods

no code implementations25 Feb 2016 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.

reinforcement-learning Reinforcement Learning (RL)

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

no code implementations21 Jan 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

Quantization

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations16 Jul 2015 Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.
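A tiny worked example of that allocation rule, with hypothetical known variances and a hypothetical budget:

    import numpy as np

    variances = np.array([1.0, 4.0, 0.25])    # known variances of the three distributions (assumed)
    budget = 300
    allocation = np.round(budget * variances / variances.sum()).astype(int)
    # higher-variance distributions receive proportionally more of the sampling budget
    print(allocation)                          # [ 57 229  14]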

Active Learning Multi-Armed Bandits

Exploiting easy data in online optimization

no code implementations NeurIPS 2014 Amir Sani, Gergely Neu, Alessandro Lazaric

We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment.

Sparse Multi-Task Reinforcement Learning

no code implementations NeurIPS 2014 Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

This is equivalent to assuming that the weight vectors of the task value functions are jointly sparse, i.e., the set of their non-zero components is small and shared across tasks.
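A small numerical illustration of the joint-sparsity assumption (the numbers are made up):

    import numpy as np

    # weight vectors of three task value functions, one per column;
    # only features 0 and 3 are active, and that support is shared across tasks
    W = np.array([[0.9, 1.1, 0.8],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.4, 0.3, 0.5],
                  [0.0, 0.0, 0.0]])
    shared_support = np.nonzero(np.linalg.norm(W, axis=1))[0]
    print(shared_support)    # [0 3] -- the small, shared set of non-zero components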

reinforcement-learning Reinforcement Learning (RL)

Best-Arm Identification in Linear Bandits

no code implementations NeurIPS 2014 Marta Soare, Alessandro Lazaric, Rémi Munos

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward.

Experimental Design

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations4 Feb 2014 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Stochastic Optimization

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations5 May 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.

reinforcement-learning Reinforcement Learning (RL)

Risk-Aversion in Multi-armed Bandits

no code implementations NeurIPS 2012 Amir Sani, Alessandro Lazaric, Rémi Munos

In stochastic multi-armed bandits, the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward.

Multi-Armed Bandits

Multi-Bandit Best Arm Identification

no code implementations NeurIPS 2011 Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap).

Transfer from Multiple MDPs

no code implementations NeurIPS 2011 Alessandro Lazaric, Marcello Restelli

Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms.

reinforcement-learning Reinforcement Learning (RL) +1

LSTD with Random Projections

no code implementations NeurIPS 2010 Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos

We provide a thorough theoretical analysis of the LSTD with random projections and derive performance bounds for the resulting algorithm.

reinforcement-learning Reinforcement Learning (RL)
