Search Results for author: Alessandro Lazaric

Found 85 papers, 14 papers with code

Simple Ingredients for Offline Reinforcement Learning

no code implementations19 Mar 2024 Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

D4RL reinforcement-learning

Reinforcement Learning with Options and State Representation

no code implementations16 Mar 2024 Ayoub Ghriss, Masashi Sugiyama, Alessandro Lazaric

This thesis explores the reinforcement learning field and builds on existing methods to produce improved ones for tackling the problem of learning in high-dimensional and complex environments.

Decision Making Hierarchical Reinforcement Learning +1

Layered State Discovery for Incremental Autonomous Exploration

no code implementations7 Feb 2023 Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta

We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation5 Jan 2023 Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control Self-Supervised Learning

On the Complexity of Representation Learning in Contextual Linear Bandits

no code implementations19 Dec 2022 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs.
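As a rough illustration of this model (a sketch with made-up dimensions and embedding, not code from the paper), the expected reward of a context-arm pair is the inner product between the unknown reward vector and the pair's embedding:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5                                   # embedding dimension (assumed)
    theta_star = rng.normal(size=d)         # unknown reward vector

    def phi(context, arm):
        # hypothetical embedding of a context-arm pair
        return np.tanh(context * (arm + 1))

    context = rng.normal(size=d)
    expected_rewards = [phi(context, arm) @ theta_star for arm in range(4)]
    best_arm = int(np.argmax(expected_rewards))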

Model Selection Multi-Armed Bandits +1

Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler

no code implementations4 Nov 2022 Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta, Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang

We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that robustly leverages lower-quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.

Active Learning

Contextual bandits with concave rewards, and an application to fair ranking

no code implementations18 Oct 2022 Virginie Do, Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier

We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context.
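As a rough illustration of the objective structure (the concave function and the greedy selection rule below are made-up stand-ins, not the authors' algorithm), the learner optimizes a concave function of the average of the reward vectors it accumulates:

    import numpy as np

    rng = np.random.default_rng(1)
    T, n_arms, n_objectives = 50, 4, 2
    # in the paper the reward vector depends on an observed stochastic context;
    # here it is simply drawn at random for each round and arm
    rewards = rng.uniform(size=(T, n_arms, n_objectives))

    def f(avg_rewards):
        # a concave objective trading off the reward dimensions (illustrative choice)
        return np.sqrt(avg_rewards).sum()

    cum = np.zeros(n_objectives)
    for t in range(T):
        # naive greedy baseline: pick the arm that most increases f of the running average
        gains = [f((cum + rewards[t, a]) / (t + 1)) for a in range(n_arms)]
        cum += rewards[t, int(np.argmax(gains))]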

Fairness Multi-Armed Bandits

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

no code implementations10 Oct 2022 Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

no code implementations4 Oct 2022 Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.

Policy Gradient Methods

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

no code implementations21 Mar 2022 Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping.

Continuous Control Contrastive Learning +1

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

no code implementations13 Dec 2021 Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds.
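A minimal sketch of that interaction protocol (the Gaussian noise, the fixed per-arm bias, and the naive "pick the top $K$ evaluations" rule are all assumptions for illustration, not the authors' method):

    import numpy as np

    rng = np.random.default_rng(2)
    n_arms, K, T = 10, 3, 100
    true_rewards = rng.uniform(size=n_arms)
    bias = rng.normal(scale=0.1, size=n_arms)      # possibly biased evaluators

    collected = 0.0
    for t in range(T):
        # noisy, independent, possibly biased evaluations revealed at the start of the round
        evaluations = true_rewards + bias + rng.normal(scale=0.2, size=n_arms)
        chosen = np.argsort(evaluations)[-K:]      # select the K arms ranked highest
        collected += true_rewards[chosen].sum()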

Differentially Private Exploration in Reinforcement Learning with Linear Representation

no code implementations2 Dec 2021 Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta

We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.

Privacy Preserving reinforcement-learning +1

Adaptive Multi-Goal Exploration

no code implementations23 Nov 2021 Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

A general sample complexity analysis of vanilla policy gradient

no code implementations23 Jul 2021 Rui Yuan, Robert M. Gower, Alessandro Lazaric

We then instantiate our theorems in different settings, where we both recover existing results and obtain improved sample complexity, e.g., $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity for convergence to the global optimum for Fisher-non-degenerate parameterized policies.

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

no code implementations24 Jun 2021 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We derive a novel asymptotic problem-dependent lower bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs).

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

no code implementations NeurIPS 2021 Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.

Leveraging Good Representations in Linear Contextual Bandits

no code implementations8 Apr 2021 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).

Multi-Armed Bandits

Reinforcement Learning with Prototypical Representations

1 code implementation22 Feb 2021 Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent -- learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations.

Continuous Control reinforcement-learning +3

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

no code implementations NeurIPS 2020 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

no code implementations NeurIPS 2020 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

no code implementations NeurIPS 2021 Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.

reinforcement-learning Reinforcement Learning (RL)

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

no code implementations10 Jul 2020 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We consider the problem of exploration-exploitation in communicating Markov Decision Processes.

A Novel Confidence-Based Algorithm for Structured Bandits

no code implementations23 May 2020 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms.

Meta-learning with Stochastic Linear Bandits

no code implementations ICML 2020 Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil

The goal is to select a learning algorithm which works well on average over a class of bandit tasks sampled from a task distribution.

Meta-Learning

Active Model Estimation in Markov Decision Processes

no code implementations6 Mar 2020 Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small-sample regime, while achieving asymptotic performance similar to that of the original algorithm.

Common Sense Reasoning Efficient Exploration

Learning Near Optimal Policies with Low Inherent Bellman Error

no code implementations ICML 2020 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

This has two important consequences: 1) it shows that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; and 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.

reinforcement-learning Reinforcement Learning (RL)

Conservative Exploration in Reinforcement Learning

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward.

reinforcement-learning Reinforcement Learning (RL)

Improved Algorithms for Conservative Exploration in Bandits

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself.

Marketing Recommendation Systems

Concentration Inequalities for Multinoulli Random Variables

no code implementations30 Jan 2020 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We investigate concentration inequalities for Dirichlet and Multinomial random variables.

No-Regret Exploration in Goal-Oriented Reinforcement Learning

no code implementations ICML 2020 Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

Atari Games reinforcement-learning +1

Limiting Extrapolation in Linear Approximate Value Iteration

no code implementations NeurIPS 2019 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

1 code implementation NeurIPS 2019 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations NeurIPS 2019 Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

reinforcement-learning Reinforcement Learning (RL)

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

1 code implementation NeurIPS 2019 Nicolas Carion, Gabriel Synnaeve, Alessandro Lazaric, Nicolas Usunier

While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training.

Multi-agent Reinforcement Learning reinforcement-learning +4

Word-order biases in deep-agent emergent communication

1 code implementation ACL 2019 Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni

We train models to communicate about paths in a simple gridworld, using miniature languages that reflect or violate various natural language trends, such as the tendency to avoid redundancy or to minimize long-distance dependencies.

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

1 code implementation13 Mar 2019 Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

Gaussian Processes

Active Exploration in Markov Decision Processes

no code implementations28 Feb 2019 Jean Tarbouriech, Alessandro Lazaric

As the noise level is initially unknown, we need to trade off the exploration of the environment to estimate the noise and the exploitation of these estimates to compute a policy maximizing the accuracy of the mean predictions.

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes

no code implementations11 Dec 2018 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses.

Efficient Exploration

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

1 code implementation NeurIPS 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).

Efficient Exploration

Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems

no code implementations ICML 2018 Marc Abeille, Alessandro Lazaric

Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning.

Thompson Sampling

Improved large-scale graph learning through ridge spectral sparsification

no code implementations ICML 2018 Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL).

Graph Learning

Distributed Adaptive Sampling for Kernel Matrix Approximation

no code implementations27 Mar 2018 Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset.
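A heavily simplified, single-pass sketch of ridge-leverage-score sampling with an RBF kernel (illustrative only, not the actual SQUEAK algorithm; the kernel choice, the regularization gamma, and the acceptance rule are assumptions):

    import numpy as np

    def rbf(X, Y, s=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * s * s))

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 3))
    gamma = 1.0
    dictionary = [0]                      # seed the dictionary with the first point
    for i in range(1, len(X)):
        D = X[dictionary]
        K_DD = rbf(D, D) + gamma * np.eye(len(D))
        k_iD = rbf(X[i:i + 1], D)
        # approximate ridge leverage score of point i w.r.t. the current dictionary
        # (rbf(x, x) = 1, so the self-similarity term is 1.0)
        tau = (1.0 - k_iD @ np.linalg.solve(K_DD, k_iD.T)).item() / gamma
        if rng.random() < min(1.0, tau):
            dictionary.append(i)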

Clustering

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

1 code implementation ICML 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.

Efficient Exploration reinforcement-learning +1

Regret Minimization in MDPs with Options without Prior Knowledge

no code implementations NeurIPS 2017 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i. e., options).

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

no code implementations NeurIPS 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

Second-order methods

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

no code implementations ICML 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

Second-order methods

Experimental results: Reinforcement Learning of POMDPs using Spectral Methods

no code implementations7 May 2017 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.

reinforcement-learning Reinforcement Learning (RL)

Thompson Sampling for Linear-Quadratic Control Problems

no code implementations27 Mar 2017 Marc Abeille, Alessandro Lazaric

Despite the empirical and theoretical success in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters and the frequency of switches in the control policy.

Thompson Sampling

Exploration-Exploitation in MDPs with Options

no code implementations25 Mar 2017 Ronan Fruit, Alessandro Lazaric

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited.

Reinforcement Learning (RL)

Active Learning for Accurate Estimation of Linear Models

no code implementations ICML 2017 Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric

We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution.

Active Learning Decision Making

Linear Thompson Sampling Revisited

no code implementations20 Nov 2016 Marc Abeille, Alessandro Lazaric

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.

Thompson Sampling

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

no code implementations13 Sep 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko

We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability.

Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies

no code implementations17 Aug 2016 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

Generally in RL, one can assume a generative model, e.g., graphical models, for the environment, and then the task for the RL agent is to learn the model parameters and find the optimal strategy based on these learnt parameters.

Decision Making Reinforcement Learning (RL)

Reinforcement Learning of POMDPs using Spectral Methods

no code implementations25 Feb 2016 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.

reinforcement-learning Reinforcement Learning (RL)

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

no code implementations21 Jan 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

Quantization

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations16 Jul 2015 Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.
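A tiny worked example of that allocation rule, with hypothetical known variances and a hypothetical budget:

    import numpy as np

    variances = np.array([1.0, 4.0, 0.25])    # known variances of the three distributions (assumed)
    budget = 300
    allocation = np.round(budget * variances / variances.sum()).astype(int)
    # higher-variance distributions receive proportionally more of the sampling budget
    print(allocation)                          # [ 57 229  14]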

Active Learning Multi-Armed Bandits

Exploiting easy data in online optimization

no code implementations NeurIPS 2014 Amir Sani, Gergely Neu, Alessandro Lazaric

We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment.

Sparse Multi-Task Reinforcement Learning

no code implementations NeurIPS 2014 Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

This is equivalent to assuming that the weight vectors of the task value functions are jointly sparse, i.e., the set of their non-zero components is small and shared across tasks.
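A small numerical illustration of the joint-sparsity assumption (the numbers are made up):

    import numpy as np

    # weight vectors of three task value functions, one per column;
    # only features 0 and 3 are active, and that support is shared across tasks
    W = np.array([[0.9, 1.1, 0.8],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.4, 0.3, 0.5],
                  [0.0, 0.0, 0.0]])
    shared_support = np.nonzero(np.linalg.norm(W, axis=1))[0]
    print(shared_support)    # [0 3] -- the small, shared set of non-zero components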

reinforcement-learning Reinforcement Learning (RL)

Best-Arm Identification in Linear Bandits

no code implementations NeurIPS 2014 Marta Soare, Alessandro Lazaric, Rémi Munos

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward.

Experimental Design

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations4 Feb 2014 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Stochastic Optimization

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations5 May 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.

reinforcement-learning Reinforcement Learning (RL)

Risk-Aversion in Multi-armed Bandits

no code implementations NeurIPS 2012 Amir Sani, Alessandro Lazaric, Rémi Munos

In stochastic multi-armed bandits, the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward.

Multi-Armed Bandits

Multi-Bandit Best Arm Identification

no code implementations NeurIPS 2011 Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap).

Transfer from Multiple MDPs

no code implementations NeurIPS 2011 Alessandro Lazaric, Marcello Restelli

Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms.

reinforcement-learning Reinforcement Learning (RL) +1

LSTD with Random Projections

no code implementations NeurIPS 2010 Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos

We provide a thorough theoretical analysis of the LSTD with random projections and derive performance bounds for the resulting algorithm.

reinforcement-learning Reinforcement Learning (RL)
