Search Results for author: Tor Lattimore

Found 55 papers, 5 papers with code

Bandit Phase Retrieval

no code implementations3 Jun 2021 Tor Lattimore, Botao Hao

We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.

Minimax Regret for Bandit Convex Optimisation of Ridge Functions

no code implementations1 Jun 2021 Tor Lattimore

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time.

Information Directed Sampling for Sparse Linear Bandits

no code implementations29 May 2021 Botao Hao, Tor Lattimore, Wei Deng

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.

Decision Making

On the Optimality of Batch Policy Optimization Algorithms

no code implementations6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Value prediction

Asymptotically Optimal Information-Directed Sampling

no code implementations11 Nov 2020 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.

Online Sparse Reinforcement Learning

no code implementations8 Nov 2020 Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations8 Nov 2020 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

Feature Selection Model Selection

High-Dimensional Sparse Linear Bandits

no code implementations NeurIPS 2020 Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Mirror Descent and the Information Ratio

no code implementations25 Sep 2020 Tor Lattimore, András György

We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014].

Gaussian Gated Linear Networks

1 code implementation NeurIPS 2020 David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

Denoising Density Estimation +1

Matrix games with bandit feedback

no code implementations9 Jun 2020 Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each others actions and a noisy payoff.

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

no code implementations31 May 2020 Tor Lattimore

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2. 5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions.

Model Selection in Contextual Stochastic Bandit Problems

no code implementations NeurIPS 2020 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

We propose a master algorithm inspired by CORRAL \cite{DBLP:conf/colt/AgarwalLNS17} and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain $O(\sqrt{T})$ regret guarantees for a wide class of base algorithms when working along with our master.

Model Selection

Information Directed Sampling for Linear Partial Monitoring

no code implementations25 Feb 2020 Johannes Kirschner, Tor Lattimore, Andreas Krause

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits.

Decision Making Decision Making Under Uncertainty

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

no code implementations ICML 2020 Tor Lattimore, Csaba Szepesvari, Gellert Weisz

The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.

Adaptive Exploration in Linear Contextual Bandit

no code implementations15 Oct 2019 Botao Hao, Tor Lattimore, Csaba Szepesvari

Contextual bandits serve as a fundamental model for many sequential decision making tasks.

Decision Making Multi-Armed Bandits

Behaviour Suite for Reinforcement Learning

2 code implementations ICLR 2020 Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt

bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.

Iterative Budgeted Exponential Search

no code implementations30 Jul 2019 Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.

Exploration by Optimisation in Partial Monitoring

no code implementations12 Jul 2019 Tor Lattimore, Csaba Szepesvari

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring game for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound.

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

no code implementations7 Jun 2019 Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

no code implementations NeurIPS 2019 Julian Zimmert, Tor Lattimore

The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

no code implementations19 Mar 2019 Roman Pogodin, Tor Lattimore

Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

no code implementations1 Feb 2019 Tor Lattimore, Csaba Szepesvari

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

no code implementations8 Jan 2019 Laurent Orseau, Tor Lattimore, Shane Legg

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations13 Nov 2018 Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.

Multi-Armed Bandits

Online Learning to Rank with Features

no code implementations5 Oct 2018 Shuai Li, Tor Lattimore, Csaba Szepesvári

We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.


Linear Bandits with Stochastic Delayed Feedback

no code implementations ICML 2020 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.

Multi-Armed Bandits

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

no code implementations15 Jun 2018 Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.

Learning-To-Rank Re-Ranking +1

TopRank: A practical algorithm for online stochastic ranking

no code implementations NeurIPS 2018 Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.

Decision Making Learning-To-Rank

Cleaning up the neighborhood: A full classification for adversarial partial monitoring

no code implementations23 May 2018 Tor Lattimore, Csaba Szepesvari

Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner.

Classification General Classification

Online Learning with Gated Linear Networks

no code implementations5 Dec 2017 Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth

This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis

no code implementations NeurIPS 2017 Tor Lattimore

Existing strategies for finite-armed stochastic bandits mostly depend on a parameter of scale that must be known in advance.

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

1 code implementation NeurIPS 2017 Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

no code implementations NeurIPS 2016 Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

no code implementations14 Oct 2016 Tor Lattimore, Csaba Szepesvari

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications.

Free Lunch for Optimisation under the Universal Distribution

no code implementations16 Aug 2016 Tom Everitt, Tor Lattimore, Marcus Hutter

Function optimisation is a major challenge in computer science.

Causal Bandits: Learning Good Interventions via Causal Inference

no code implementations NeurIPS 2016 Finnian Lattimore, Tor Lattimore, Mark D. Reid

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment.

Causal Inference

On Explore-Then-Commit Strategies

no code implementations NeurIPS 2016 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.

Refined Lower Bounds for Adversarial Bandits

no code implementations NeurIPS 2016 Sébastien Gerchinovitz, Tor Lattimore

First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case.

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

no code implementations29 Mar 2016 Tor Lattimore

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise.

Thompson Sampling is Asymptotically Optimal in General Environments

no code implementations25 Feb 2016 Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable classes of general stochastic environments.

Conservative Bandits

no code implementations13 Feb 2016 Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

We consider both the stochastic and the adversarial settings, where we propose, natural, yet novel strategies and analyze the price for maintaining the constraints.

Linear Multi-Resource Allocation with Semi-Bandit Feedback

no code implementations NeurIPS 2015 Tor Lattimore, Koby Crammer, Csaba Szepesvari

In each time step the learner chooses an allocation of several resource types between a number of tasks.

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

no code implementations18 Nov 2015 Tor Lattimore

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon.

Multi-Armed Bandits

The Pareto Regret Frontier for Bandits

no code implementations NeurIPS 2015 Tor Lattimore

Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions.

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

no code implementations28 Jul 2015 Tor Lattimore

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret.

Bounded Regret for Finite-Armed Structured Bandits

no code implementations NeurIPS 2014 Tor Lattimore, Remi Munos

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms.

Optimal Resource Allocation with Semi-Bandit Feedback

no code implementations15 Jun 2014 Tor Lattimore, Koby Crammer, Csaba Szepesvári

We study a sequential resource allocation problem involving a fixed number of recurring jobs.

The Sample-Complexity of General Reinforcement Learning

no code implementations22 Aug 2013 Tor Lattimore, Marcus Hutter, Peter Sunehag

We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.

General Reinforcement Learning

Concentration and Confidence for Discrete Bayesian Sequence Predictors

no code implementations29 Jun 2013 Tor Lattimore, Marcus Hutter, Peter Sunehag

We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.

Cannot find the paper you are looking for? You can Submit a new open access paper.