Search Results for author: Tor Lattimore

Found 67 papers, 5 papers with code

Concentration and Confidence for Discrete Bayesian Sequence Predictors

no code implementations29 Jun 2013 Tor Lattimore, Marcus Hutter, Peter Sunehag

We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.
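
As a quick illustration of the error measure, here is a minimal Python sketch (not from the paper) computing the KL divergence between a hypothetical true distribution and a predictor's distribution over a discrete alphabet; the cumulative error the paper bounds is the sum of such terms over time.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical truth p and predictor q over a three-symbol alphabet.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # one step of the cumulative error the paper bounds
```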

The Sample-Complexity of General Reinforcement Learning

no code implementations22 Aug 2013 Tor Lattimore, Marcus Hutter, Peter Sunehag

We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.

General Reinforcement Learning reinforcement-learning +1

Optimal Resource Allocation with Semi-Bandit Feedback

no code implementations15 Jun 2014 Tor Lattimore, Koby Crammer, Csaba Szepesvári

We study a sequential resource allocation problem involving a fixed number of recurring jobs.

Bounded Regret for Finite-Armed Structured Bandits

no code implementations NeurIPS 2014 Tor Lattimore, Remi Munos

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms.

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

no code implementations28 Jul 2015 Tor Lattimore

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret.
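
For context, a minimal sketch of the classical UCB index that OCUCB refines. The confidence radius below is the textbook $\sqrt{2 \log t / n_i}$ choice, not the optimally confident one from the paper, and the arm means are illustrative.

```python
import math
import random

def ucb(means, horizon):
    """Classical UCB for finite-armed bandits; `means` are the true arm means."""
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = random.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb([0.0, 0.5, 1.0], 1000))  # pull counts concentrate on the best arm
```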

The Pareto Regret Frontier for Bandits

no code implementations NeurIPS 2015 Tor Lattimore

Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions.

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

no code implementations18 Nov 2015 Tor Lattimore

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon.

Multi-Armed Bandits Thompson Sampling

Linear Multi-Resource Allocation with Semi-Bandit Feedback

no code implementations NeurIPS 2015 Tor Lattimore, Koby Crammer, Csaba Szepesvari

In each time step the learner chooses an allocation of several resource types between a number of tasks.

Conservative Bandits

no code implementations13 Feb 2016 Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.

Thompson Sampling is Asymptotically Optimal in General Environments

no code implementations25 Feb 2016 Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable class of general stochastic environments.

reinforcement-learning Reinforcement Learning (RL) +1

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

no code implementations29 Mar 2016 Tor Lattimore

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise.

Refined Lower Bounds for Adversarial Bandits

no code implementations NeurIPS 2016 Sébastien Gerchinovitz, Tor Lattimore

First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case.

On Explore-Then-Commit Strategies

no code implementations NeurIPS 2016 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.
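
A minimal sketch of a generic explore-then-commit strategy of the kind analysed here: explore each arm a fixed number of times, then commit to the empirical best. The exploration budget m and the arm means are hypothetical.

```python
import random

def explore_then_commit(means, m, horizon):
    """Play each of two Gaussian arms m times, then commit to the empirical best."""
    rewards = [[random.gauss(mu, 1.0) for _ in range(m)] for mu in means]
    best = max(range(2), key=lambda i: sum(rewards[i]) / m)
    total = sum(sum(r) for r in rewards)  # reward collected while exploring
    total += sum(random.gauss(means[best], 1.0) for _ in range(horizon - 2 * m))
    return total

print(explore_then_commit([0.0, 0.3], m=50, horizon=1000))
```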

Causal Bandits: Learning Good Interventions via Causal Inference

no code implementations NeurIPS 2016 Finnian Lattimore, Tor Lattimore, Mark D. Reid

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment.

Causal Inference

Free Lunch for Optimisation under the Universal Distribution

no code implementations16 Aug 2016 Tom Everitt, Tor Lattimore, Marcus Hutter

Function optimisation is a major challenge in computer science.

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

no code implementations14 Oct 2016 Tor Lattimore, Csaba Szepesvari

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications.

reinforcement-learning Reinforcement Learning (RL) +1

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

no code implementations NeurIPS 2016 Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
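
As a concrete instance, a minimal sketch of FTL for online linear prediction over the Euclidean unit ball, where the leader has a closed form (the unit vector opposing the running sum of loss vectors); the loss sequence is hypothetical.

```python
import math

def ftl_unit_ball(loss_vectors):
    """Follow the leader with linear losses <theta_t, x> over the unit ball.

    The leader minimises the cumulative loss so far, which for linear losses
    is the unit vector pointing opposite the running sum of loss vectors.
    """
    d = len(loss_vectors[0])
    cum = [0.0] * d
    actions = []
    for theta in loss_vectors:
        norm = math.sqrt(sum(c * c for c in cum))
        # Before any losses are seen, play the origin as an arbitrary tie-break.
        x = [-c / norm for c in cum] if norm > 0 else [0.0] * d
        actions.append(x)
        cum = [c + t for c, t in zip(cum, theta)]
    return actions

print(ftl_unit_ball([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])[-1])
```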

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

1 code implementation NeurIPS 2017 Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

reinforcement-learning Reinforcement Learning (RL)

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis

no code implementations NeurIPS 2017 Tor Lattimore

Existing strategies for finite-armed stochastic bandits mostly depend on a parameter of scale that must be known in advance.

Online Learning with Gated Linear Networks

no code implementations5 Dec 2017 Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth

This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.

Cleaning up the neighborhood: A full classification for adversarial partial monitoring

no code implementations23 May 2018 Tor Lattimore, Csaba Szepesvari

Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner.

General Classification

TopRank: A practical algorithm for online stochastic ranking

no code implementations NeurIPS 2018 Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.

Decision Making Learning-To-Rank +1

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

no code implementations15 Jun 2018 Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.

Learning-To-Rank Re-Ranking +1

Linear Bandits with Stochastic Delayed Feedback

no code implementations ICML 2020 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.

Marketing Multi-Armed Bandits

Online Learning to Rank with Features

no code implementations5 Oct 2018 Shuai Li, Tor Lattimore, Csaba Szepesvári

We introduce a new model for online ranking in which the click probability factors into an examination function and an attractiveness function, and the attractiveness function is a linear function of a feature vector and an unknown parameter.

Learning-To-Rank
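
A minimal sketch of the factored click model described above; the examination probabilities, item features and parameter vector are hypothetical placeholders.

```python
def click_probabilities(features, theta, examination):
    """Factored click model: P(click at position k) = examination(k) * attractiveness(item).

    Attractiveness is linear in the item's feature vector, as in the paper's model.
    """
    attract = [sum(f * t for f, t in zip(x, theta)) for x in features]
    return [e * a for e, a in zip(examination, attract)]

# Hypothetical 2-d features for three ranked items and position-based examination decay.
features = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
theta = [0.7, 0.3]
examination = [1.0, 0.6, 0.3]
print(click_probabilities(features, theta, examination))
```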

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations13 Nov 2018 Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, the proposed algorithm pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.

Multi-Armed Bandits
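
A minimal sketch of the mechanism described above, under simplifying assumptions: each round, every arm's history is padded with pseudo rewards (one 0 and one 1 per observation here, an illustrative choice rather than the paper's tuning) and bootstrapped, and the arm with the highest bootstrap mean is pulled.

```python
import random

def bootstrap_bandit(means, horizon):
    """Bootstrap exploration: pull the arm with the highest mean reward in a
    non-parametric bootstrap sample of its history, padded with pseudo rewards."""
    k = len(means)
    histories = [[] for _ in range(k)]
    for _ in range(horizon):
        indices = []
        for i in range(k):
            # Pad with one optimistic (1) and one pessimistic (0) pseudo reward
            # per observation; at least one pair when the history is empty.
            h = histories[i] + [0.0, 1.0] * max(1, len(histories[i]))
            sample = random.choices(h, k=len(h))  # sample with replacement
            indices.append(sum(sample) / len(sample))
        arm = max(range(k), key=lambda i: indices[i])
        histories[arm].append(float(random.random() < means[arm]))  # Bernoulli reward
    return [len(h) for h in histories]  # pull counts

print(bootstrap_bandit([0.2, 0.5, 0.8], 500))
```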

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

no code implementations8 Jan 2019 Laurent Orseau, Tor Lattimore, Shane Legg

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

no code implementations1 Feb 2019 Tor Lattimore, Csaba Szepesvari

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

no code implementations19 Mar 2019 Roman Pogodin, Tor Lattimore

Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

no code implementations NeurIPS 2019 Julian Zimmert, Tor Lattimore

The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.

Thompson Sampling

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

no code implementations7 Jun 2019 Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.

Exploration by Optimisation in Partial Monitoring

no code implementations12 Jul 2019 Tor Lattimore, Csaba Szepesvari

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games, for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound.

Iterative Budgeted Exponential Search

no code implementations30 Jul 2019 Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.

Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution

no code implementations25 Sep 2019 Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel

We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.

reinforcement-learning Reinforcement Learning (RL)

Adaptive Exploration in Linear Contextual Bandit

no code implementations15 Oct 2019 Botao Hao, Tor Lattimore, Csaba Szepesvari

Contextual bandits serve as a fundamental model for many sequential decision making tasks.

Decision Making Multi-Armed Bandits

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

no code implementations ICML 2020 Tor Lattimore, Csaba Szepesvari, Gellert Weisz

The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.

Information Directed Sampling for Linear Partial Monitoring

no code implementations25 Feb 2020 Johannes Kirschner, Tor Lattimore, Andreas Krause

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits.

Decision Making Decision Making Under Uncertainty

Model Selection in Contextual Stochastic Bandit Problems

no code implementations NeurIPS 2020 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.

Model Selection Multi-Armed Bandits

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

no code implementations31 May 2020 Tor Lattimore

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions.

Matrix games with bandit feedback

no code implementations9 Jun 2020 Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.

Gaussian Gated Linear Networks

2 code implementations NeurIPS 2020 David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

Denoising Density Estimation +2

Mirror Descent and the Information Ratio

no code implementations25 Sep 2020 Tor Lattimore, András György

We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014].

Online Sparse Reinforcement Learning

no code implementations8 Nov 2020 Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning Reinforcement Learning (RL)

High-Dimensional Sparse Linear Bandits

no code implementations NeurIPS 2020 Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations8 Nov 2020 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection Model Selection +2

Asymptotically Optimal Information-Directed Sampling

no code implementations11 Nov 2020 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.

On the Optimality of Batch Policy Optimization Algorithms

no code implementations6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Value prediction

Information Directed Sampling for Sparse Linear Bandits

no code implementations NeurIPS 2021 Botao Hao, Tor Lattimore, Wei Deng

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.

Decision Making

Minimax Regret for Bandit Convex Optimisation of Ridge Functions

no code implementations1 Jun 2021 Tor Lattimore

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time.
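
To make the functional form concrete: a ridge function varies only along the single direction $\theta$ and is constant orthogonal to it. A hypothetical instance with $g(z) = z^2$:

```python
def ridge_function(x, theta):
    """f(x) = g(<x, theta>) with g(z) = z**2: convex in z, constant orthogonal to theta."""
    z = sum(xi * ti for xi, ti in zip(x, theta))
    return z ** 2

theta = [0.6, 0.8]                         # the unknown direction in the paper's setting
print(ridge_function([1.0, 0.0], theta))   # 0.36
print(ridge_function([0.8, -0.6], theta))  # 0.0: x is orthogonal to theta
```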

Bandit Phase Retrieval

no code implementations NeurIPS 2021 Tor Lattimore, Botao Hao

We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.

Retrieval
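
A minimal simulation of this reward model; the parameter vector and Gaussian observation noise are illustrative assumptions. Note that $\theta_\star$ and $-\theta_\star$ yield the same expected reward, which is what makes this a phase retrieval problem.

```python
import random

def phase_retrieval_reward(action, theta, noise_sd=0.1):
    """Expected reward is <A_t, theta>^2; the sign of the inner product is unobservable."""
    inner = sum(a * t for a, t in zip(action, theta))
    return inner ** 2 + random.gauss(0.0, noise_sd)

theta = [0.6, 0.8]  # hypothetical unknown parameter in the unit ball
for action in ([1.0, 0.0], [0.6, 0.8], [-0.6, -0.8]):
    print(action, phase_retrieval_reward(action, theta))
# theta and -theta give the same expected reward: the "phase" is lost.
```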

Near-optimal inference in adaptive linear regression

no code implementations5 Jul 2021 Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.

Active Learning regression +2

Variational Bayesian Optimistic Sampling

no code implementations NeurIPS 2021 Brendan O'Donoghue, Tor Lattimore

We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy.

Thompson Sampling
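
For reference, a minimal Beta-Bernoulli Thompson sampling sketch, the baseline policy contained in the paper's family of optimistic policies; the uniform prior and arm means are illustrative assumptions.

```python
import random

def thompson_sampling(means, horizon):
    """Beta-Bernoulli Thompson sampling: sample a mean from each arm's posterior
    and pull the arm whose sample is largest."""
    k = len(means)
    alpha, beta = [1] * k, [1] * k  # uniform Beta(1, 1) priors
    for _ in range(horizon):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if random.random() < means[arm]:  # Bernoulli reward
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return [a + b - 2 for a, b in zip(alpha, beta)]  # pull counts

print(thompson_sampling([0.3, 0.5, 0.7], 1000))
```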

Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret

no code implementations22 Feb 2022 Tor Lattimore

We show that a version of the generalised information ratio of Lattimore and Gyorgy (2020) determines the asymptotic minimax regret for all finite-action partial monitoring games provided that (a) the standard definition of regret is used but the latent space where the adversary plays is potentially infinite; or (b) the regret introduced by Rustichini (1999) is used and the latent space is finite.

Contextual Information-Directed Sampling

no code implementations22 May 2022 Botao Hao, Tor Lattimore, Chao Qin

Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.

Multi-Armed Bandits Reinforcement Learning (RL)

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

no code implementations26 May 2022 Sanae Amani, Tor Lattimore, András György, Lin F. Yang

In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.

Regret Bounds for Information-Directed Reinforcement Learning

no code implementations9 Jun 2022 Botao Hao, Tor Lattimore

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL) +1

Leveraging Demonstrations to Improve Online Learning: Quality Matters

no code implementations7 Feb 2023 Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.

Thompson Sampling

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

no code implementations7 Feb 2023 Johannes Kirschner, Tor Lattimore, Andreas Krause

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models.

Decision Making

A Second-Order Method for Stochastic Bandit Convex Optimisation

no code implementations10 Feb 2023 Tor Lattimore, András György

We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3] \, \mathrm{polylog}(n, d, r)$, where $n$ is the horizon, $d$ the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.

Sequential Best-Arm Identification with Application to Brain-Computer Interface

no code implementations17 May 2023 Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li

A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system.

Brain Computer Interface EEG +3

Bandit Convex Optimisation

no code implementations9 Feb 2024 Tor Lattimore

Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation.
