no code implementations • 3 Jun 2021 • Tor Lattimore, Botao Hao

We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.
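
The reward model in this abstract can be sketched in a few lines. This is a minimal illustration only: the helper names, the uniform sampler and the Gaussian observation noise are assumptions, not taken from the paper. It checks that the optimal action in the unit ball is $\pm\theta_\star/\|\theta_\star\|$ (by Cauchy-Schwarz), with expected reward $\|\theta_\star\|^2$.

```python
import numpy as np


def sample_unit_ball(d: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a point uniformly at random from the d-dimensional unit ball."""
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    radius = rng.random() ** (1.0 / d)  # radial correction for uniformity
    return radius * direction


def expected_reward(action: np.ndarray, theta_star: np.ndarray) -> float:
    """Expected reward <A_t, theta_star>^2 of the bandit phase retrieval model."""
    return float(np.dot(action, theta_star)) ** 2


def observed_reward(action, theta_star, rng, noise_sd=0.1):
    """Expected reward plus Gaussian noise (the noise model is an assumption)."""
    return expected_reward(action, theta_star) + noise_sd * rng.standard_normal()


rng = np.random.default_rng(0)
theta_star = rng.standard_normal(5)
best_action = theta_star / np.linalg.norm(theta_star)  # optimal by Cauchy-Schwarz
print(expected_reward(best_action, theta_star), np.dot(theta_star, theta_star))
```

The reward is invariant to the sign of the action. This is what makes the problem harder than a standard linear bandit.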

no code implementations • 1 Jun 2021 • Tor Lattimore

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time.

no code implementations • 29 May 2021 • Botao Hao, Tor Lattimore, Wei Deng

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.

no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.

no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

no code implementations • 25 Sep 2020 • Tor Lattimore, András György

We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014].

1 code implementation • NeurIPS 2020 • David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.

no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.

no code implementations • 31 May 2020 • Tor Lattimore

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions.

no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

We propose a master algorithm inspired by CORRAL (Agarwal et al., 2017) and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain $O(\sqrt{T})$ regret guarantees for a wide class of base algorithms when combined with our master algorithm.

no code implementations • 25 Feb 2020 • Johannes Kirschner, Tor Lattimore, Andreas Krause

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits.

no code implementations • ICML 2020 • Tor Lattimore, Csaba Szepesvari, Gellert Weisz

The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.

no code implementations • 15 Oct 2019 • Botao Hao, Tor Lattimore, Csaba Szepesvari

Contextual bandits serve as a fundamental model for many sequential decision making tasks.

1 code implementation • 30 Sep 2019 • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter

This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs).
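
Below is a sketch of a single gated linear neuron. It illustrates what "backpropagation-free" means here. The neuron's side information selects a weight vector through random halfspace gating. Its output geometrically mixes the input probabilities, and it is trained by a purely local online gradient step on the log loss. The class name, hyperparameters and clipping constants are illustrative assumptions. This is not the paper's reference implementation.

```python
import numpy as np


def logit(p):
    return np.log(p) - np.log1p(-p)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedLinearNeuron:
    """A single gated linear neuron (illustrative sketch).

    Side information z selects one of 2**n_halfspaces weight vectors through
    random halfspace gating; the output geometrically mixes the input
    probabilities: sigmoid(w_c . logit(p)).
    """

    def __init__(self, n_inputs, side_dim, n_halfspaces=2, lr=0.1,
                 w_clip=5.0, p_clip=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.hyperplanes = rng.standard_normal((n_halfspaces, side_dim))
        self.offsets = rng.standard_normal(n_halfspaces)
        self.weights = np.full((2 ** n_halfspaces, n_inputs), 1.0 / n_inputs)
        self.lr, self.w_clip, self.p_clip = lr, w_clip, p_clip

    def context(self, z):
        """Index of the active weight vector: one bit per halfspace."""
        bits = (self.hyperplanes @ z > self.offsets).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def predict(self, p, z):
        x = logit(np.clip(p, self.p_clip, 1.0 - self.p_clip))
        out = sigmoid(self.weights[self.context(z)] @ x)
        return float(np.clip(out, self.p_clip, 1.0 - self.p_clip))

    def update(self, p, z, y):
        """One local online gradient step on the log loss; no backpropagation."""
        c = self.context(z)
        x = logit(np.clip(p, self.p_clip, 1.0 - self.p_clip))
        out = float(np.clip(sigmoid(self.weights[c] @ x), self.p_clip, 1.0 - self.p_clip))
        # Gradient of -log P(y) w.r.t. the active weights for a sigmoid output: (out - y) * x
        self.weights[c] = np.clip(self.weights[c] - self.lr * (out - y) * x,
                                  -self.w_clip, self.w_clip)
        return out


neuron = GatedLinearNeuron(n_inputs=2, side_dim=3)
p, z = np.array([0.7, 0.6]), np.ones(3)
before = neuron.predict(p, z)
for _ in range(50):
    neuron.update(p, z, 1)
print(before, neuron.predict(p, z))
```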

2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt

bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.

no code implementations • 30 Jul 2019 • Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.

no code implementations • 12 Jul 2019 • Tor Lattimore, Csaba Szepesvari

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound.
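
The stated bound is explicit, so it can be evaluated directly. The hypothetical helper below only computes $6(d+1) k^{3/2} \sqrt{n \log(k)}$. It shows the $\sqrt{n}$ growth in the horizon: quadrupling $n$ doubles the bound.

```python
import math


def regret_bound(d: int, k: int, n: int) -> float:
    """Evaluate 6 (d + 1) k^{3/2} sqrt(n log k) for k actions, d outcomes, n rounds."""
    if d < 1 or k < 1 or n < 0:
        raise ValueError("need d >= 1, k >= 1 and n >= 0")
    return 6 * (d + 1) * k ** 1.5 * math.sqrt(n * math.log(k))


# Quadrupling the horizon n doubles the bound: it grows like sqrt(n).
print(regret_bound(d=3, k=4, n=10_000), regret_bound(d=3, k=4, n=40_000))
```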

no code implementations • 7 Jun 2019 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.

no code implementations • NeurIPS 2019 • Julian Zimmert, Tor Lattimore

The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.

no code implementations • 19 Mar 2019 • Roman Pogodin, Tor Lattimore

Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial setting and improving the result of Seldin and Slivkins [2014] by a factor of $\log(n)/\log(\log(n))$.

no code implementations • 27 Feb 2019 • Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli

Machine learning is used extensively in recommender systems deployed in products.

no code implementations • 1 Feb 2019 • Tor Lattimore, Csaba Szepesvari

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.

no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.

no code implementations • 8 Jan 2019 • Laurent Orseau, Tor Lattimore, Shane Legg

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
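
The standard baseline in this setting is the Bayesian mixture of the experts with a uniform prior. It is shown below only as a reference point and is not the algorithm of the paper. Its cumulative log loss is never more than $\ln N$ above that of the best of $N$ experts. The function name and the constant forecasters in the usage are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def bayes_mixture(expert_probs: Sequence[Sequence[float]],
                  outcomes: Sequence[int]) -> Tuple[float, List[float]]:
    """Bayesian mixture (uniform prior) of experts predicting binary outcomes.

    expert_probs[t][i] is expert i's probability, strictly inside (0, 1), that
    outcomes[t] == 1. Returns the mixture's cumulative log loss and each
    expert's cumulative log loss.
    """
    n = len(expert_probs[0])
    log_w = [0.0] * n  # unnormalised log posterior weights (uniform prior)
    mix_loss, expert_loss = 0.0, [0.0] * n
    for probs, y in zip(expert_probs, outcomes):
        lik = [p if y == 1 else 1.0 - p for p in probs]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]  # numerically stable weights
        mix_p = sum(wi * li for wi, li in zip(w, lik)) / sum(w)
        mix_loss -= math.log(mix_p)
        for i, li in enumerate(lik):
            expert_loss[i] -= math.log(li)
            log_w[i] += math.log(li)  # Bayes update
    return mix_loss, expert_loss


rng = random.Random(1)
outcomes = [int(rng.random() < 0.8) for _ in range(200)]
experts = [[0.8, 0.5, 0.2]] * len(outcomes)  # three constant forecasters
mix, per_expert = bayes_mixture(experts, outcomes)
print(mix, min(per_expert) + math.log(3))
```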

1 code implementation • NeurIPS 2018 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber

We introduce two novel tree search algorithms that use a policy to guide search.
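
A policy can guide search by ordering nodes by depth divided by the probability the policy assigns to the path leading to them. The sketch below is a generic best-first search built on that cost. Whether it matches either algorithm in the paper is an assumption. All names are hypothetical. A well-aligned policy reaches the goal after far fewer expansions than a uniform one.

```python
import heapq
from typing import Callable, Hashable, Iterable, Optional, Tuple


def policy_guided_search(
    root: Hashable,
    children: Callable[[Hashable], Iterable[Hashable]],
    policy: Callable[[Hashable, Hashable], float],
    is_goal: Callable[[Hashable], bool],
    max_expansions: int = 10_000,
) -> Tuple[Optional[Hashable], int]:
    """Best-first tree search that expands nodes in increasing order of
    depth(n) / pi(n), where pi(n) is the product of policy probabilities
    along the path from the root to n."""
    tie = 0  # tie-breaker so heapq never has to compare states
    frontier = [(0.0, tie, root, 0, 1.0)]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, depth, path_prob = heapq.heappop(frontier)
        if is_goal(state):
            return state, expansions
        expansions += 1
        for child in children(state):
            p = path_prob * policy(state, child)
            if p <= 0.0:
                continue  # the policy rules this child out
            tie += 1
            heapq.heappush(frontier, ((depth + 1) / p, tie, child, depth + 1, p))
    return None, expansions


# Toy tree of binary strings.
kids = lambda s: [s + "0", s + "1"]
uniform = lambda parent, child: 0.5
biased = lambda parent, child: 0.9 if child.endswith("1") else 0.1
print(policy_guided_search("", kids, uniform, lambda s: s == "111"))
print(policy_guided_search("", kids, biased, lambda s: s == "111"))
```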

no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
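
The mechanism in this sentence can be sketched for rewards in $[0, 1]$. First, each arm's history is augmented with pseudo rewards. Next, the augmented history is resampled with replacement. Finally, the arm with the highest resampled mean is pulled. The pseudo-reward scheme (`a` copies of a 0 and a 1 per observation) and all names are assumptions made for illustration.

```python
import random
from typing import List, Optional


def bootstrap_choose_arm(histories: List[List[float]], a: int = 1,
                         rng: Optional[random.Random] = None) -> int:
    """Return the arm whose bootstrap sample of (history + pseudo rewards)
    has the highest mean. Rewards are assumed to lie in [0, 1]; `a` copies of
    a 0 and a 1 pseudo reward are added per observation (an assumption)."""
    rng = rng or random.Random()
    best_arm, best_mean = 0, -1.0
    for arm, history in enumerate(histories):
        if not history:
            return arm  # pull every arm once before bootstrapping
        n = len(history)
        augmented = history + [0.0] * (a * n) + [1.0] * (a * n)
        # Non-parametric bootstrap: resample with replacement, same size.
        sample = rng.choices(augmented, k=len(augmented))
        mean = sum(sample) / len(sample)
        if mean > best_mean:
            best_arm, best_mean = arm, mean
    return best_arm


def simulate(means: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Run the rule on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    histories: List[List[float]] = [[] for _ in means]
    for _ in range(rounds):
        arm = bootstrap_choose_arm(histories, rng=rng)
        histories[arm].append(1.0 if rng.random() < means[arm] else 0.0)
    return [len(h) for h in histories]


print(simulate([0.2, 0.8], rounds=500))
```

The pseudo rewards keep the bootstrap distribution from collapsing early. For example, an arm whose only observation is a 0 can still draw a high resampled mean and be explored again.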

no code implementations • 5 Oct 2018 • Shuai Li, Tor Lattimore, Csaba Szepesvári

We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.
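
A minimal sketch of the factored click model follows. Here the examination function is assumed to depend only on the position; the paper's examination function may be more general. Features and $\theta$ are chosen so that the attractiveness $\langle x, \theta\rangle$ lies in $[0, 1]$. All numbers are illustrative.

```python
import numpy as np


def click_probabilities(ranked_features: np.ndarray, theta: np.ndarray,
                        examination: np.ndarray) -> np.ndarray:
    """P(click at position k) = examination[k] * <x_k, theta>, where x_k is the
    feature vector of the item shown at position k."""
    attractiveness = ranked_features @ theta  # linear attractiveness model
    return examination[: len(ranked_features)] * attractiveness


features = np.array([[0.5, 0.5], [1.0, 0.0]])  # items shown at positions 0 and 1
theta = np.array([0.4, 0.6])
chi = np.array([1.0, 0.5])  # hypothetical position-based examination probabilities
print(click_probabilities(features, theta, chi))
```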

no code implementations • ICML 2020 • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.

no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.

no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.

no code implementations • 23 May 2018 • Tor Lattimore, Csaba Szepesvari

Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner.

no code implementations • 5 Dec 2017 • Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth

This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.

no code implementations • NeurIPS 2017 • Tor Lattimore

Existing strategies for finite-armed stochastic bandits mostly depend on a parameter of scale that must be known in advance.

1 code implementation • NeurIPS 2017 • Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
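
As a concrete instance of the curved case, consider squared losses $f_t(x) = (x - z_t)^2$. These are convex and strongly convex. Under them, FTL predicts the running mean of past targets. This choice of loss and the helper names are illustrative assumptions.

```python
from typing import List, Sequence


def follow_the_leader(targets: Sequence[float], x0: float = 0.0) -> List[float]:
    """FTL for squared losses f_t(x) = (x - z_t)^2 on the real line.

    Each prediction minimises the cumulative past loss, which for squared
    losses is the running mean of the targets seen so far.
    """
    predictions, total = [], 0.0
    for t, z in enumerate(targets):
        predictions.append(total / t if t > 0 else x0)  # x0 before any data
        total += z
    return predictions


def regret(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Cumulative loss of the predictions minus that of the best fixed point."""
    def loss(x: float, z: float) -> float:
        return (x - z) ** 2

    best = sum(targets) / len(targets)
    return sum(map(loss, predictions, targets)) - sum(loss(best, z) for z in targets)


print(follow_the_leader([1.0, 3.0, 5.0]))  # [0.0, 1.0, 2.0]
```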

no code implementations • 14 Oct 2016 • Tor Lattimore, Csaba Szepesvari

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications.

no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter

Function optimisation is a major challenge in computer science.

no code implementations • NeurIPS 2016 • Finnian Lattimore, Tor Lattimore, Mark D. Reid

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment.

no code implementations • NeurIPS 2016 • Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.

no code implementations • NeurIPS 2016 • Sébastien Gerchinovitz, Tor Lattimore

First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case.

no code implementations • 29 Mar 2016 • Tor Lattimore

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise.

no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.

no code implementations • 13 Feb 2016 • Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

We consider both the stochastic and the adversarial settings, for which we propose natural yet novel strategies and analyze the price of maintaining the constraints.

no code implementations • NeurIPS 2015 • Tor Lattimore, Koby Crammer, Csaba Szepesvari

In each time step the learner chooses an allocation of several resource types between a number of tasks.

no code implementations • 18 Nov 2015 • Tor Lattimore

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon.

no code implementations • NeurIPS 2015 • Tor Lattimore

Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions.

no code implementations • 28 Jul 2015 • Tor Lattimore

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret.

no code implementations • NeurIPS 2014 • Tor Lattimore, Remi Munos

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms.

no code implementations • 15 Jun 2014 • Tor Lattimore, Koby Crammer, Csaba Szepesvári

We study a sequential resource allocation problem involving a fixed number of recurring jobs.

no code implementations • 22 Aug 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag

We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.

no code implementations • 29 Jun 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag

We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.

Papers With Code is a free resource with all data licensed under CC-BY-SA.