1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh
We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks.
no code implementations • 17 Jan 2022 • Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic
We propose a method that achieves, in $K$-armed bandit problems, a near-optimal $\widetilde O(\sqrt{K N(S+1)})$ dynamic regret, where $N$ is the time horizon of the problem and $S$ is the number of times the identity of the optimal arm changes, without prior knowledge of $S$.
no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári
Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.
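For reference, the linearity assumption referred to here is typically written as follows (the feature map $\phi$ and weights $w_\pi$ are our notation, not necessarily the paper's):

$$Q^\pi(s, a) = \phi(s, a)^\top w_\pi \quad \text{for all } (s, a) \text{ and all policies } \pi,$$

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is the known feature map, so query and computational costs can depend on $d$ rather than on $|\mathcal{S}|$.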
no code implementations • 9 Jun 2021 • Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh
In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.
no code implementations • 25 Feb 2021 • Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari
We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation.
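For context, Politex computes a softmax policy over the running sum of past action-value estimates, an instance of exponential weights / mirror descent (a standard way to write the update, in the reward formulation; notation is ours):

$$\pi_{t+1}(a \mid s) \;\propto\; \exp\!\Big(\eta \sum_{j=1}^{t} \widehat{Q}^{\pi_j}(s, a)\Big),$$

where $\eta > 0$ is a learning rate and $\widehat{Q}^{\pi_j}$ is the estimated action-value function of the $j$-th policy.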
no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári
We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely used constrained approach.
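Schematically, the two formulations of an approximate policy-iteration step differ as follows (our notation; $\widehat{Q}^{\pi_t}$ is the current value estimate, and $\tau$, $\delta$ tune the update size):

$$\text{regularized:}\quad \pi_{t+1} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}\big[\widehat{Q}^{\pi_t}(s, a)\big] - \tau\, \mathrm{KL}\big(\pi \,\|\, \pi_t\big),$$
$$\text{constrained:}\quad \pi_{t+1} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}\big[\widehat{Q}^{\pi_t}(s, a)\big] \quad \text{s.t.}\quad \mathrm{KL}\big(\pi \,\|\, \pi_t\big) \le \delta.$$

The regularized version admits the closed-form solution $\pi_{t+1}(a \mid s) \propto \pi_t(a \mid s)\,\exp\big(\widehat{Q}^{\pi_t}(s, a)/\tau\big)$, whereas the constrained version generally must be solved numerically.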
no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.
no code implementations • 20 Oct 2020 • Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori
This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.
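For context, a standard statement of the lemma (as in earlier work on linear bandits) is: for $x_1, \dots, x_n \in \mathbb{R}^d$ with $\|x_t\|_2 \le L$ and $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top$ with $\lambda > 0$,

$$\sum_{t=1}^{n} \min\!\big(1,\, \|x_t\|_{V_{t-1}^{-1}}^2\big) \;\le\; 2 \log\frac{\det V_n}{\det V_0} \;\le\; 2d \log\!\Big(1 + \frac{n L^2}{\lambda d}\Big),$$

where $\|x\|_{M}^2 = x^\top M x$.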
no code implementations • 9 Jun 2020 • Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan
Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion.
1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton
We study the sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.
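To illustrate the setting (this is not the paper's algorithm), local search under noisy observations can average repeated evaluations before moving to a better neighbor; the graph structure, objective, and sample counts below are all illustrative assumptions:

```python
import random

def noisy_hill_climb(graph, f_noisy, start, samples_per_node=20, max_steps=100):
    """Illustrative noisy hill climbing: estimate each neighbor's value by
    averaging repeated noisy evaluations, then move to the best neighbor
    until no (estimated) improvement remains."""
    def estimate(node):
        # Average repeated noisy evaluations to control the noise.
        return sum(f_noisy(node) for _ in range(samples_per_node)) / samples_per_node

    current, current_val = start, estimate(start)
    for _ in range(max_steps):
        neighbors = graph.get(current, [])
        if not neighbors:
            break
        ests = {v: estimate(v) for v in neighbors}
        best = max(ests, key=ests.get)
        if ests[best] <= current_val:
            break  # no estimated improvement: stop at a local maximum
        current, current_val = best, ests[best]
    return current

# Example: a path graph with objective f(i) = -|i - 7| plus Gaussian noise.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 10] for i in range(11)}
f_noisy = lambda i: -abs(i - 7) + random.gauss(0.0, 0.5)
print(noisy_hill_climb(graph, f_noisy, start=0))  # should reach (near) node 7
```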
no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.
1 code implementation • 8 Feb 2020 • Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari
This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.
no code implementations • 27 Aug 2019 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz
POLITEX has sublinear regret guarantees in uniformly mixing MDPs when the value-estimation error can be controlled, a condition that is satisfied if all policies sufficiently explore the environment.
no code implementations • NeurIPS 2019 • My Phan, Yasin Abbasi-Yadkori, Justin Domke
We study the effects of approximate inference on the performance of Thompson sampling in $k$-armed bandit problems.
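For context, exact Thompson sampling in a Bernoulli $k$-armed bandit with conjugate Beta priors looks like the sketch below; the paper studies what happens when the posterior used in such a loop is only approximate. A minimal sketch, not the paper's experimental setup:

```python
import random

def thompson_sampling(true_means, horizon=10000, seed=0):
    """Beta-Bernoulli Thompson sampling. With exact conjugate updates the
    posterior over each arm's mean is Beta(successes + 1, failures + 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # posterior Beta parameters (uniform prior)
    beta = [1.0] * k
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a mean for each arm from its posterior; play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward

# Example: three arms; play should concentrate on the 0.7 arm.
print(thompson_sampling([0.3, 0.5, 0.7]))
```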
no code implementations • NeurIPS 2019 • Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng
The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial-information feedback.
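As a reference point, the classical UCB1 index policy for a $k$-armed Bernoulli bandit is sketched below (a textbook version, not the specific variant analyzed in the paper):

```python
import math
import random

def ucb1(true_means, horizon=10000, seed=0):
    """UCB1: pull each arm once, then play the arm maximizing
    empirical mean + sqrt(2 * ln t / n_i)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i):
        reward = 1.0 if rng.random() < true_means[i] else 0.0
        counts[i] += 1
        sums[i] += reward

    for i in range(k):  # initialization: one pull per arm
        pull(i)
    for t in range(k + 1, horizon + 1):
        # Optimistic index: empirical mean plus confidence width.
        idx = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(k)]
        pull(max(range(k), key=lambda i: idx[i]))
    return counts  # pull counts should concentrate on the best arm

print(ucb1([0.3, 0.5, 0.7]))
```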
no code implementations • 6 Jan 2019 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek
Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.
no code implementations • 24 May 2018 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori
We investigate the use of bootstrapping in the bandit setting.
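One common way to use the bootstrap for exploration is to act greedily with respect to means computed on bootstrap resamples of each arm's history, as in the illustrative sketch below (not the paper's method; naive resampling of the observed history alone can under-explore, which is part of what makes the analysis subtle):

```python
import random

def bootstrap_bandit(true_means, horizon=10000, seed=0):
    """Illustrative bootstrap exploration: for each arm, resample its reward
    history with replacement and act greedily on the bootstrap means."""
    rng = random.Random(seed)
    k = len(true_means)
    history = [[] for _ in range(k)]
    for i in range(k):  # one forced pull per arm
        history[i].append(1.0 if rng.random() < true_means[i] else 0.0)
    for _ in range(horizon - k):
        boot_means = []
        for rewards in history:
            # Resample with replacement; randomness drives exploration.
            sample = [rng.choice(rewards) for _ in rewards]
            boot_means.append(sum(sample) / len(sample))
        arm = max(range(k), key=lambda i: boot_means[i])
        history[arm].append(1.0 if rng.random() < true_means[arm] else 0.0)
    return [sum(h) / len(h) for h in history]

print(bootstrap_bandit([0.3, 0.5, 0.7]))
```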
no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan
We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
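A standard first-order sampler for such targets is the unadjusted Langevin algorithm, sketched below in one dimension for illustration (the paper's analysis concerns its behavior when $U$ is nonconvex inside a ball; the step size and iteration counts here are arbitrary):

```python
import math
import random

def langevin_sample(grad_U, x0, step=1e-2, n_iters=2000, seed=0):
    """Unadjusted Langevin algorithm targeting p*(x) ∝ exp(-U(x)) in 1D:
    x_{t+1} = x_t - step * U'(x_t) + sqrt(2 * step) * N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iters):
        x = x - step * grad_U(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x

# Example: U(x) = x^2 / 2, so p* is a standard Gaussian and U'(x) = x.
samples = [langevin_sample(lambda x: x, 0.0, seed=s) for s in range(200)]
print(sum(samples) / len(samples))  # approximately 0
```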
no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen
We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.
no code implementations • 17 Apr 2018 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari
Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics.
no code implementations • 26 Feb 2018 • Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis
However, under a condition akin to requiring that the occupancy measures of the base policies overlap substantially, we show that there exists an efficient algorithm that finds a policy almost as good as the best convex combination of the base policies.
no code implementations • 10 Feb 2018 • Ali Shameli, Yasin Abbasi-Yadkori
We show the effectiveness of the proposed technique in the problem of nearest neighbor classification.
no code implementations • 13 Dec 2017 • Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan
Many problems in computer vision and recommender systems involve low-rank matrices.
no code implementations • 21 Nov 2017 • Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis
Our algorithm, termed deterministic-schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity.
no code implementations • NeurIPS 2017 • Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy
We prove an upper bound on the regret of CLUCB and show that it decomposes into two terms: 1) an upper bound on the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term (independent of the time horizon) that accounts for the loss of being conservative in order to satisfy the safety constraint.
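Schematically, the decomposition reads (our notation; $C$ collects the problem-dependent quantities):

$$R_T(\mathrm{CLUCB}) \;\le\; \underbrace{R_T(\mathrm{LUCB})}_{\text{grows with } T} \;+\; \underbrace{C}_{\text{constant in } T}.$$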
no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek
We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
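In its basic form, Hit-and-Run repeatedly picks a uniformly random direction through the current point and moves to a point sampled from the part of that line lying inside the set. The sketch below is a minimal illustration using a membership oracle and rejection sampling along a bounded chord (the set, bounding radius, and rejection scheme are our simplifying assumptions, not the paper's construction):

```python
import math
import random

def hit_and_run(inside, x0, n_steps=1000, radius=10.0, seed=0):
    """Illustrative Hit-and-Run with a membership oracle `inside` (x -> bool).
    Assumes the set fits inside a ball of the given radius around any point."""
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    for _ in range(n_steps):
        # Uniformly random direction: normalized Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        u = [v / norm for v in g]
        # Rejection-sample a feasible point on the chord through x along u.
        # Terminates w.p. 1 since points near x (t near 0) are feasible.
        while True:
            t = rng.uniform(-radius, radius)
            y = [x[i] + t * u[i] for i in range(d)]
            if inside(y):
                x = y
                break
    return x

# Example: a nonconvex set, the union of two disjoint disks in the plane.
inside = lambda p: (p[0]**2 + p[1]**2 <= 1) or ((p[0] - 3)**2 + p[1]**2 <= 1)
print(hit_and_run(inside, [0.0, 0.0]))
```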
no code implementations • 26 Jun 2014 • Yasin Abbasi-Yadkori, Gergely Neu
We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.
no code implementations • 16 Jun 2014 • Yasin Abbasi-Yadkori, Csaba Szepesvari
We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems.
no code implementations • 27 Feb 2014 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek
We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.
no code implementations • NeurIPS 2011 • Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.
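For the linear case, the optimism-based template is sketched below: regularized least squares plus an ellipsoidal exploration bonus. This is a minimal LinUCB-style sketch; the fixed confidence radius `beta` stands in for the sharper, time-dependent radius that this line of work constructs:

```python
import numpy as np

def lin_ucb(arms, theta_star, horizon=2000, lam=1.0, beta=2.0, noise=0.1, seed=0):
    """LinUCB-style linear bandit: play the arm maximizing
    x^T theta_hat + beta * ||x||_{V^{-1}}, then update V and b."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)  # regularized Gram matrix
    b = np.zeros(d)      # sum of x_t * r_t
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        # Optimistic index: estimated reward plus confidence width.
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
        idx = arms @ theta_hat + beta * widths
        x = arms[int(np.argmax(idx))]
        r = float(x @ theta_star + noise * rng.standard_normal())
        V += np.outer(x, x)
        b += r * x
    return np.linalg.inv(V) @ b  # final estimate of theta_star

# Example: standard basis arms reduce this to a 3-armed bandit.
arms = np.eye(3)
print(lin_ucb(arms, np.array([0.2, 0.5, 0.8])))
```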