no code implementations • ICML 2020 • Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.
no code implementations • 16 Mar 2023 • Aadirupa Saha, Branislav Kveton
The bound for unknown reward variances captures the effect of the prior on learning reward variances and is the first of its kind.
no code implementations • 3 Feb 2023 • Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song
Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.
no code implementations • 1 Feb 2023 • Sanath Kumar Krishnamurthy, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi
We study the finite-horizon offline reinforcement learning (RL) problem.
no code implementations • 12 Jan 2023 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum
In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems.
no code implementations • 9 Dec 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
no code implementations • 15 Nov 2022 • Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton
We also provide the first lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches the lower bound.
no code implementations • 26 Oct 2022 • Rong Zhu, Branislav Kveton
Our experiments show that RoLinTS is as statistically efficient as the classic methods when the misspecification is low, more robust when the misspecification is high, and significantly more computationally efficient than its naive implementation.
no code implementations • 27 Sep 2022 • Behnam Rahdari, Branislav Kveton, Peter Brusilovsky
Our analytical results show that the user can examine more items in the carousel click model than in a single ranked list, due to the structured way of browsing.
no code implementations • 8 Jun 2022 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton
We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
no code implementations • 6 Jun 2022 • Matej Cief, Branislav Kveton, Michal Kompan
Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.
1 code implementation • 30 May 2022 • Imad Aouali, Branislav Kveton, Sumeet Katariya
The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters.
no code implementations • 26 Feb 2022 • Runzhe Wan, Branislav Kveton, Rui Song
High-quality data plays a central role in ensuring the accuracy of policy evaluation.
1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
In contrast, we show that the meta simple regret of the frequentist algorithm is $\tilde{O}(\sqrt{m} n + m / \sqrt{n})$, and is thus worse.
no code implementations • 3 Feb 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.
no code implementations • 24 Jan 2022 • Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier
This problem has been studied extensively in the setting of known objective functions.
no code implementations • 12 Nov 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.
no code implementations • 8 Nov 2021 • Ruihao Zhu, Branislav Kveton
Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.
no code implementations • 16 Sep 2021 • Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis, Branislav Kveton, Augustin Soule, Xiaolong Yang
We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget.
no code implementations • NeurIPS 2021 • Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári
We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.
no code implementations • 23 Jun 2021 • Rong Zhu, Branislav Kveton
It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm.
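For intuition, here is a minimal sketch of Thompson sampling for Bernoulli bandits, where a Beta prior on each arm's mean is exactly the kind of side information referred to above; the arm means and prior parameters are illustrative, not from the paper.

```python
# A minimal sketch of Thompson sampling for Bernoulli bandits with a Beta
# prior on each arm's mean. Arm means and prior parameters are illustrative.
import numpy as np

def thompson_sampling(true_means, horizon, prior_a=1.0, prior_b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    a = np.full(k, prior_a)  # Beta posterior: prior + observed successes
    b = np.full(k, prior_b)  # Beta posterior: prior + observed failures
    total = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)        # one posterior sample per arm
        arm = int(np.argmax(theta))   # pull the arm with the highest sample
        reward = float(rng.random() < true_means[arm])
        a[arm] += reward              # conjugate posterior update
        b[arm] += 1.0 - reward
        total += reward
    return total

print(thompson_sampling([0.2, 0.5, 0.8], horizon=1000))
```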
no code implementations • 10 Jun 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.
no code implementations • 9 Jun 2021 • Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh
We analyze our algorithm in linear and generalized linear models (GLMs) and propose a practical implementation based on a G-optimal design.
no code implementations • 7 Mar 2021 • Nan Wang, Branislav Kveton, Maryam Karimzadehgan
We propose a bandit algorithm that explores purely by randomizing its past observations.
no code implementations • 11 Feb 2021 • Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari
Efficient exploration in bandits is a fundamental online learning problem.
no code implementations • 1 Dec 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier
The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.
no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our approach is a form of meta-learning and exploits properties of $\mathcal{P}$ without making strong assumptions about its form.
no code implementations • 9 Jul 2020 • Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
We propose a novel framework for structured bandits, which we call an influence diagram bandit.
no code implementations • NeurIPS 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
no code implementations • 15 Jun 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed
This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.
no code implementations • 9 Jun 2020 • Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier
Most bandit policies are designed either to minimize regret in any problem instance, making very few assumptions about the underlying environment, or to minimize it in a Bayesian sense, assuming a prior distribution over environment parameters.
1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton
We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.
no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.
1 code implementation • 11 Oct 2019 • Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton
We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation.
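A minimal sketch of this idea for Bernoulli arms follows: act greedily on mean plus a randomized multiple of a UCB1-style confidence width, resampling the multiplier every round. The uniform law for the multiplier is an illustrative assumption; the paper specifies its own sampling distribution.

```python
# A minimal sketch of the randomized-confidence-interval idea: greedy on
# mean + Z * width, with Z resampled each round instead of fixed as in UCB1.
import numpy as np

def rand_ucb(true_means, horizon, z_max=2.0, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t  # initialize by pulling each arm once
        else:
            width = np.sqrt(2.0 * np.log(t + 1.0) / pulls)  # UCB1-style width
            z = rng.uniform(0.0, z_max)  # randomized multiplier, shared by arms
            arm = int(np.argmax(sums / pulls + z * width))
        reward = float(rng.random() < true_means[arm])
        pulls[arm] += 1.0
        sums[arm] += reward
    return sums.sum()
```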
no code implementations • 21 Jun 2019 • Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.
no code implementations • 20 Apr 2019 • Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian
We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.
no code implementations • 4 Apr 2019 • Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari
In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution.
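For concreteness, a minimal sketch of how the empirical Bayes regret of a bandit policy can be estimated by Monte Carlo; the uniform instance distribution and the `policy(means, horizon, rng)` interface are hypothetical, for illustration only.

```python
# A minimal sketch: estimate the empirical Bayes regret of a policy by
# averaging its regret over problem instances sampled from a distribution.
import numpy as np

def empirical_bayes_regret(policy, n_instances, k, horizon, seed=0):
    rng = np.random.default_rng(seed)
    regrets = []
    for _ in range(n_instances):
        means = rng.uniform(size=k)            # instance drawn from P
        reward = policy(means, horizon, rng)   # total reward collected
        regrets.append(horizon * means.max() - reward)
    return float(np.mean(regrets))             # Monte Carlo average over P
```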
no code implementations • 21 Mar 2019 • Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
Our algorithm, perturbed-history exploration in a linear bandit (LinPHE), estimates a linear model from its perturbed history and pulls the arm with the highest value under that model.
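A minimal sketch of the perturbed-history principle in the linear case: refit a ridge estimator on the history augmented with random pseudo-noise, then act greedily. The Bernoulli perturbation and its scale below are illustrative choices, not necessarily the paper's exact design.

```python
# A minimal sketch of one perturbed-history step in a linear bandit.
import numpy as np

def perturbed_history_step(X, y, arms, a=1.0, lam=1.0, rng=None):
    """X: (n, d) features of pulled arms, y: (n,) rewards, arms: (K, d)."""
    rng = rng or np.random.default_rng()
    z = a * rng.binomial(1, 0.5, size=len(y))  # pseudo-noise added to history
    d = X.shape[1]
    theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (y + z))
    return int(np.argmax(arms @ theta))        # greedy w.r.t. perturbed fit
```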
no code implementations • 26 Feb 2019 • Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.
no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
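A minimal sketch of this idea for Bernoulli arms: resample each arm's history with replacement, after padding it with pseudo rewards (here one 0 and one 1 per real observation, an illustrative choice), and pull the arm with the highest bootstrap mean.

```python
# A minimal sketch of bootstrap exploration with pseudo rewards.
import numpy as np

def bootstrap_choose_arm(histories, rng):
    """histories: one list of observed binary rewards per arm."""
    indices = []
    for h in histories:
        if not h:
            indices.append(np.inf)  # unpulled arms are maximally optimistic
            continue
        padded = list(h) + [0, 1] * len(h)  # pseudo rewards force exploration
        sample = rng.choice(padded, size=len(padded), replace=True)
        indices.append(sample.mean())
    return int(np.argmax(indices))
```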
no code implementations • 1 Nov 2018 • Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin
We assume that the user examines the list of recommended items until attracted by an item, clicks it, and does not examine the rest of the items.
no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.
no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.
no code implementations • 3 Jun 2018 • Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru
In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, which is significantly worse than the default production action.
no code implementations • 24 May 2018 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori
We investigate the use of bootstrapping in the bandit setting.
no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen
We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.
no code implementations • 11 Feb 2018 • Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie
Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly pulling arms with unknown reward distributions.
no code implementations • 13 Dec 2017 • Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan
Many problems in computer vision and recommender systems involve low-rank matrices.
no code implementations • 21 Sep 2017 • Tong Yu, Branislav Kveton, Zheng Wen, Hung Bui, Ole J. Mengshoel
We study the problem of learning a latent variable model from a stream of data.
no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
The probability that a user will click a search result depends both on its relevance and its position on the results page.
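This is the position-based model, in which the click probability factors into the item's attraction and the position's examination probability. A minimal sketch with illustrative probabilities:

```python
# A minimal sketch of the position-based click model:
# P(click on item i at position k) = alpha(i) * chi(k).
import numpy as np

alpha = {"item_a": 0.8, "item_b": 0.5}  # attraction (relevance) per item
chi = [1.0, 0.6, 0.3]                   # examination probability per position

def click_prob(item, position):
    return alpha[item] * chi[position]

rng = np.random.default_rng(0)
clicked = rng.random() < click_prob("item_a", 1)  # item_a shown in slot 2
```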
no code implementations • ICML 2017 • Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen
In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models.
no code implementations • ICML 2017 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt
We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.
no code implementations • 25 Jan 2017 • Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen
To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.
no code implementations • 10 Aug 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen
The main challenge of the problem is that the individual values of the row and column are unobserved.
1 code implementation • NeurIPS 2017 • Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani
Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.
1 code implementation • 17 Mar 2016 • Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton
In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items from a large set of $L$ candidate items.
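A minimal sketch of the cascade model feedback behind cascading bandits: the user scans the recommended list top-down and clicks the first attractive item, after which examination stops. The attraction probabilities are illustrative.

```python
# A minimal sketch of cascade model feedback for a recommended list.
import numpy as np

def cascade_feedback(recommended, attraction, rng):
    """Return the clicked position, or None if no item was clicked."""
    for pos, item in enumerate(recommended):
        if rng.random() < attraction[item]:
            return pos  # items below the click are never examined
    return None

rng = np.random.default_rng(0)
attraction = np.array([0.9, 0.7, 0.4, 0.2, 0.1])
print(cascade_feedback([4, 0, 2], attraction, rng))
```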
no code implementations • 9 Feb 2016 • Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun
Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.
1 code implementation • 9 Feb 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen
This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.
no code implementations • NeurIPS 2015 • Jaya Kawale, Hung H. Bui, Branislav Kveton, Long Tran-Thanh, Sanjay Chawla
Matrix factorization (MF) collaborative filtering is an effective and widely used method in recommendation systems.
no code implementations • NeurIPS 2015 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
The agent observes the index of the first chosen item whose weight is zero.
no code implementations • 10 Feb 2015 • Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan
We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.
no code implementations • 13 Nov 2014 • Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen
The need for diversification of recommendation lists manifests in a number of recommender systems use cases.
no code implementations • 3 Oct 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
no code implementations • 28 Jun 2014 • Zheng Wen, Branislav Kveton, Azin Ashkan
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
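A minimal sketch of one round of such a semi-bandit under the simplest constraint, "choose $m$ of $L$ items", using UCB-style indices; the constants are illustrative and this is not the paper's exact algorithm.

```python
# A minimal sketch of one semi-bandit round: choose m items by UCB, observe
# each chosen item's stochastic weight (semi-bandit feedback), earn the sum.
import numpy as np

def semi_bandit_round(t, m, pulls, sums, true_means, rng):
    n = np.maximum(pulls, 1.0)
    ucb = sums / n + np.sqrt(1.5 * np.log(t + 1.0) / n)
    ucb[pulls == 0] = np.inf                 # try every item at least once
    chosen = np.argsort(-ucb)[:m]            # maximize UCB sum under constraint
    weights = (rng.random(m) < true_means[chosen]).astype(float)
    pulls[chosen] += 1.0
    sums[chosen] += weights
    return weights.sum()                     # payoff is the sum of weights
```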
no code implementations • 30 May 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko
Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.
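For concreteness, a sketch of one such greedy method, Kruskal's algorithm for the minimum spanning tree: sorting edges by weight and keeping each edge that closes no cycle is a matroid independence check, the structure this line of work builds on.

```python
# A minimal sketch of Kruskal's greedy algorithm for the minimum spanning tree.
def kruskal(n, edges):
    """edges: list of (weight, u, v); returns the MST as (u, v, weight)."""
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):  # greedy: cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:               # the edge joins two components: safe to add
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
```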
no code implementations • 20 Mar 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson
The objective in these problems is to learn how to maximize a modular function on a matroid.
no code implementations • NeurIPS 2013 • Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
Maximization of submodular functions has wide applications in machine learning and artificial intelligence.