Search Results for author: Branislav Kveton

Found 67 papers, 7 papers with code

Influence Diagram Bandits

no code implementations · ICML 2020 · Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.


Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling

no code implementations · 16 Mar 2023 · Aadirupa Saha, Branislav Kveton

The bound for unknown reward variances captures the effect of the prior on learning reward variances and is the first of its kind.

Multi-Armed Bandits, Thompson Sampling

Multiplier Bootstrap-based Exploration

no code implementations · 3 Feb 2023 · Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.

Multi-Armed Bandits

Thompson Sampling with Diffusion Generative Prior

no code implementations · 12 Jan 2023 · Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems.

Decision Making, Denoising (+2 more)

Multi-Task Off-Policy Learning from Bandit Feedback

no code implementations · 9 Dec 2022 · Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.

Learning-To-Rank, Recommendation Systems

Bayesian Fixed-Budget Best-Arm Identification

no code implementations · 15 Nov 2022 · Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

We also provide the first lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches the lower bound.

Robust Contextual Linear Bandits

no code implementations · 26 Oct 2022 · Rong Zhu, Branislav Kveton

Our experiments show that RoLinTS is comparably statistically efficient to the classic methods when the misspecification is low, more robust when the misspecification is high, and significantly more computationally efficient than its naive implementation.

Multi-Armed Bandits

From Ranked Lists to Carousels: A Carousel Click Model

no code implementations · 27 Sep 2022 · Behnam Rahdari, Branislav Kveton, Peter Brusilovsky

Our analytical results show that the user can examine more items in the carousel click model than in a single ranked list, due to the structured way of browsing.

Uplifting Bandits

no code implementations · 8 Jun 2022 · Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.

Marketing, Recommendation Systems

Pessimistic Off-Policy Optimization for Learning to Rank

no code implementations · 6 Jun 2022 · Matej Cief, Branislav Kveton, Michal Kompan

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.

Learning-To-Rank, Recommendation Systems

Mixed-Effect Thompson Sampling

1 code implementation · 30 May 2022 · Imad Aouali, Branislav Kveton, Sumeet Katariya

The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters.

Thompson Sampling

Safe Exploration for Efficient Policy Evaluation and Comparison

no code implementations · 26 Feb 2022 · Runzhe Wan, Branislav Kveton, Rui Song

High-quality data plays a central role in ensuring the accuracy of policy evaluation.

Safe Exploration

Meta-Learning for Simple Regret Minimization

1 code implementation · 25 Feb 2022 · MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

In contrast, we show that the meta simple regret of the frequentist algorithm is $\tilde{O}(\sqrt{m} n + m/ \sqrt{n})$, and thus worse.


Deep Hierarchy in Bandits

no code implementations · 3 Feb 2022 · Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.

Thompson Sampling

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations · 24 Jan 2022 · Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Hierarchical Bayesian Bandits

no code implementations · 12 Nov 2021 · Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.

Federated Learning, Thompson Sampling

Safe Data Collection for Offline and Online Policy Learning

no code implementations · 8 Nov 2021 · Ruihao Zhu, Branislav Kveton

Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

no code implementations · 16 Sep 2021 · Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis, Branislav Kveton, Augustin Soule, Xiaolong Yang

We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget.

No Regrets for Learning the Prior in Bandits

no code implementations · NeurIPS 2021 · Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.

Thompson Sampling

Random Effect Bandits

no code implementations · 23 Jun 2021 · Rong Zhu, Branislav Kveton

It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm.

Multi-Armed Bandits, Thompson Sampling

Thompson Sampling with a Mixture Prior

no code implementations · 10 Jun 2021 · Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making, Multi-Task Learning (+3 more)

Fixed-Budget Best-Arm Identification in Structured Bandits

no code implementations · 9 Jun 2021 · Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh

We analyze our algorithm in linear and generalized linear models (GLMs) and propose a practical implementation based on a G-optimal design.

Multi-Armed Bandits

CORe: Capitalizing On Rewards in Bandit Exploration

no code implementations · 7 Mar 2021 · Nan Wang, Branislav Kveton, Maryam Karimzadehgan

We propose a bandit algorithm that explores purely by randomizing its past observations.

Non-Stationary Latent Bandits

no code implementations · 1 Dec 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems, Thompson Sampling

Differentiable Meta-Learning of Bandit Policies

no code implementations · NeurIPS 2020 · Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form.


Latent Bandits Revisited

no code implementations · NeurIPS 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.

Recommendation Systems, Thompson Sampling
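
The following is a minimal sketch of the latent bandit setting. It assumes Bernoulli rewards and a known table `means[s][a]` of arm means for each latent state; both are hypothetical choices made for illustration. The agent keeps a posterior over the latent state, samples a state from it in the Thompson sampling style, and pulls the best arm for that state. This is a generic illustration, not the algorithm from the paper. The function names are hypothetical.

```python
import random


def posterior_update(belief, means, arm, reward):
    """Bayes rule over latent states after one Bernoulli observation.

    belief[s] is P(state = s | history); means[s][a] is the known mean reward
    of arm a when the latent state is s (assumed given, as in the setting).
    """
    weights = []
    for s, p in enumerate(belief):
        mu = means[s][arm]
        likelihood = mu if reward == 1 else 1.0 - mu
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def latent_thompson_sampling(means, true_state, horizon, seed=0):
    """Simulate an agent that only has to infer the unknown latent state."""
    rng = random.Random(seed)
    num_states = len(means)
    belief = [1.0 / num_states] * num_states  # uniform prior over latent states
    total_reward = 0
    for _ in range(horizon):
        # Sample a latent state from the posterior and act greedily for it.
        s = rng.choices(range(num_states), weights=belief)[0]
        arm = max(range(len(means[s])), key=lambda a: means[s][a])
        reward = 1 if rng.random() < means[true_state][arm] else 0
        total_reward += reward
        belief = posterior_update(belief, means, arm, reward)
    return belief, total_reward
```

For example, `latent_thompson_sampling([[0.9, 0.1], [0.1, 0.9]], true_state=1, horizon=200)` concentrates the belief on state 1. Because the conditional reward models are known, each observation updates a posterior over only a few discrete states rather than over every arm mean.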

Non-Stationary Off-Policy Optimization

no code implementations · 15 Jun 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed

This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.

Multi-Armed Bandits

Meta-Learning Bandit Policies by Gradient Ascent

no code implementations · 9 Jun 2020 · Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.

Meta-Learning, Multi-Armed Bandits

Sample Efficient Graph-Based Optimization with Noisy Observations

1 code implementation · 4 Jun 2020 · Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton

We study the sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.


Differentiable Bandit Exploration

no code implementations · NeurIPS 2020 · Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.


Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

1 code implementation · 11 Oct 2019 · Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation.

Thompson Sampling
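
Below is a rough sketch of randomizing a UCB index. It rests on several assumptions that are not taken from the paper: Bernoulli rewards, a confidence width of `sqrt(log t / n)`, and a single multiplier `z` drawn uniformly from `[0, beta]` in each round and shared by all arms. The function name `randucb` and the parameter `beta` are hypothetical, and the paper's actual sampling distribution may differ.

```python
import math
import random


def randucb(arm_means, horizon, beta=2.0, seed=0):
    """Bernoulli bandit simulation with a randomized UCB index (illustrative)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize the estimates
        else:
            # One random multiplier per round, shared by all arms (assumed distribution).
            z = rng.uniform(0.0, beta)
            scores = [
                sums[i] / counts[i] + z * math.sqrt(math.log(t) / counts[i])
                for i in range(k)
            ]
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The two extremes show the design trade-off. Fixing `z = beta` in every round gives a deterministic UCB-style index. Values of `z` near 0 make the round almost greedy. Sampling `z` interpolates between the two, much as Thompson sampling randomizes its posterior samples.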

Randomized Exploration in Generalized Linear Bandits

no code implementations · 21 Jun 2019 · Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

Waterfall Bandits: Learning to Sell Ads Online

no code implementations · 20 Apr 2019 · Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.

Empirical Bayes Regret Minimization

no code implementations · 4 Apr 2019 · Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution.

Perturbed-History Exploration in Stochastic Linear Bandits

no code implementations · 21 Mar 2019 · Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Our algorithm, perturbed-history exploration in a linear bandit (LinPHE), estimates a linear model from its perturbed history and pulls the arm with the highest value under that model.

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

no code implementations · 26 Feb 2019 · Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

Multi-Armed Bandits

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations · 13 Nov 2018 · Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.

Multi-Armed Bandits
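
The sketch below illustrates this kind of bootstrap exploration for Bernoulli rewards. For illustration, it assumes that `a` pseudo rewards of 0 and `a` pseudo rewards of 1 are added for every observed reward. The names `giro_choose` and `run_giro` are hypothetical.

```python
import random


def giro_choose(histories, a, rng):
    """Pick an arm by bootstrapping each arm's history augmented with pseudo rewards."""
    best_arm, best_value = 0, float("-inf")
    for arm, history in enumerate(histories):
        if not history:
            return arm  # pull every arm once before bootstrapping
        # Assumed scheme: a pseudo rewards of 0 and a of 1 per observed reward.
        augmented = history + [0.0, 1.0] * (a * len(history))
        resample = rng.choices(augmented, k=len(augmented))  # non-parametric bootstrap
        value = sum(resample) / len(resample)
        if value > best_value:
            best_arm, best_value = arm, value
    return best_arm


def run_giro(arm_means, horizon, a=1, seed=0):
    """Simulate Bernoulli arms; returns the number of pulls per arm."""
    rng = random.Random(seed)
    histories = [[] for _ in arm_means]
    for _ in range(horizon):
        arm = giro_choose(histories, a, rng)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        histories[arm].append(reward)
    return [len(h) for h in histories]
```

The pseudo rewards are what make this explore. Without them, an arm whose observed rewards are all 0 would always bootstrap to a mean of 0 and might never be pulled again. The added 1s keep its resampled mean random and positive.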

Online Diverse Learning to Rank from Partial-Click Feedback

no code implementations · 1 Nov 2018 · Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

We assume that the user examines the list of recommended items until the user is attracted by an item, which is clicked, and does not examine the rest of the items.

Learning-To-Rank, Recommendation Systems

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

no code implementations · 15 Jun 2018 · Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.

Learning-To-Rank, Re-Ranking (+1 more)

TopRank: A practical algorithm for online stochastic ranking

no code implementations · NeurIPS 2018 · Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.

Decision Making, Learning-To-Rank

Conservative Exploration using Interleaving

no code implementations · 3 Jun 2018 · Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru

In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, that is, one significantly worse than the default production action.

Offline Evaluation of Ranking Policies with Click Models

no code implementations · 27 Apr 2018 · Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

no code implementations · 11 Feb 2018 · Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward by repeatedly choosing which arm to pull among arms with unknown reward distributions.

Change Detection

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations · 19 Mar 2017 · Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Model-Independent Online Learning for Influence Maximization

no code implementations · ICML 2017 · Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.

Does Weather Matter? Causal Analysis of TV Logs

no code implementations · 25 Jan 2017 · Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen

To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.

BIG-bench Machine Learning

Stochastic Rank-1 Bandits

no code implementations · 10 Aug 2016 · Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

1 code implementation · NeurIPS 2017 · Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

Cascading Bandits for Large-Scale Recommendation Problems

1 code implementation · 17 Mar 2016 · Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton

In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the $K$ most attractive items from a large set of $L$ candidate items.

Multi-Armed Bandits, Recommendation Systems (+1 more)
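
Here is a minimal sketch of a UCB-style learner in the cascade model, loosely in the spirit of CascadeUCB1. It assumes Bernoulli attraction probabilities, a `1.5 log t` confidence radius, and a one-observation warm start. The function name and the simulation details are hypothetical. The learner ranks items by upper confidence bound and shows the top `k`. It then updates only the items the simulated user examined, that is, every item up to and including the first click.

```python
import math
import random


def cascade_ucb(attraction, k, horizon, seed=0):
    """Simulate a cascade-model user and a UCB-style ranker; returns examination counts."""
    rng = random.Random(seed)
    n_items = len(attraction)
    # Warm start (assumed): each item's estimate begins with one simulated observation.
    examined = [1] * n_items
    clicked = [1 if rng.random() < p else 0 for p in attraction]
    total_clicks = 0
    for t in range(1, horizon + 1):
        ucb = [
            clicked[e] / examined[e] + math.sqrt(1.5 * math.log(t) / examined[e])
            for e in range(n_items)
        ]
        ranked = sorted(range(n_items), key=ucb.__getitem__, reverse=True)[:k]
        # Cascade model: the user scans top-down and clicks the first attractive item;
        # items below the click are not examined, so their estimates are not updated.
        for e in ranked:
            examined[e] += 1
            if rng.random() < attraction[e]:
                clicked[e] += 1
                total_clicks += 1
                break
    return examined, total_clicks
```

For example, with `attraction = [0.05] * 8 + [0.3, 0.3]` and `k = 2`, the two attractive items end up being examined far more often than the rest. The partial feedback is the key difference from a standard bandit: a click at position 1 says nothing about the item at position 2.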

Graphical Model Sketch

no code implementations · 9 Feb 2016 · Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation · 9 Feb 2016 · Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.


Combinatorial Cascading Bandits

no code implementations · NeurIPS 2015 · Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

The agent observes the index of the first chosen item whose weight is zero.

Cascading Bandits: Learning to Rank in the Cascade Model

no code implementations · 10 Feb 2015 · Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan

We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.


DUM: Diversity-Weighted Utility Maximization for Recommendations

no code implementations · 13 Nov 2014 · Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen

The need for diversification of recommendation lists manifests in a number of recommender systems use cases.

Recommendation Systems

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

no code implementations · 3 Oct 2014 · Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
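
The sketch below makes this feedback protocol concrete. It uses a hypothetical top-`k` constraint, under which any `k` of the ground items may be chosen, together with Bernoulli weights. Unlike the cascade model, the agent observes the weight of every chosen item and collects their sum. The UCB index is an assumption made for illustration, modeled on CombUCB1-style confidence bounds. The function name is hypothetical.

```python
import math
import random


def comb_ucb_topk(weight_means, k, horizon, seed=0):
    """Semi-bandit simulation: choose k items, observe each chosen weight, earn their sum."""
    rng = random.Random(seed)
    n_items = len(weight_means)
    counts = [0] * n_items
    sums = [0.0] * n_items
    total_payoff = 0.0
    for t in range(1, horizon + 1):
        scores = []
        for e in range(n_items):
            if counts[e] == 0:
                scores.append(float("inf"))  # unobserved items are tried first
            else:
                bonus = math.sqrt(1.5 * math.log(t) / counts[e])
                scores.append(sums[e] / counts[e] + bonus)
        # Hypothetical constraint for illustration: any k items form a feasible solution.
        chosen = sorted(range(n_items), key=scores.__getitem__, reverse=True)[:k]
        for e in chosen:
            # Semi-bandit feedback: the weight of every chosen item is observed.
            w = 1.0 if rng.random() < weight_means[e] else 0.0
            counts[e] += 1
            sums[e] += w
            total_payoff += w  # the payoff is the sum of the chosen weights
    return counts, total_payoff
```

Under a general combinatorial constraint, the sort-and-take-top-`k` step would be replaced by a call to an offline oracle over the UCB scores. An example is a shortest-path or spanning-tree solver.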

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

no code implementations · 28 Jun 2014 · Zheng Wen, Branislav Kveton, Azin Ashkan

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.

Thompson Sampling

Learning to Act Greedily: Polymatroid Semi-Bandits

no code implementations · 30 May 2014 · Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.
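
Kruskal's algorithm illustrates the greedy method mentioned in this snippet. It computes a minimum spanning tree by scanning edges in increasing order of weight and keeping each edge that does not close a cycle. This is the classic offline algorithm, not the bandit method from the paper, and the function name is hypothetical.

```python
def kruskal_mst(num_vertices, edges):
    """Greedy minimum spanning tree; edges is a list of (weight, u, v) tuples."""
    parent = list(range(num_vertices))  # union-find forest over vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # (u, v) joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((weight, u, v))
    return tree
```

Consider `kruskal_mst(4, [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 2, 3), (3, 1, 3)])`. It keeps the edges of weight 1, 2, and 3 and skips the remaining two, because each would close a cycle.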

Adaptive Submodular Maximization in Bandit Setting

no code implementations · NeurIPS 2013 · Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Maximization of submodular functions has wide applications in machine learning and artificial intelligence.
