Search Results for author: Branislav Kveton

Found 75 papers, 8 papers with code

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

1 code implementation • NeurIPS 2017 • Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

Cascading Bandits for Large-Scale Recommendation Problems

1 code implementation • 17 Mar 2016 • Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton

In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items from a large set of $L$ candidate items.
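
To make the setting concrete, below is a minimal sketch of a CascadeUCB1-style learner, not necessarily the paper's exact algorithm: it recommends the $K$ items with the highest UCB indices and updates only the items the user examined. The horizon, initialization, and confidence-width constant are illustrative assumptions.

```python
import numpy as np

def cascade_ucb1(attraction, K, horizon=5000, seed=0):
    """Sketch of a CascadeUCB1-style learner for cascading bandits."""
    rng = np.random.default_rng(seed)
    L = len(attraction)
    pulls = np.ones(L)                                   # one initialization pull per item
    sums = rng.binomial(1, attraction).astype(float)
    for t in range(1, horizon + 1):
        ucb = sums / pulls + np.sqrt(1.5 * np.log(t) / pulls)
        ranked = np.argsort(-ucb)[:K]                    # recommend K items by UCB index
        clicks = rng.random(K) < attraction[ranked]      # simulate cascade feedback
        first = int(np.argmax(clicks)) if clicks.any() else K
        for pos in range(min(first + 1, K)):             # only examined items are observed
            item = ranked[pos]
            pulls[item] += 1
            sums[item] += 1.0 if pos == first else 0.0
    return np.argsort(-(sums / pulls))[:K]               # estimated K most attractive items

# Usage (attraction must be a NumPy array):
# cascade_ucb1(np.array([0.7, 0.5, 0.3, 0.2, 0.1]), K=2)
```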

Multi-Armed Bandits • Recommendation Systems • +1

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

1 code implementation • 11 Oct 2019 • Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation.
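
A minimal sketch of the idea under illustrative assumptions (Bernoulli rewards, a uniform multiplier; the paper chooses the sampling distribution more carefully): the fixed confidence-width constant of UCB is replaced by a random multiplier, so exploration is randomized as in TS.

```python
import numpy as np

def rand_ucb(means, horizon=10000, seed=0):
    """RandUCB-style sketch: UCB with a randomly scaled confidence interval."""
    rng = np.random.default_rng(seed)
    k = len(means)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t                                   # pull each arm once
        else:
            z = rng.uniform(0.0, 1.0, size=k)         # random CI multipliers
            index = sums / pulls + z * np.sqrt(2.0 * np.log(t) / pulls)
            arm = int(np.argmax(index))
        r = rng.binomial(1, means[arm])               # Bernoulli reward simulator
        pulls[arm] += 1
        sums[arm] += r
    return sums / pulls                               # empirical arm means
```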

Thompson Sampling

Pre-trained Recommender Systems: A Causal Debiasing Perspective

1 code implementation • 30 Oct 2023 • Ziqian Lin, Hao Ding, Nghia Trong Hoang, Branislav Kveton, Anoop Deoras, Hao Wang

In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, and that can then be quickly adapted to improve few-shot learning performance in unseen new domains with limited data.

Few-Shot Learning • Recommendation Systems

Mixed-Effect Thompson Sampling

1 code implementation • 30 May 2022 • Imad Aouali, Branislav Kveton, Sumeet Katariya

The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters.

Thompson Sampling

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.

Learning-To-Rank • Re-Ranking • +1

Offline Evaluation of Ranking Policies with Click Models

no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

TopRank: A practical algorithm for online stochastic ranking

no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.

Decision Making • Learning-To-Rank • +1

Conservative Exploration using Interleaving

no code implementations • 3 Jun 2018 • Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru

In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, i.e., one significantly worse than the default production action.

Model-Independent Online Learning for Influence Maximization

no code implementations • ICML 2017 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

no code implementations • 11 Feb 2018 • Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions.
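
As a toy illustration of the piecewise-stationary setting (a crude stand-in for the paper's change-detection procedure, with an assumed window and threshold): UCB1 paired with a sliding-window mean test that restarts the learner when an arm's rewards drift.

```python
from collections import deque
import numpy as np

def ucb_with_restarts(reward_fn, k, horizon, window=100, threshold=0.3, seed=0):
    """Toy piecewise-stationary bandit: UCB1 plus a sliding-window change test."""
    rng = np.random.default_rng(seed)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    since_restart = 0
    recent = [deque(maxlen=window) for _ in range(k)]
    for t in range(horizon):
        if np.any(pulls == 0):
            arm = int(np.argmin(pulls))               # pull unpulled arms first
        else:
            bonus = np.sqrt(2.0 * np.log(since_restart) / pulls)
            arm = int(np.argmax(sums / pulls + bonus))
        r = reward_fn(arm, t, rng)
        pulls[arm] += 1
        sums[arm] += r
        since_restart += 1
        buf = recent[arm]
        buf.append(r)
        if len(buf) == window:                        # crude two-window mean test
            half = window // 2
            xs = list(buf)
            if abs(np.mean(xs[:half]) - np.mean(xs[half:])) > threshold:
                pulls[:] = 0.0                        # change detected: restart
                sums[:] = 0.0
                since_restart = 0
                for b in recent:
                    b.clear()

# Example: the best arm switches halfway through the horizon.
# ucb_with_restarts(lambda a, t, rng: rng.binomial(1, [0.8, 0.2][a] if t < 5000
#                   else [0.2, 0.8][a]), k=2, horizon=10000)
```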

Change Detection

Does Weather Matter? Causal Analysis of TV Logs

no code implementations • 25 Jan 2017 • Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen

To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.

BIG-bench Machine Learning

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.
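
One standard way to formalize this, consistent with the rank-1 model studied here, factors the click probability into an item term and a position term; the numbers below are illustrative.

```python
import numpy as np

# Rank-1 click model sketch: P(click on item i at position j) = u[i] * v[j].
u = np.array([0.9, 0.5, 0.2])   # item attraction (relevance) probabilities
v = np.array([1.0, 0.6, 0.3])   # position examination probabilities
click_prob = np.outer(u, v)     # rank-1 matrix of click probabilities
i, j = np.unravel_index(np.argmax(click_prob), click_prob.shape)
print(i, j, click_prob[i, j])   # 0 0 0.9: best item in the most examined position
```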

Position

Stochastic Rank-1 Bandits

no code implementations • 10 Aug 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

no code implementations • 28 Jun 2014 • Zheng Wen, Branislav Kveton, Azin Ashkan

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
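
A sketch of this protocol with a top-$K$ (uniform matroid) constraint and a CombUCB1-style index, with illustrative parameters:

```python
import numpy as np

def combucb1_topk(weights, K, horizon=5000, seed=0):
    """Semi-bandit sketch with a top-K constraint: choose the K items with
    the highest UCB indices, observe each chosen item's stochastic weight,
    and receive their sum as the payoff."""
    rng = np.random.default_rng(seed)
    L = len(weights)
    pulls = np.ones(L)                                 # one initialization pull per item
    sums = rng.binomial(1, weights).astype(float)
    payoff = 0.0
    for t in range(1, horizon + 1):
        ucb = sums / pulls + np.sqrt(1.5 * np.log(t) / pulls)
        chosen = np.argsort(-ucb)[:K]                  # combinatorial oracle for top-K
        w = rng.binomial(1, weights[chosen]).astype(float)
        payoff += w.sum()                              # reward is the sum of weights
        pulls[chosen] += 1                             # semi-bandit: all chosen items observed
        sums[chosen] += w
    return payoff

# Usage (weights must be a NumPy array):
# combucb1_topk(np.array([0.8, 0.6, 0.4, 0.2]), K=2)
```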

Thompson Sampling

Graphical Model Sketch

no code implementations • 9 Feb 2016 • Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation • 9 Feb 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.

Learning-To-Rank

Combinatorial Cascading Bandits

no code implementations • NeurIPS 2015 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

The agent observes the index of the first chosen item whose weight is zero.

Cascading Bandits: Learning to Rank in the Cascade Model

no code implementations • 10 Feb 2015 • Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan

We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.

Learning-To-Rank

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

no code implementations • 3 Oct 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.

Learning to Act Greedily: Polymatroid Semi-Bandits

no code implementations • 30 May 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.

DUM: Diversity-Weighted Utility Maximization for Recommendations

no code implementations • 13 Nov 2014 • Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen

The need for diversification of recommendation lists manifests in a number of recommender systems use cases.

Recommendation Systems

Online Diverse Learning to Rank from Partial-Click Feedback

no code implementations • 1 Nov 2018 • Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

We assume that the user examines the list of recommended items until attracted by an item, clicks it, and does not examine the remaining items.
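
This browsing assumption is the cascade model; a tiny simulator makes it concrete (the attraction probabilities are illustrative):

```python
import numpy as np

def cascade_click(ranked_items, attraction, rng):
    """Cascade model: scan top-down, click the first attractive item, stop."""
    for pos, item in enumerate(ranked_items):
        if rng.random() < attraction[item]:
            return pos                  # positions below are never examined
    return None                         # no click on this list

rng = np.random.default_rng(0)
print(cascade_click([2, 0, 1], {0: 0.8, 1: 0.5, 2: 0.1}, rng))
```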

Learning-To-Rank • Recommendation Systems

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
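
A sketch of that selection rule (the number of pseudo rewards per pull, `a`, is a parameter the paper analyzes; the rest is illustrative):

```python
import numpy as np

def giro_choose_arm(histories, a=1, rng=None):
    """Bootstrap exploration in the spirit of Giro: pull the arm with the
    highest mean reward in a bootstrap sample of its augmented history."""
    rng = rng or np.random.default_rng()
    for i, h in enumerate(histories):
        if not h:
            return i                                   # pull each arm once first
    means = []
    for h in histories:
        augmented = list(h) + [0, 1] * (a * len(h))    # a pseudo 0s and 1s per pull
        sample = rng.choice(augmented, size=len(augmented), replace=True)
        means.append(sample.mean())
    return int(np.argmax(means))

# Example: histories of Bernoulli rewards per arm.
# giro_choose_arm([[1, 0, 1], [0, 0]], a=1)
```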

Multi-Armed Bandits

Adaptive Submodular Maximization in Bandit Setting

no code implementations • NeurIPS 2013 • Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Maximization of submodular functions has wide applications in machine learning and artificial intelligence.

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

no code implementations • 26 Feb 2019 • Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

Multi-Armed Bandits

Empirical Bayes Regret Minimization

no code implementations • 4 Apr 2019 • Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution.
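
The minimized quantity is straightforward to estimate by simulation. A sketch with a hypothetical API, where `sample_instance(rng)` draws a problem instance from the known distribution and `run_policy(instance, rng)` returns the regret incurred on it:

```python
import numpy as np

def empirical_bayes_regret(run_policy, sample_instance, n_instances=1000, seed=0):
    """Monte Carlo estimate of the Bayes regret: average regret over problem
    instances sampled from a known distribution (hypothetical API)."""
    rng = np.random.default_rng(seed)
    regrets = [run_policy(sample_instance(rng), rng) for _ in range(n_instances)]
    return float(np.mean(regrets))
```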

Waterfall Bandits: Learning to Sell Ads Online

no code implementations • 20 Apr 2019 • Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.

Randomized Exploration in Generalized Linear Bandits

no code implementations • 21 Jun 2019 • Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.
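
A compact sketch of that step for a logistic bandit, under illustrative assumptions (a $N(0, I)$ prior and fixed-step gradient ascent for the MAP fit): fit the MAP weights on the logged data, form the Laplace Gaussian approximation at the MAP, sample weights from it, and act greedily under the sample.

```python
import numpy as np

def glm_tsl_step(X, y, arm_features, n_iters=100, lr=0.1, rng=None):
    """GLM-TSL-style step for logistic rewards: sample from the Laplace
    approximation to the posterior and act greedily on the sample."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iters):                       # MAP fit with N(0, I) prior
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * (X.T @ (y - p) - w)              # log-posterior gradient step
    p = 1.0 / (1.0 + np.exp(-X @ w))
    H = X.T @ (X * (p * (1 - p))[:, None]) + np.eye(d)       # negative Hessian at MAP
    w_sample = rng.multivariate_normal(w, np.linalg.inv(H))  # Laplace posterior draw
    return int(np.argmax(arm_features @ w_sample))           # greedy arm under the sample
```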

Differentiable Bandit Exploration

no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.

Meta-Learning

Sample Efficient Graph-Based Optimization with Noisy Observations

1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton

We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.

Re-Ranking

Meta-Learning Bandit Policies by Gradient Ascent

no code implementations • 9 Jun 2020 • Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.

Meta-Learning Multi-Armed Bandits

Non-Stationary Off-Policy Optimization

no code implementations • 15 Jun 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed

This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.

Multi-Armed Bandits

Latent Bandits Revisited

no code implementations • NeurIPS 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
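
A minimal sketch of acting in this setting, assuming Gaussian rewards with known noise for the belief update and a hypothetical simulator hook `pull(arm)`: maintain a posterior over the discrete latent state, sample a state, act greedily under its known conditional means (`means_by_state` is a states-by-arms array), and update the belief by Bayes' rule.

```python
import numpy as np

def latent_ts_round(belief, means_by_state, pull, noise_sd, rng):
    """One round of latent-bandit Thompson sampling: only the discrete latent
    state is uncertain; arm means conditioned on the state are known."""
    s = rng.choice(len(belief), p=belief)           # sample a latent state
    arm = int(np.argmax(means_by_state[s]))         # act greedily under it
    r = pull(arm)                                   # observe a reward
    lik = np.exp(-0.5 * ((r - means_by_state[:, arm]) / noise_sd) ** 2)
    posterior = belief * lik
    return arm, posterior / posterior.sum()         # Bayes-updated state belief
```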

Recommendation Systems • Thompson Sampling

Influence Diagram Bandits

no code implementations • ICML 2020 • Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.

Learning-To-Rank • Position

Differentiable Meta-Learning of Bandit Policies

no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our approach is a form of meta-learning and exploits properties of $\mathcal{P}$ without making strong assumptions about its form.

Meta-Learning

Non-Stationary Latent Bandits

no code implementations • 1 Dec 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems • Thompson Sampling

CORe: Capitalizing On Rewards in Bandit Exploration

no code implementations • 7 Mar 2021 • Nan Wang, Branislav Kveton, Maryam Karimzadehgan

We propose a bandit algorithm that explores purely by randomizing its past observations.

Fixed-Budget Best-Arm Identification in Structured Bandits

no code implementations • 9 Jun 2021 • Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh

We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design.

Multi-Armed Bandits

Thompson Sampling with a Mixture Prior

no code implementations • 10 Jun 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making • Multi-Task Learning • +3

Random Effect Bandits

no code implementations • 23 Jun 2021 • Rong Zhu, Branislav Kveton

It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm.
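
For reference, this is how such side information typically enters Thompson sampling in a Gaussian bandit; the conjugate update below assumes a known reward-noise variance.

```python
import numpy as np

def gaussian_ts_arm(prior_mean, prior_var, sums, pulls, noise_var=1.0, rng=None):
    """One Thompson sampling decision with an informative Gaussian prior on
    the arm means: sample each arm's mean from its posterior, pick the best.
    All per-arm quantities are NumPy arrays."""
    rng = rng or np.random.default_rng()
    post_var = 1.0 / (1.0 / prior_var + pulls / noise_var)   # conjugate posterior
    post_mean = post_var * (prior_mean / prior_var + sums / noise_var)
    return int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
```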

Multi-Armed Bandits • Thompson Sampling

No Regrets for Learning the Prior in Bandits

no code implementations • NeurIPS 2021 • Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.

Thompson Sampling

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

no code implementations • 16 Sep 2021 • Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis, Branislav Kveton, Augustin Soule, Xiaolong Yang

We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget.

Safe Data Collection for Offline and Online Policy Learning

no code implementations • 8 Nov 2021 • Ruihao Zhu, Branislav Kveton

Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.

Hierarchical Bayesian Bandits

no code implementations • 12 Nov 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.

Federated Learning • Thompson Sampling

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations • 24 Jan 2022 • Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Deep Hierarchy in Bandits

no code implementations • 3 Feb 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.

Thompson Sampling

Meta-Learning for Simple Regret Minimization

1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over $m$ bandit tasks with horizon $n$ is a mere $\tilde{O}(m / \sqrt{n})$.

Meta-Learning

Safe Exploration for Efficient Policy Evaluation and Comparison

no code implementations • 26 Feb 2022 • Runzhe Wan, Branislav Kveton, Rui Song

High-quality data plays a central role in ensuring the accuracy of policy evaluation.

Safe Exploration

Pessimistic Off-Policy Optimization for Learning to Rank

no code implementations • 6 Jun 2022 • Matej Cief, Branislav Kveton, Michal Kompan

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.
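
As a reference point, here is a generic pessimistic off-policy value estimate from logged bandit data (a sketch only; the paper's estimators are tailored to ranked lists and click models, and the penalty form below is an assumption):

```python
import numpy as np

def pessimistic_ips(rewards, logged_propensities, target_probs, delta=0.05):
    """Lower-confidence-bound IPS estimate of a target policy's value from
    logged bandit data: mean IPS value minus an (assumed) confidence width.
    All arguments are per-interaction NumPy arrays."""
    w = target_probs / logged_propensities          # importance weights
    values = w * rewards
    n = len(values)
    width = np.sqrt(2.0 * np.log(1.0 / delta) * values.var() / n)
    return values.mean() - width                    # pessimism: penalize uncertainty
```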

Learning-To-Rank • Recommendation Systems

Uplifting Bandits

no code implementations • 8 Jun 2022 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.

Marketing • Recommendation Systems

From Ranked Lists to Carousels: A Carousel Click Model

no code implementations • 27 Sep 2022 • Behnam Rahdari, Branislav Kveton, Peter Brusilovsky

Our analytical results show that the user can examine more items in the carousel click model than in a single ranked list, due to the structured way of browsing.

Robust Contextual Linear Bandits

no code implementations • 26 Oct 2022 • Rong Zhu, Branislav Kveton

Our experiments show that RoLinTS is comparably statistically efficient to the classic methods when the misspecification is low, more robust when the misspecification is high, and significantly more computationally efficient than its naive implementation.

Multi-Armed Bandits

Bayesian Fixed-Budget Best-Arm Identification

no code implementations • 15 Nov 2022 • Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget.

Multi-Task Off-Policy Learning from Bandit Feedback

no code implementations • 9 Dec 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.

Learning-To-Rank • Recommendation Systems

Thompson Sampling with Diffusion Generative Prior

no code implementations • 12 Jan 2023 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems.

Decision Making • Denoising • +2

Selective Uncertainty Propagation in Offline RL

no code implementations • 1 Feb 2023 • Sanath Kumar Krishnamurthy, Shrey Modi, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi

We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step $h$ in dynamic programming (DP) algorithms.

Offline RL • reinforcement-learning • +1

Multiplier Bootstrap-based Exploration

no code implementations • 3 Feb 2023 • Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.

Multi-Armed Bandits

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

no code implementations • 13 Jun 2023 • Anusha Lalitha, Kousha Kalantari, Yifei Ma, Anoop Deoras, Branislav Kveton

Our algorithms rely on non-uniform budget allocations among the arms, where arms with higher reward variances are pulled more often than those with lower variances.
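
A sketch of this allocation idea under a simple proportional rule, which is one classic choice; the paper's exact allocation may differ:

```python
import numpy as np

def variance_adaptive_allocation(variances, budget):
    """Allocate a fixed pull budget across arms in proportion to their
    reward standard deviations, so higher-variance arms are pulled more."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    alloc = np.floor(budget * sd / sd.sum()).astype(int)
    alloc[: budget - alloc.sum()] += 1              # hand out the rounding remainder
    return alloc

print(variance_adaptive_allocation([1.0, 4.0, 0.25], budget=100))  # [29 57 14]
```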

Finite-Time Logarithmic Bayes Regret Upper Bounds

no code implementations • 15 Jun 2023 • Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively.

Efficient and Interpretable Bandit Algorithms

no code implementations • 23 Oct 2023 • Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton

We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.

Pessimistic Off-Policy Multi-Objective Optimization

no code implementations • 28 Oct 2023 • Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu

The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.

Decision Making

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

no code implementations • 22 Dec 2023 • Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton

The unique capabilities of Large Language Models (LLMs), such as natural language text generation, position them as strong candidates for providing explanations for recommendations.

Explanation Generation • Position • +1

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

no code implementations • 17 Jan 2024 • Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed.
