Search Results for author: Aadirupa Saha

Found 36 papers, 1 paper with code

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations • ICML 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret.

DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

no code implementations • 22 Mar 2024 • Aadirupa Saha, Hilal Asi

We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy.

Active Learning
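To make the pairwise-comparison feedback concrete, here is a minimal Python sketch of a (non-private) dueling-bandit interaction loop; the latent arm utilities, the Bradley-Terry-style comparison noise, and the naive win-counting learner are all illustrative assumptions, not the algorithm from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    K = 5
    utilities = rng.normal(size=K)  # hypothetical latent arm utilities, hidden from the learner

    def duel(i, j):
        # Returns 1 if arm i beats arm j in a noisy (Bradley-Terry-style) comparison.
        p_i_wins = 1.0 / (1.0 + np.exp(-(utilities[i] - utilities[j])))
        return int(rng.random() < p_i_wins)

    wins = np.zeros(K)
    plays = np.zeros(K)
    for t in range(2000):
        i, j = rng.choice(K, size=2, replace=False)
        outcome = duel(i, j)
        wins[i] += outcome
        wins[j] += 1 - outcome
        plays[i] += 1
        plays[j] += 1

    print("empirical win rates:", wins / plays)
    print("true best arm:", int(np.argmax(utilities)))

A differentially private learner would additionally randomize what it releases about these comparison outcomes; the sketch only shows the feedback structure.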

Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization

no code implementations • 29 Feb 2024 • Aadirupa Saha, Pierre Gaillard

In this paper, we design efficient algorithms for the problem of regret minimization in assortment selection with \emph{Plackett-Luce} (PL) based user choices.

Recommendation Systems

Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources

no code implementations • 28 Dec 2023 • Rohan Deb, Aadirupa Saha

We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart and that without further assumptions the problem is not learnable from a regret minimization perspective.

Faster Convergence with Multiway Preferences

no code implementations • 19 Dec 2023 • Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

We next study an $m$-multiway comparison (`battling') feedback model, where the learner gets to see the argmin feedback of an $m$-subset of the queried points, and show a convergence rate of $\smash{\widetilde O}(\frac{d}{ \min\{\log m, d\}\epsilon })$.

Federated Online and Bandit Convex Optimization

no code implementations • 29 Nov 2023 • Kumar Kshitij Patel, Lingxiao Wang, Aadirupa Saha, Nati Srebro

Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points.
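Bandit (zeroth-order) feedback means a machine can only evaluate its cost function, not differentiate it. The sketch below shows the standard two-point gradient estimator often used in this setting; the quadratic cost, step size, and smoothing radius are illustrative assumptions, and this is a single-machine sketch rather than the paper's federated method.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 10

    def f(w):
        # Hypothetical smooth convex cost; only its values are observable.
        return 0.5 * np.dot(w, w) + np.sum(w)

    def two_point_gradient_estimate(w, delta=1e-3):
        # Gradient estimate built from two function values along a random direction.
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        return d * (f(w + delta * u) - f(w - delta * u)) / (2 * delta) * u

    w = np.zeros(d)
    eta = 0.05
    for t in range(500):
        w -= eta * two_point_gradient_estimate(w)

    print("final cost:", f(w))  # approaches the minimum of f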

Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

no code implementations • 27 Nov 2023 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.

Dueling Optimization with a Monotone Adversary

no code implementations • 18 Nov 2023 • Avrim Blum, Meghal Gupta, Gene Li, Naren Sarayu Manoj, Aadirupa Saha, Yuanyuan Yang

We introduce and study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.

On the Vulnerability of Fairness Constrained Learning to Malicious Noise

no code implementations • 21 Jul 2023 • Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl

For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints.

Fairness

One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

no code implementations • 26 Oct 2022 • Pierre Gaillard, Aadirupa Saha, Soham Dan

We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits} in the fully adversarial setup, as well as draw connections between different existing notions of sleeping regrets in the multiarmed bandits (MAB) literature and consequently analyze the implications: Our first contribution is to propose the new notion of \emph{Internal Regret} for sleeping MAB.

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

no code implementations • 25 Oct 2022 • Thomas Kleine Buening, Aadirupa Saha

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.

Dueling Convex Optimization with General Preferences

no code implementations • 27 Sep 2022 • Aadirupa Saha, Tomer Koren, Yishay Mansour

We address the problem of \emph{convex optimization with dueling feedback}, where the goal is to minimize a convex function given a weaker form of \emph{dueling} feedback.

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

no code implementations • 23 Feb 2022 • Suprovat Ghoshal, Aadirupa Saha

We introduce the \emph{Correlated Preference Bandits} problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback.

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

no code implementations • 14 Feb 2022 • Aadirupa Saha, Pierre Gaillard

We study the $K$-armed dueling bandit problem for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences over pairs of decision points queried in an online sequential manner.

Multi-Armed Bandits

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

no code implementations • 9 Feb 2022 • Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier

In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information.

Optimal Algorithms for Stochastic Contextual Preference Bandits

no code implementations • NeurIPS 2021 • Aadirupa Saha

At each round, the learner is presented with a context set of $K$ items, chosen randomly from a potentially infinite set of arms $\mathcal D \subseteq \mathbf R^d$.

Decision Making • Information Retrieval • +3

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

no code implementations • 24 Nov 2021 • Aadirupa Saha, Akshay Krishnamurthy

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one decision was better than the other.

Decision Making

Dueling RL: Reinforcement Learning with Trajectory Preferences

no code implementations • 8 Nov 2021 • Aldo Pacchiano, Aadirupa Saha, Jonathan Lee

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards for them.

reinforcement-learning • Reinforcement Learning (RL)
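A minimal sketch of this 1-bit trajectory-preference feedback (the logistic comparison of latent returns is an illustrative assumption about how the bit might be generated, not the paper's algorithm):

    import numpy as np

    rng = np.random.default_rng(2)

    def trajectory_return(traj):
        # Sum of latent per-step rewards; never revealed to the agent.
        return sum(reward for (_state, _action, reward) in traj)

    def preference_bit(traj_a, traj_b):
        # Returns 1 if traj_a is preferred to traj_b, sampled from a logistic model.
        gap = trajectory_return(traj_a) - trajectory_return(traj_b)
        return int(rng.random() < 1.0 / (1.0 + np.exp(-gap)))

    traj_a = [(0, 1, 1.0), (1, 0, 0.5)]   # toy (state, action, latent_reward) tuples
    traj_b = [(0, 0, 0.2), (2, 1, 0.1)]
    print(preference_bit(traj_a, traj_b))  # the agent only ever sees this 0/1 bit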

Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

no code implementations • 6 Nov 2021 • Aadirupa Saha, Shubham Gupta

We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high probability regret.

Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

1 code implementation • 30 Jul 2021 • Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems.

Efficient Exploration • Multi-agent Reinforcement Learning • +2

Dueling Bandits with Adversarial Sleeping

no code implementations • NeurIPS 2021 • Aadirupa Saha, Pierre Gaillard

The goal is to find an optimal `no-regret' policy that can identify the best available item at each round, as opposed to the standard `fixed best-arm regret objective' of dueling bandits.

Management • Multi-Armed Bandits

Pure Exploration with Structured Preference Feedback

no code implementations • 12 Apr 2021 • Shubham Gupta, Aadirupa Saha, Sumeet Katariya

We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features.

Decision Making

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

no code implementations • 15 Feb 2021 • Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

We study online learning with bandit feedback (i.e., the learner has access to only a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \mathrm{loss}_t(\mathrm{pred}_t(w))$, where the output of $\mathrm{pred}_t$ is one-dimensional.

Decision Making
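The pseudo-1d structure can be illustrated with a toy instance in which the $d$-dimensional decision enters the cost only through a scalar linear prediction; the feature vector, target, and squared loss below are illustrative assumptions, not the paper's setting.

    import numpy as np

    d = 20
    rng = np.random.default_rng(3)
    x_t = rng.normal(size=d)   # hypothetical round-t feature vector
    y_t = 1.0                  # hypothetical round-t target

    def pred_t(w):
        # One-dimensional prediction produced from the d-dimensional decision w.
        return float(np.dot(x_t, w))

    def loss_t(z):
        # Scalar loss applied to the 1-d prediction.
        return (z - y_t) ** 2

    def f_t(w):
        # Pseudo-1d composition: f_t(w) = loss_t(pred_t(w)).
        return loss_t(pred_t(w))

    print(f_t(np.zeros(d)))  # with bandit feedback, only such values are observed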

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making • Decision Making Under Uncertainty • +2

Adversarial Dueling Bandits

no code implementations • 27 Oct 2020 • Aadirupa Saha, Tomer Koren, Yishay Mansour

We introduce the problem of regret minimization in Adversarial Dueling Bandits.

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations • 14 Apr 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

We then study the most general version of the problem, where at each round the available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption), and propose an efficient algorithm with an $O(\sqrt{2^K T})$ regret guarantee.

Regret Minimization in Stochastic Contextual Dueling Bandits

no code implementations • 20 Feb 2020 • Aadirupa Saha, Aditya Gopalan

We consider the problem of stochastic $K$-armed dueling bandits in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.

Decision Making • Information Retrieval • +2

Best-item Learning in Random Utility Models with Subset Choices

no code implementations • 19 Feb 2020 • Aadirupa Saha, Aditya Gopalan

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.

PAC learning

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

no code implementations • ICML 2020 • Aadirupa Saha, Aditya Gopalan

In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, where $\Delta_i$ is the Plackett-Luce parameter gap between the best and the $i^{th}$ best item, and $\theta_{[k]}$ is the sum of the Plackett-Luce parameters for the top-$k$ items.

PAC learning

Combinatorial Bandits with Relative Feedback

no code implementations • NeurIPS 2019 • Aadirupa Saha, Aditya Gopalan

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.

How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?

no code implementations • 6 Nov 2018 • Aadirupa Saha, Rakesh Shivanna, Chiranjib Bhattacharyya

Our proposed algorithm, {\it Pref-Rank}, predicts the underlying ranking using an SVM based approach over the chosen embedding of the product graph, and is the first to provide \emph{statistical consistency} on two ranking losses: \emph{Kendall's tau} and \emph{Spearman's footrule}, with a required sample complexity of $O(n^2 \chi(\bar{G}))^{\frac{2}{3}}$ pairs, $\chi(\bar{G})$ being the \emph{chromatic number} of the complement graph $\bar{G}$.
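The two ranking losses named above are standard and easy to state in code; the following generic helpers (not part of Pref-Rank) count pairwise disagreements for Kendall's tau and sum rank displacements for Spearman's footrule:

    import itertools

    def kendalls_tau_distance(rank_a, rank_b):
        # Number of item pairs ordered differently by the two rankings.
        pos_a = {item: i for i, item in enumerate(rank_a)}
        pos_b = {item: i for i, item in enumerate(rank_b)}
        return sum(
            1
            for x, y in itertools.combinations(rank_a, 2)
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
        )

    def spearman_footrule(rank_a, rank_b):
        # Sum of absolute differences between each item's two positions.
        pos_b = {item: i for i, item in enumerate(rank_b)}
        return sum(abs(i - pos_b[item]) for i, item in enumerate(rank_a))

    print(kendalls_tau_distance(list("abcd"), list("badc")))  # 2
    print(spearman_footrule(list("abcd"), list("badc")))      # 4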

Active Ranking with Subset-wise Preferences

no code implementations • 23 Oct 2018 • Aadirupa Saha, Aditya Gopalan

When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case.

PAC Battling Bandits in the Plackett-Luce Model

no code implementations • 12 Aug 2018 • Aadirupa Saha, Aditya Gopalan

We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model--an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
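Winner feedback under the Plackett-Luce model is easy to simulate: the probability that item $i$ is the most preferred in a played subset $S$ is $\theta_i / \sum_{j \in S} \theta_j$. The sketch below shows only this feedback model, with hypothetical utilities, not the paper's PAC algorithm.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 10, 4
    theta = rng.uniform(0.1, 1.0, size=n)   # hypothetical latent PL utilities

    def pl_winner(subset):
        # Sample the most preferred item of the subset under the Plackett-Luce model.
        weights = theta[subset]
        return subset[rng.choice(len(subset), p=weights / weights.sum())]

    subset = rng.choice(n, size=k, replace=False)
    print("played subset:", subset, "winner:", pl_winner(subset))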

Ranking with Features: Algorithm and A Graph Theoretic Analysis

no code implementations • 11 Aug 2018 • Aadirupa Saha, Arun Rajkumar

We present a new least-squares-based algorithm called fBTL-LS, which we show requires far fewer than $O(n\log(n))$ pairs to obtain a good ranking -- precisely, our new sample complexity bound is $O(\alpha\log \alpha)$, where $\alpha$ denotes the number of `independent items' of the set, with $\alpha \ll n$ in general.

Graph Matching • Matrix Completion

Online Learning for Structured Loss Spaces

no code implementations • 13 Jun 2017 • Siddharth Barman, Aditya Gopalan, Aadirupa Saha

We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.
