Search Results for author: Aadirupa Saha

Found 36 papers, 1 paper with code

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations • ICML 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret.

DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

no code implementations • 22 Mar 2024 • Aadirupa Saha, Hilal Asi

We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy.

Active Learning
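To make the pairwise-comparison feedback concrete, here is a minimal Python sketch of a (non-private) dueling-bandit interaction loop; the latent arm utilities, the Bradley-Terry-style comparison noise, and the naive win-counting learner are all illustrative assumptions, not the algorithm from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    K = 5
    utilities = rng.normal(size=K)  # hypothetical latent arm utilities, hidden from the learner

    def duel(i, j):
        # Returns 1 if arm i beats arm j in a noisy (Bradley-Terry-style) comparison.
        p_i_wins = 1.0 / (1.0 + np.exp(-(utilities[i] - utilities[j])))
        return int(rng.random() < p_i_wins)

    wins = np.zeros(K)
    plays = np.zeros(K)
    for t in range(2000):
        i, j = rng.choice(K, size=2, replace=False)
        outcome = duel(i, j)
        wins[i] += outcome
        wins[j] += 1 - outcome
        plays[i] += 1
        plays[j] += 1

    print("empirical win rates:", wins / plays)
    print("true best arm:", int(np.argmax(utilities)))

A differentially private learner would additionally randomize what it releases about these comparison outcomes; the sketch only shows the feedback structure.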

Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization

no code implementations • 29 Feb 2024 • Aadirupa Saha, Pierre Gaillard

In this paper, we design efficient algorithms for the problem of regret minimization in assortment selection with \emph{Plackett-Luce} (PL) based user choices.

Recommendation Systems

Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources

no code implementations • 28 Dec 2023 • Rohan Deb, Aadirupa Saha

We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart and that without further assumptions the problem is not learnable from a regret minimization perspective.

Faster Convergence with Multiway Preferences

no code implementations • 19 Dec 2023 • Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

We next study an $m$-multiway comparison (`battling') feedback model, where the learner gets to see the argmin feedback of an $m$-subset of the queried points, and show a convergence rate of $\smash{\widetilde O}(\frac{d}{ \min\{\log m, d\}\epsilon })$.

Federated Online and Bandit Convex Optimization

no code implementations • 29 Nov 2023 • Kumar Kshitij Patel, Lingxiao Wang, Aadirupa Saha, Nati Srebro

Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points.
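Bandit (zeroth-order) feedback means a machine can only evaluate its cost function, not differentiate it. The sketch below shows the standard two-point gradient estimator often used in this setting; the quadratic cost, step size, and smoothing radius are illustrative assumptions, and this is a single-machine sketch rather than the paper's federated method.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 10

    def f(w):
        # Hypothetical smooth convex cost; only its values are observable.
        return 0.5 * np.dot(w, w) + np.sum(w)

    def two_point_gradient_estimate(w, delta=1e-3):
        # Gradient estimate built from two function values along a random direction.
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        return d * (f(w + delta * u) - f(w - delta * u)) / (2 * delta) * u

    w = np.zeros(d)
    eta = 0.05
    for t in range(500):
        w -= eta * two_point_gradient_estimate(w)

    print("final cost:", f(w))  # approaches the minimum of f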

Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

no code implementations • 27 Nov 2023 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.

Dueling Optimization with a Monotone Adversary

no code implementations • 18 Nov 2023 • Avrim Blum, Meghal Gupta, Gene Li, Naren Sarayu Manoj, Aadirupa Saha, Yuanyuan Yang

We introduce and study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.

On the Vulnerability of Fairness Constrained Learning to Malicious Noise

no code implementations • 21 Jul 2023 • Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl

For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints.

Fairness

One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

no code implementations • 26 Oct 2022 • Pierre Gaillard, Aadirupa Saha, Soham Dan

We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits} in the fully adversarial setup, as well as draw connections between different existing notions of sleeping regrets in the multiarmed bandits (MAB) literature and consequently analyze the implications: Our first contribution is to propose the new notion of \emph{Internal Regret} for sleeping MAB.

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

no code implementations • 25 Oct 2022 • Thomas Kleine Buening, Aadirupa Saha

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.

Dueling Convex Optimization with General Preferences

no code implementations • 27 Sep 2022 • Aadirupa Saha, Tomer Koren, Yishay Mansour

We address the problem of \emph{convex optimization with dueling feedback}, where the goal is to minimize a convex function given a weaker form of \emph{dueling} feedback.

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

no code implementations • 23 Feb 2022 • Suprovat Ghoshal, Aadirupa Saha

We introduce the \emph{Correlated Preference Bandits} problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback.

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

no code implementations • 14 Feb 2022 • Aadirupa Saha, Pierre Gaillard

We study the $K$-armed dueling bandit problem for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences over pairs of decision points queried in an online sequential manner.

Multi-Armed Bandits

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

no code implementations • 9 Feb 2022 • Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier

In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information.

Optimal Algorithms for Stochastic Contextual Preference Bandits

no code implementations • NeurIPS 2021 • Aadirupa Saha

At each round, the learner is presented with a context set of $K$ items, chosen randomly from a potentially infinite set of arms $\mathcal D \subseteq \mathbf R^d$.

Decision Making • Information Retrieval • +3

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

no code implementations • 24 Nov 2021 • Aadirupa Saha, Akshay Krishnamurthy

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one decision was better than the other.

Decision Making

Dueling RL: Reinforcement Learning with Trajectory Preferences

no code implementations • 8 Nov 2021 • Aldo Pacchiano, Aadirupa Saha, Jonathan Lee

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards for them.

reinforcement-learning • Reinforcement Learning (RL)
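A minimal sketch of this 1-bit trajectory-preference feedback (the logistic comparison of latent returns is an illustrative assumption about how the bit might be generated, not the paper's algorithm):

    import numpy as np

    rng = np.random.default_rng(2)

    def trajectory_return(traj):
        # Sum of latent per-step rewards; never revealed to the agent.
        return sum(reward for (_state, _action, reward) in traj)

    def preference_bit(traj_a, traj_b):
        # Returns 1 if traj_a is preferred to traj_b, sampled from a logistic model.
        gap = trajectory_return(traj_a) - trajectory_return(traj_b)
        return int(rng.random() < 1.0 / (1.0 + np.exp(-gap)))

    traj_a = [(0, 1, 1.0), (1, 0, 0.5)]   # toy (state, action, latent_reward) tuples
    traj_b = [(0, 0, 0.2), (2, 1, 0.1)]
    print(preference_bit(traj_a, traj_b))  # the agent only ever sees this 0/1 bit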

Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

no code implementations • 6 Nov 2021 • Aadirupa Saha, Shubham Gupta

We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high probability regret.

Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

1 code implementation • 30 Jul 2021 • Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems.

Efficient Exploration • Multi-agent Reinforcement Learning • +2

Dueling Bandits with Adversarial Sleeping

no code implementations • NeurIPS 2021 • Aadirupa Saha, Pierre Gaillard

The goal is to find an optimal `no-regret' policy that can identify the best available item at each round, as opposed to the standard `fixed best-arm regret objective' of dueling bandits.

Management • Multi-Armed Bandits

Pure Exploration with Structured Preference Feedback

no code implementations • 12 Apr 2021 • Shubham Gupta, Aadirupa Saha, Sumeet Katariya

We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features.

Decision Making

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

no code implementations • 15 Feb 2021 • Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

We study online learning with bandit feedback (i.e., the learner has access to only a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \mathrm{loss}_t(\mathrm{pred}_t(w))$, where the output of $\mathrm{pred}_t$ is one-dimensional.

Decision Making
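The pseudo-1d structure can be illustrated with a toy instance in which the $d$-dimensional decision enters the cost only through a scalar linear prediction; the feature vector, target, and squared loss below are illustrative assumptions, not the paper's setting.

    import numpy as np

    d = 20
    rng = np.random.default_rng(3)
    x_t = rng.normal(size=d)   # hypothetical round-t feature vector
    y_t = 1.0                  # hypothetical round-t target

    def pred_t(w):
        # One-dimensional prediction produced from the d-dimensional decision w.
        return float(np.dot(x_t, w))

    def loss_t(z):
        # Scalar loss applied to the 1-d prediction.
        return (z - y_t) ** 2

    def f_t(w):
        # Pseudo-1d composition: f_t(w) = loss_t(pred_t(w)).
        return loss_t(pred_t(w))

    print(f_t(np.zeros(d)))  # with bandit feedback, only such values are observed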

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making • Decision Making Under Uncertainty • +2

Adversarial Dueling Bandits

no code implementations • 27 Oct 2020 • Aadirupa Saha, Tomer Koren, Yishay Mansour

We introduce the problem of regret minimization in Adversarial Dueling Bandits.

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations • 14 Apr 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

We then study the most general version of the problem, where at each round the available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption), and propose an efficient algorithm with an $O(\sqrt{2^K T})$ regret guarantee.

Regret Minimization in Stochastic Contextual Dueling Bandits

no code implementations • 20 Feb 2020 • Aadirupa Saha, Aditya Gopalan

We consider the problem of stochastic $K$-armed dueling bandits in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.

Decision Making • Information Retrieval • +2

Best-item Learning in Random Utility Models with Subset Choices

no code implementations • 19 Feb 2020 • Aadirupa Saha, Aditya Gopalan

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.

PAC learning

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

no code implementations • ICML 2020 • Aadirupa Saha, Aditya Gopalan

In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, where $\Delta_i$ is the Plackett-Luce parameter gap between the best and the $i^{th}$ best item, and $\theta_{[k]}$ is the sum of the Plackett-Luce parameters for the top-$k$ items.

PAC learning

Combinatorial Bandits with Relative Feedback

no code implementations • NeurIPS 2019 • Aadirupa Saha, Aditya Gopalan

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.

How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?

no code implementations • 6 Nov 2018 • Aadirupa Saha, Rakesh Shivanna, Chiranjib Bhattacharyya

Our proposed algorithm, {\it Pref-Rank}, predicts the underlying ranking using an SVM based approach over the chosen embedding of the product graph, and is the first to provide \emph{statistical consistency} on two ranking losses: \emph{Kendall's tau} and \emph{Spearman's footrule}, with a required sample complexity of $O(n^2 \chi(\bar{G}))^{\frac{2}{3}}$ pairs, $\chi(\bar{G})$ being the \emph{chromatic number} of the complement graph $\bar{G}$.
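The two ranking losses named above are standard and easy to state in code; the following generic helpers (not part of Pref-Rank) count pairwise disagreements for Kendall's tau and sum rank displacements for Spearman's footrule:

    import itertools

    def kendalls_tau_distance(rank_a, rank_b):
        # Number of item pairs ordered differently by the two rankings.
        pos_a = {item: i for i, item in enumerate(rank_a)}
        pos_b = {item: i for i, item in enumerate(rank_b)}
        return sum(
            1
            for x, y in itertools.combinations(rank_a, 2)
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
        )

    def spearman_footrule(rank_a, rank_b):
        # Sum of absolute differences between each item's two positions.
        pos_b = {item: i for i, item in enumerate(rank_b)}
        return sum(abs(i - pos_b[item]) for i, item in enumerate(rank_a))

    print(kendalls_tau_distance(list("abcd"), list("badc")))  # 2
    print(spearman_footrule(list("abcd"), list("badc")))      # 4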

Active Ranking with Subset-wise Preferences

no code implementations • 23 Oct 2018 • Aadirupa Saha, Aditya Gopalan

When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case.

PAC Battling Bandits in the Plackett-Luce Model

no code implementations • 12 Aug 2018 • Aadirupa Saha, Aditya Gopalan

We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model--an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
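Winner feedback under the Plackett-Luce model is easy to simulate: the probability that item $i$ is the most preferred in a played subset $S$ is $\theta_i / \sum_{j \in S} \theta_j$. The sketch below shows only this feedback model, with hypothetical utilities, not the paper's PAC algorithm.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 10, 4
    theta = rng.uniform(0.1, 1.0, size=n)   # hypothetical latent PL utilities

    def pl_winner(subset):
        # Sample the most preferred item of the subset under the Plackett-Luce model.
        weights = theta[subset]
        return subset[rng.choice(len(subset), p=weights / weights.sum())]

    subset = rng.choice(n, size=k, replace=False)
    print("played subset:", subset, "winner:", pl_winner(subset))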

Ranking with Features: Algorithm and A Graph Theoretic Analysis

no code implementations • 11 Aug 2018 • Aadirupa Saha, Arun Rajkumar

We present a new least-squares-based algorithm called fBTL-LS, which we show requires far fewer than $O(n\log(n))$ pairs to obtain a good ranking -- precisely, our new sample complexity bound is $O(\alpha\log \alpha)$, where $\alpha$ denotes the number of `independent items' of the set, with $\alpha \ll n$ in general.

Graph Matching • Matrix Completion

Online Learning for Structured Loss Spaces

no code implementations • 13 Jun 2017 • Siddharth Barman, Aditya Gopalan, Aadirupa Saha

We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.
