no code implementations • ICML 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko
The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret.
no code implementations • 1 Jun 2024 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can strategically misreport their privately observed contexts to the learner.
no code implementations • 22 Mar 2024 • Aadirupa Saha, Hilal Asi
We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy.
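To make the feedback model concrete, here is a minimal sketch of a (non-private) dueling-bandit comparison oracle; the preference matrix `P` and its values are illustrative assumptions, and the paper's differentially private mechanism is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def duel(P, i, j):
    """One noisy pairwise comparison: returns 1 iff arm i beats arm j,
    drawn with probability P[i, j] (so P[i, j] + P[j, i] = 1 is assumed)."""
    return int(rng.random() < P[i, j])

# Hypothetical 3-arm preference matrix in which arm 0 dominates.
P = np.array([[0.5, 0.7, 0.8],
              [0.3, 0.5, 0.6],
              [0.2, 0.4, 0.5]])
wins = sum(duel(P, 0, 1) for _ in range(1000))   # roughly 700 expected
```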
no code implementations • 29 Feb 2024 • Aadirupa Saha, Pierre Gaillard
In this paper, we design efficient algorithms for the problem of regret minimization in assortment selection with \emph{Plackett-Luce} (PL) based user choices.
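The Gumbel-max trick gives a compact way to simulate PL-based user choices; the utility parameters `theta` below are hypothetical, and this sketches only the choice model, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def pl_ranking(theta):
    """Sample a full ranking from a Plackett-Luce model with positive
    utilities theta: perturb log-utilities with i.i.d. Gumbel noise and
    sort in decreasing order (the Gumbel-max trick)."""
    return np.argsort(-(np.log(theta) + rng.gumbel(size=len(theta))))

theta = np.array([3.0, 1.0, 0.5, 0.5])   # hypothetical PL parameters
print(pl_ranking(theta))                 # item 0 is ranked first most often
```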
no code implementations • 28 Dec 2023 • Rohan Deb, Aadirupa Saha
We show that due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart and that without further assumptions the problem is not learnable from a regret minimization perspective.
no code implementations • 19 Dec 2023 • Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour
We next study an $m$-multiway comparison (`battling') feedback model, where the learner gets to see the argmin feedback over an $m$-subset of queried points, and show a convergence rate of $\smash{\widetilde O}(\frac{d}{ \min\{\log m, d\}\epsilon })$.
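A rough sketch of what $m$-multiway argmin feedback looks like as an oracle; the objective `f`, the noise model, and the query points are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def argmin_feedback(f, points, noise=0.01):
    """Battling feedback: the learner submits m points and observes only
    the index of the noisy minimizer, never the function values."""
    vals = np.array([f(x) for x in points])
    vals += noise * rng.standard_normal(len(points))
    return int(np.argmin(vals))

f = lambda x: np.sum((x - 1.0) ** 2)            # hypothetical convex objective
queries = [np.zeros(3), np.ones(3), 2 * np.ones(3)]
print(argmin_feedback(f, queries))              # usually 1: closest to the optimum
```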
no code implementations • 29 Nov 2023 • Kumar Kshitij Patel, Lingxiao Wang, Aadirupa Saha, Nati Srebro
Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points.
no code implementations • 27 Nov 2023 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
no code implementations • 18 Nov 2023 • Avrim Blum, Meghal Gupta, Gene Li, Naren Sarayu Manoj, Aadirupa Saha, Yuanyuan Yang
We introduce and study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
no code implementations • 21 Jul 2023 • Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl
For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints.
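For readers unfamiliar with the constraint, a short sketch of how the Demographic Parity gap is typically measured; the toy predictions and groups are made up, and this is not the paper's noise-robust learner.

```python
import numpy as np

def dp_gap(y_pred, group):
    """Demographic Parity gap: largest difference in positive-prediction
    rates across groups (0 means exact parity)."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # toy classifier outputs
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # toy protected attribute
print(dp_gap(y_pred, group))                  # 0.5: rates are 3/4 vs 1/4
```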
no code implementations • 16 Mar 2023 • Aadirupa Saha, Branislav Kveton
We lay foundations for the Bayesian setting, which incorporates prior knowledge.
no code implementations • 26 Oct 2022 • Pierre Gaillard, Aadirupa Saha, Soham Dan
We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits} in the fully adversarial setup, draw connections between different existing notions of sleeping regret in the multiarmed bandits (MAB) literature, and analyze the implications. Our first contribution is to propose the new notion of \emph{Internal Regret} for sleeping MAB.
no code implementations • 25 Oct 2022 • Thomas Kleine Buening, Aadirupa Saha
We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.
no code implementations • 27 Sep 2022 • Aadirupa Saha, Tomer Koren, Yishay Mansour
We address the problem of \emph{convex optimization with dueling feedback}, where the goal is to minimize a convex function given a weaker form of \emph{dueling} feedback.
no code implementations • 23 Feb 2022 • Suprovat Ghoshal, Aadirupa Saha
We introduce the \emph{Correlated Preference Bandits} problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback.
no code implementations • 14 Feb 2022 • Aadirupa Saha, Pierre Gaillard
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences over pairs of decision points queried in an online sequential manner.
no code implementations • 9 Feb 2022 • Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier
In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information.
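One common way to model such noisy contextual preference feedback is a logistic link on a context-dependent utility difference; the link choice and parameters below are assumptions for illustration, not necessarily the paper's model.

```python
import numpy as np

rng = np.random.default_rng(3)

def contextual_duel(x, theta_i, theta_j):
    """Noisy preference for a context-dependent duel: arm i beats arm j
    with probability sigmoid(x^T (theta_i - theta_j))."""
    p = 1.0 / (1.0 + np.exp(-x @ (theta_i - theta_j)))
    return int(rng.random() < p)

x = rng.standard_normal(5)                          # context vector
theta_i, theta_j = rng.standard_normal(5), rng.standard_normal(5)
print(contextual_duel(x, theta_i, theta_j))         # 1 iff arm i wins this round
```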
no code implementations • NeurIPS 2021 • Aadirupa Saha
At each round, the learner is presented with a context set of $K$ items, chosen randomly from a potentially infinite set of arms $\mathcal D \subseteq \mathbf R^d$.
no code implementations • 24 Nov 2021 • Aadirupa Saha, Akshay Krishnamurthy
We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one decision was better than the other.
no code implementations • 8 Nov 2021 • Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards.
no code implementations • 6 Nov 2021 • Aadirupa Saha, Shubham Gupta
We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high probability regret.
1 code implementation • 30 Jul 2021 • Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems.
no code implementations • NeurIPS 2021 • Aadirupa Saha, Pierre Gaillard
The goal is to find an optimal `no-regret' policy that can identify the best available item at each round, as opposed to the standard `fixed best-arm regret objective' of dueling bandits.
no code implementations • 12 Apr 2021 • Shubham Gupta, Aadirupa Saha, Sumeet Katariya
We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features.
no code implementations • 15 Feb 2021 • Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(\mathbf{w}) = \ell_t(g_t(\mathbf{w}))$, where the output of $g_t$ is one-dimensional.
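A minimal sketch of why the pseudo-1d structure helps, assuming the scalar link $g_t$ is a known linear map; in the paper's bandit setting the learner only queries $f_t$ itself, but the chain-rule factorization through the one-dimensional bottleneck is the structure being exploited.

```python
import numpy as np

rng = np.random.default_rng(4)

def pseudo_1d_grad(loss, x, w, delta=1e-3):
    """Gradient of f(w) = loss(x @ w) via its 1d bottleneck: estimate the
    scalar derivative loss'(z) with a two-point query along the output,
    then apply the chain rule. Sketch only: it assumes the linear link x
    is known, which a purely zeroth-order learner would not have."""
    z = x @ w                                                   # scalar prediction g_t(w)
    dloss = (loss(z + delta) - loss(z - delta)) / (2 * delta)   # 1d derivative estimate
    return dloss * x                                            # chain rule: loss'(z) * grad z

loss = lambda z: (z - 0.5) ** 2      # hypothetical per-round loss ell_t
x = rng.standard_normal(10)          # known linear link (assumption)
w = np.zeros(10)
g = pseudo_1d_grad(loss, x, w)       # close to 2*(x @ w - 0.5)*x = -x here
```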
no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor
We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.
no code implementations • 27 Oct 2020 • Aadirupa Saha, Tomer Koren, Yishay Mansour
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
no code implementations • 14 Apr 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko
We then study the most general version of the problem, where at each round the available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption), and propose an efficient algorithm with an $O(\sqrt{2^K T})$ regret guarantee.
no code implementations • 20 Feb 2020 • Aadirupa Saha, Aditya Gopalan
We consider the problem of stochastic $K$-armed dueling bandits in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.
no code implementations • 19 Feb 2020 • Aadirupa Saha, Aditya Gopalan
We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.
no code implementations • ICML 2020 • Aadirupa Saha, Aditya Gopalan
In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity for PAC best-arm identification of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, where $\Delta_i$ is the Plackett-Luce parameter gap between the best and the $i^{\text{th}}$ best item, and $\theta_{[k]}$ is the sum of the Plackett-Luce parameters of the top-$k$ items.
no code implementations • NeurIPS 2019 • Aadirupa Saha, Aditya Gopalan
We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.
no code implementations • 6 Nov 2018 • Aadirupa Saha, Rakesh Shivanna, Chiranjib Bhattacharyya
Our proposed algorithm, {\it Pref-Rank}, predicts the underlying ranking using an SVM-based approach over the chosen embedding of the product graph, and is the first to provide \emph{statistical consistency} on two ranking losses: \emph{Kendall's tau} and \emph{Spearman's footrule}, with a required sample complexity of $O\big((n^2 \chi(\bar{G}))^{\frac{2}{3}}\big)$ pairs, $\chi(\bar{G})$ being the \emph{chromatic number} of the complement graph $\bar{G}$.
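For reference, both ranking losses can be computed directly on rank vectors; the toy rankings below are made up for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

sigma = np.array([0, 1, 2, 3, 4])      # ground-truth ranks of 5 items
pi    = np.array([1, 0, 2, 4, 3])      # predicted ranks

tau, _ = kendalltau(sigma, pi)         # Kendall's tau correlation in [-1, 1]
footrule = np.abs(sigma - pi).sum()    # Spearman's footrule: total rank displacement
print(tau, footrule)                   # 0.6 and 4 for this toy example
```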
no code implementations • 23 Oct 2018 • Aadirupa Saha, Aditya Gopalan
When, however, it is possible to elicit top-$m$ ($m \leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, representing an $m$-wise reduction in sample complexity compared to the pairwise case.
no code implementations • 12 Aug 2018 • Aadirupa Saha, Aditya Gopalan
We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model -- an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms and subsequently observes stochastic feedback indicating preference information for the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
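A minimal simulation of the winner ("most preferred item") feedback under the PL model; the utility parameters and the played subset are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def pl_winner(theta, subset):
    """Winner feedback in the Battling-Bandit setup: when subset S is
    played, item i in S wins with probability theta_i / sum_{j in S} theta_j."""
    probs = theta[subset] / theta[subset].sum()
    return subset[rng.choice(len(subset), p=probs)]

theta = np.array([2.0, 1.0, 1.0, 0.5, 0.5])   # hypothetical PL utilities
subset = np.array([0, 2, 4])
print(pl_winner(theta, subset))               # item 0 wins with probability 4/7
```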
no code implementations • 11 Aug 2018 • Aadirupa Saha, Arun Rajkumar
We present a new least-squares-based algorithm called fBTL-LS, which we show requires far fewer than $O(n\log n)$ pairs to obtain a good ranking -- precisely, our new sample complexity bound is $O(\alpha\log \alpha)$, where $\alpha$ denotes the number of `independent items' of the set; in general $\alpha \ll n$.
no code implementations • 13 Jun 2017 • Siddharth Barman, Aditya Gopalan, Aadirupa Saha
We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.