no code implementations • 21 Jun 2024 • Aditya Gangrade, Aditya Gopalan, Venkatesh Saligrama, Clayton Scott
While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods begin by assuming that the underlying problem is feasible.
no code implementations • 13 Oct 2023 • Debangshu Banerjee, Aditya Gopalan
Parametric, feature-based reward models are employed by a variety of algorithms in decision-making settings such as bandits and Markov decision processes (MDPs).
no code implementations • 6 Sep 2023 • Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P
At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner.
no code implementations • 9 Jan 2023 • Debangshu Banerjee, Aditya Gopalan
As noted in \cite{lattimore2020bandit}, characterizing the minimax regret of linear bandits over a wide variety of action spaces remains an open problem.
no code implementations • 23 Jul 2022 • Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.
no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.
no code implementations • 26 May 2022 • Aditya Gopalan, Gugan Thoppe
Q-learning and SARSA with $\epsilon$-greedy exploration are leading reinforcement learning methods.
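For concreteness, here is a minimal sketch of the tabular Q-learning update with $\epsilon$-greedy exploration on a toy chain MDP; the environment and all constants are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch: tabular Q-learning with epsilon-greedy exploration on a
# hypothetical 5-state chain MDP (illustrative environment, not the paper's).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2   # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    """Chain dynamics: 'right' moves toward the goal state, which pays +1."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(10_000):
    # epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update: bootstrap with the greedy value at the next state
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next if s_next != n_states - 1 else 0  # reset at the goal

print(np.argmax(Q, axis=1))  # learned greedy policy: all-right on this chain
```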
no code implementations • 31 Mar 2022 • Dipayan Sen, L. A. Prashanth, Aditya Gopalan
We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.
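As an illustration of the observation model only (not the paper's estimator), the sketch below observes the $m$ least-sampled coordinates each round and forms entrywise sample means; the dimensions, mean, and covariance are hypothetical.

```python
import numpy as np

# Illustrative sketch of the sampling setup: estimate a Gaussian K-vector's
# mean while observing only m < K coordinates per round. The
# least-sampled-first selection rule is a naive baseline, not the paper's method.
rng = np.random.default_rng(1)
K, m, T = 10, 3, 2000
mu_true = rng.normal(size=K)
cov = np.eye(K)                       # unknown to the learner in the paper

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(T):
    idx = np.argsort(counts)[:m]      # observe the m least-sampled entries
    x = rng.multivariate_normal(mu_true, cov)
    sums[idx] += x[idx]
    counts[idx] += 1

mu_hat = sums / np.maximum(counts, 1)
print("MSE:", np.mean((mu_hat - mu_true) ** 2))
```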
no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan
For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.
no code implementations • 25 Oct 2021 • Ramakrishnan Krishnamurthy, Aditya Gopalan
We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is essentially minimax optimal.
no code implementations • NeurIPS 2021 • Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan
Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns.
no code implementations • 1 May 2021 • Mohammadi Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.
no code implementations • 2 Nov 2020 • Advait Parulekar, Soumya Basu, Aditya Gopalan, Karthikeyan Shanmugam, Sanjay Shakkottai
We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}) given only zero-order stochastic oracle access to both the objective itself and protected subspace.
no code implementations • 20 Aug 2020 • Sayak Ray Chowdhury, Aditya Gopalan
We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.
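Since the target object is the Pareto front, a small helper clarifies what that means: given sampled objective vectors (maximization in every coordinate), keep the non-dominated ones. This illustrates the goal only, not the paper's BO algorithm.

```python
import numpy as np

def pareto_front(Y):
    """Return the rows of Y not dominated by any other row (maximization)."""
    keep = []
    for i, y in enumerate(Y):
        dominated = any(
            np.all(Y[j] >= y) and np.any(Y[j] > y)
            for j in range(len(Y)) if j != i
        )
        if not dominated:
            keep.append(i)
    return Y[keep]

rng = np.random.default_rng(2)
Y = rng.random((50, 2))               # 50 sampled two-objective values
print(pareto_front(Y))
```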
no code implementations • 13 Jun 2020 • Mohammadi Zaki, Avi Mohan, Aditya Gopalan
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
1 code implementation • 24 Apr 2020 • Aditya Gopalan, Himanshu Tyagi
We use the simulation framework to compare the performance of three testing policies: Random Symptomatic Testing (RST), Contact Tracing (CT), and a new Location Based Testing policy (LBT).
no code implementations • 20 Feb 2020 • Aadirupa Saha, Aditya Gopalan
We consider the stochastic $K$-armed dueling bandit problem in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.
no code implementations • 19 Feb 2020 • Aadirupa Saha, Aditya Gopalan
We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.
no code implementations • 19 Nov 2019 • Dhruti Shah, Tuhinangshu Choudhury, Nikhil Karamchandani, Aditya Gopalan
We consider the problem of adaptively PAC-learning a probability distribution $\mathcal{P}$'s mode by querying an oracle for information about a sequence of i.i.d. samples drawn from $\mathcal{P}$.
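As a baseline illustration of the statistical target (not the adaptive, PAC-guaranteed query strategy the paper develops), one can estimate the mode from a fixed budget of i.i.d. samples; the distribution below is hypothetical.

```python
import numpy as np
from collections import Counter

# Fixed-budget mode estimation: draw i.i.d. samples from an unknown discrete
# distribution P and report the most frequent value. The paper's contribution
# is the adaptive query strategy, which this sketch does not implement.
rng = np.random.default_rng(3)
P = np.array([0.4, 0.3, 0.2, 0.1])           # hypothetical distribution
samples = rng.choice(len(P), size=500, p=P)  # i.i.d. draws
mode_hat = Counter(samples).most_common(1)[0][0]
print("estimated mode:", mode_hat)           # 0 with high probability
```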
no code implementations • 5 Nov 2019 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan
We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting.
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.
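The sequential core of the first of these acquisition rules is easy to sketch: fit a GP posterior to past evaluations and query the maximizer of $\mu + \sqrt{\beta}\,\sigma$. The toy objective, grid, and $\beta$ below are assumptions for illustration, and the batching the paper studies is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Minimal sequential GP-UCB loop (illustrative; the paper's setting is batched).
rng = np.random.default_rng(4)
f = lambda x: -(x - 0.3) ** 2                # hypothetical objective
grid = np.linspace(0, 1, 200).reshape(-1, 1)
X, y = [[0.5]], [f(0.5)]
beta = 4.0

for t in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(mu + np.sqrt(beta) * sigma)]  # optimistic pick
    X.append(list(x_next))
    y.append(f(x_next[0]) + 0.01 * rng.normal())          # noisy evaluation

print("best query:", X[int(np.argmax(y))])
```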
no code implementations • 4 Nov 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.
no code implementations • 19 Oct 2019 • Siddharth Mitra, Aditya Gopalan
We then consider revealing-action partial monitoring games -- a version of label-efficient prediction with additive information costs -- which in general are known to lie in the \textit{hard} class of games, having minimax regret of order $T^{\frac{2}{3}}$.
1 code implementation • NeurIPS 2019 • Sayak Ray Chowdhury, Aditya Gopalan
We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.
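One standard kernel approximation technique in this area is random Fourier features (Rahimi and Recht), sketched below for the SE kernel $k(x,y) = \exp(-\lVert x-y\rVert^2/2)$; the paper's specific construction may differ, so treat this as background.

```python
import numpy as np

# Random Fourier features: for the SE kernel k(x, y) = exp(-||x - y||^2 / 2),
# the feature map below satisfies E[phi(x) . phi(y)] = k(x, y).
rng = np.random.default_rng(5)
d, D = 3, 2000                          # input dim, number of random features
W = rng.normal(size=(D, d))             # spectral samples for the SE kernel
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
print("exact  :", np.exp(-np.sum((x - y) ** 2) / 2))
print("approx :", phi(x) @ phi(y))
```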
no code implementations • NeurIPS 2019 • Aadirupa Saha, Aditya Gopalan
We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.
no code implementations • ICML 2020 • Aadirupa Saha, Aditya Gopalan
In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, where $\Delta_i$ is the Plackett-Luce parameter gap between the best and the $i^{th}$ best item, and $\theta_{[k]}$ is the sum of the Plackett-Luce parameters of the top-$k$ items.
no code implementations • 23 Oct 2018 • Aadirupa Saha, Aditya Gopalan
When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case.
no code implementations • 12 Aug 2018 • Aadirupa Saha, Aditya Gopalan
We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model -- an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms, and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
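The top-$m$ feedback model is easy to simulate: adding i.i.d. Gumbel noise to the log-utilities and sorting reproduces Plackett-Luce choice probabilities (the Gumbel-max trick). The utilities below are hypothetical.

```python
import numpy as np

# Sampling a top-m ranking of a chosen subset under the Plackett-Luce model
# via the Gumbel-max trick. The latent utilities theta are illustrative.
rng = np.random.default_rng(6)
theta = np.array([1.0, 0.8, 0.5, 0.3, 0.1])   # latent PL parameters
subset = np.array([0, 2, 3, 4])               # the k arms played this round
m = 2

gumbel = rng.gumbel(size=len(subset))
scores = np.log(theta[subset]) + gumbel
top_m = subset[np.argsort(-scores)[:m]]       # top-m ranking feedback
print("observed ranking:", top_m)
```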
no code implementations • 21 May 2018 • Sayak Ray Chowdhury, Aditya Gopalan
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.
no code implementations • 13 Jun 2017 • Siddharth Barman, Aditya Gopalan, Aadirupa Saha
We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.
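As background, the classical Hedge (multiplicative-weights) strategy for prediction with expert advice, which this work studies under structured loss sets, looks as follows; the losses here are synthetic.

```python
import numpy as np

# Hedge / multiplicative weights over n experts with losses in [0, 1]^n.
rng = np.random.default_rng(7)
n, T, eta = 5, 1000, 0.05
w = np.ones(n)
total_loss, best = 0.0, np.zeros(n)

for t in range(T):
    p = w / w.sum()                       # play the normalized weights
    loss = rng.uniform(0, 1, size=n)      # adversary's loss vector
    total_loss += p @ loss
    best += loss
    w *= np.exp(-eta * loss)              # multiplicative-weights update

print("regret:", total_loss - best.min())
```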
no code implementations • 23 Apr 2017 • Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.
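A compact OFUL/LinUCB-style sketch in the well-specified (perfectly linear) case appears below: a ridge estimate of $\theta$ plus an ellipsoidal optimism bonus. The bonus scale $\beta$ is a simplified stand-in for the theoretical confidence radius, and all problem parameters are illustrative.

```python
import numpy as np

# Optimistic linear bandit: ridge regression plus an ellipsoidal bonus.
rng = np.random.default_rng(8)
d, K, T, lam, beta = 4, 20, 2000, 1.0, 1.0
theta = rng.normal(size=d)
arms = rng.normal(size=(K, d))

V = lam * np.eye(d)                       # regularized design matrix
b = np.zeros(d)
for t in range(T):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # UCB index per arm: <x, theta_hat> + beta * sqrt(x' V^{-1} x)
    ucb = arms @ theta_hat + beta * np.sqrt(np.sum(arms @ V_inv * arms, axis=1))
    x = arms[np.argmax(ucb)]              # optimistic arm choice
    r = x @ theta + 0.1 * rng.normal()    # linear reward (well-specified case)
    V += np.outer(x, x)
    b += r * x

print("estimation error:", np.linalg.norm(theta_hat - theta))
```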
no code implementations • ICML 2017 • Sayak Ray Chowdhury, Aditya Gopalan
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.
no code implementations • 30 Nov 2016 • Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus
For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.
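For reference, classical index-based play in the $K$-armed setting (vanilla UCB1) is sketched below; the paper's own algorithm is not reproduced here, so treat this as context for the regret framework only.

```python
import numpy as np

# Vanilla UCB1 on Bernoulli arms with hypothetical means.
rng = np.random.default_rng(9)
mu = np.array([0.2, 0.5, 0.7])            # hypothetical arm means
K, T = len(mu), 5000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(T):
    if t < K:
        a = t                              # play each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    sums[a] += float(rng.random() < mu[a])  # Bernoulli reward
    counts[a] += 1

print("pulls per arm:", counts)
```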
no code implementations • 6 Sep 2016 • Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki
This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$.
no code implementations • 30 Mar 2016 • Rahul Meshram, Aditya Gopalan, D. Manjunath
We then analyze the performance of the learning algorithm and characterize the regret.
no code implementations • 29 Feb 2016 • Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan
A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret.
no code implementations • 29 Jun 2014 • Aditya Gopalan, Shie Mannor
We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.
no code implementations • 3 Nov 2013 • Aditya Gopalan
In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala (2005).
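The setting referenced here is the one in which Kalai and Vempala's follow-the-perturbed-leader (FPL) algorithm operates; a minimal FPL sketch with the classical exponential perturbation is given below as background, not as the note's exact Thompson-sampling variant.

```python
import numpy as np

# Follow-the-perturbed-leader over n experts: play the expert minimizing
# cumulative loss minus a random perturbation, then observe the full loss
# vector. The exponential perturbation is the classical FPL choice.
rng = np.random.default_rng(10)
n, T, eta = 5, 1000, 0.1
cum_loss = np.zeros(n)
total = 0.0

for t in range(T):
    noise = rng.exponential(1.0 / eta, size=n)
    a = int(np.argmin(cum_loss - noise))  # perturbed leader
    loss = rng.uniform(0, 1, size=n)      # full-information loss vector
    total += loss[a]
    cum_loss += loss

print("regret:", total - cum_loss.min())
```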
no code implementations • 3 Nov 2013 • Aditya Gopalan, Shie Mannor, Yishay Mansour
We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.