Search Results for author: Aditya Gopalan

Found 39 papers, 2 papers with code

Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs

no code implementations 13 Oct 2023 Debangshu Banerjee, Aditya Gopalan

Parametric, feature-based reward models are employed by a variety of algorithms in decision-making settings such as bandits and Markov decision processes (MDPs).

Decision Making Multi-Armed Bandits +1

A Unified Framework for Discovering Discrete Symmetries

no code implementations 6 Sep 2023 Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P.

At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner.

On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

no code implementations 9 Jan 2023 Debangshu Banerjee, Aditya Gopalan

As noted in \cite{lattimore2020bandit}, it is an open problem to characterize the minimax regret of linear bandits in a wide variety of action spaces.

Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

no code implementations 23 Jul 2022 Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.

Clustering Model Selection

Actor-Critic based Improper Reinforcement Learning

no code implementations 19 Jul 2022 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.

reinforcement-learning Reinforcement Learning (RL)

Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint

no code implementations 31 Mar 2022 Dipayan Sen, L. A. Prashanth, Aditya Gopalan

We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.
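As a rough illustration of the sequential-estimation setup (not the paper's algorithm), the toy baseline below observes, in each round, the $m$ coordinates whose current estimated contribution to the MSE looks largest and updates running means and variances. The oracle `sample_entries` is hypothetical: it returns the requested entries of a fresh sample of the unknown Gaussian vector.

```python
import numpy as np

def adaptive_mean_estimation(sample_entries, K, m, n_rounds, rng=None):
    """Toy heuristic: each round, observe the m coordinates whose estimated
    per-coordinate MSE (sample variance / #observations) is largest.
    Illustrative only; not the algorithm analysed in the paper."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(K)
    means = np.zeros(K)
    m2 = np.zeros(K)  # running sum of squared deviations (Welford's method)
    for _ in range(n_rounds):
        var_hat = np.where(counts > 1, m2 / np.maximum(counts - 1, 1), np.inf)
        score = var_hat / np.maximum(counts, 1)   # estimated MSE contribution
        chosen = np.argsort(-score)[:m]           # observe the m "worst" coordinates
        obs = sample_entries(chosen)              # hypothetical bandit-feedback oracle
        for i, x in zip(chosen, obs):
            counts[i] += 1
            delta = x - means[i]
            means[i] += delta / counts[i]
            m2[i] += delta * (x - means[i])
    return means
```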

Bregman Deviations of Generic Exponential Families

no code implementations 18 Jan 2022 Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan

For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.

On Slowly-varying Non-stationary Bandits

no code implementations 25 Oct 2021 Ramakrishnan Krishnamurthy, Aditya Gopalan

We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is essentially minimax optimal.

Bandit Quickest Changepoint Detection

no code implementations NeurIPS 2021 Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan

Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns.
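For background on the non-bandit version of this problem, the sketch below runs a classical CUSUM statistic for detecting a shift in the mean of a single Gaussian stream; the paper studies the harder bandit setting in which the learner must also choose which sensor to probe each round. The parameters (pre- and post-change means, threshold) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def cusum_changepoint(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=8.0):
    """Classical single-sensor CUSUM for a mean shift from mu0 to mu1.
    Returns the first time the statistic crosses the threshold, or None."""
    s = 0.0
    for t, x in enumerate(stream):
        llr = norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)
        s = max(0.0, s + llr)  # reset at zero, accumulate evidence of a change
        if s > threshold:
            return t
    return None
```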

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations 16 Feb 2021 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

reinforcement-learning Reinforcement Learning (RL)
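A minimal sketch of the improper-learning idea, assuming a softmax mixture over the $M$ base controllers whose weights are tuned by a REINFORCE-style gradient on episode return. The environment and controller interfaces (`env.reset`, `env.step`, controllers as state-to-action maps) are hypothetical, and this is not the paper's exact algorithm.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def improper_mixture_rl(env, base_controllers, n_episodes=500, lr=0.05, rng=None):
    """Learn softmax mixture weights over fixed base controllers via a
    REINFORCE-style update on the return of each episode (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    M = len(base_controllers)
    logits = np.zeros(M)
    for _ in range(n_episodes):
        probs = softmax(logits)
        k = rng.choice(M, p=probs)          # pick one base controller for the episode
        state, done, ep_return = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(base_controllers[k](state))
            ep_return += reward
        grad = -probs
        grad[k] += 1.0                      # gradient of log pi(k) w.r.t. the logits
        logits += lr * ep_return * grad     # REINFORCE update on the mixture weights
    return softmax(logits)
```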

Stochastic Linear Bandits with Protected Subspace

no code implementations 2 Nov 2020 Advait Parulekar, Soumya Basu, Aditya Gopalan, Karthikeyan Shanmugam, Sanjay Shakkottai

We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}), given only zero-order stochastic oracle access to both the objective itself and the protected subspace.

No-regret Algorithms for Multi-task Bayesian Optimization

no code implementations 20 Aug 2020 Sayak Ray Chowdhury, Aditya Gopalan

We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.

Bayesian Optimization

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

no code implementations 13 Jun 2020 Mohammadi Zaki, Avi Mohan, Aditya Gopalan

We study the problem of best arm identification in linearly parameterised multi-armed bandits.

Multi-Armed Bandits

How Reliable are Test Numbers for Revealing the COVID-19 Ground Truth and Applying Interventions?

1 code implementation 24 Apr 2020 Aditya Gopalan, Himanshu Tyagi

We use the simulation framework to compare the performance of three testing policies: Random Symptomatic Testing (RST), Contact Tracing (CT), and a new Location Based Testing policy (LBT).

Regret Minimization in Stochastic Contextual Dueling Bandits

no code implementations 20 Feb 2020 Aadirupa Saha, Aditya Gopalan

We consider the problem of stochastic $K$-armed dueling bandits in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.

Decision Making Information Retrieval +2
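To make the dueling-feedback setup concrete, here is a toy preference oracle under a common linear-utility assumption: item $i$ beats item $j$ with probability given by a sigmoid of the score difference $\theta^\top(x_i - x_j)$. This is an illustrative model, not necessarily the exact feedback model analysed in the paper.

```python
import numpy as np

def duel_winner(x_i, x_j, theta, rng=None):
    """Sample the winner of a contextual duel under a logistic preference model.
    Returns 0 if item i wins, 1 if item j wins (illustrative assumption)."""
    rng = rng or np.random.default_rng()
    p_i_wins = 1.0 / (1.0 + np.exp(-theta @ (x_i - x_j)))
    return 0 if rng.random() < p_i_wins else 1
```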

Best-item Learning in Random Utility Models with Subset Choices

no code implementations 19 Feb 2020 Aadirupa Saha, Aditya Gopalan

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.

PAC learning

Sequential Mode Estimation with Oracle Queries

no code implementations 19 Nov 2019 Dhruti Shah, Tuhinangshu Choudhury, Nikhil Karamchandani, Aditya Gopalan

We consider the problem of adaptively PAC-learning a probability distribution $\mathcal{P}$'s mode by querying an oracle for information about a sequence of i.i.d. samples.

PAC learning

Towards Optimal and Efficient Best Arm Identification in Linear Bandits

no code implementations 5 Nov 2019 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan

We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting.

On Online Learning in Kernelized Markov Decision Processes

no code implementations 4 Nov 2019 Sayak Ray Chowdhury, Aditya Gopalan

We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.

Thompson Sampling

On Batch Bayesian Optimization

no code implementations 4 Nov 2019 Sayak Ray Chowdhury, Aditya Gopalan

We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.

Bayesian Optimization Thompson Sampling
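A minimal sketch of batch selection with a Gaussian-process upper confidence bound, assuming the common "hallucinated observation" heuristic: each point added to the batch is treated as if it returned the posterior mean before the next point is chosen. The kernel, noise level, and $\beta$ are placeholder choices; the paper's algorithms and guarantees may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_batch(X_obs, y_obs, candidates, batch_size=5, beta=2.0):
    """Select a batch of query points by GP-UCB with hallucinated observations."""
    X, y = list(X_obs), list(y_obs)
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
        gp.fit(np.array(X), np.array(y))
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
        batch.append(x_next)
        X.append(x_next)
        y.append(gp.predict(x_next.reshape(1, -1))[0])  # hallucinate the outcome
    return np.array(batch)
```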

On Adaptivity in Information-constrained Online Learning

no code implementations 19 Oct 2019 Siddharth Mitra, Aditya Gopalan

We then consider revealing-action partial monitoring games -- a version of label-efficient prediction with additive information costs, which in general are known to lie in the \textit{hard} class of games having minimax regret of order $T^{\frac{2}{3}}$.

Bayesian Optimization under Heavy-tailed Payoffs

1 code implementation NeurIPS 2019 Sayak Ray Chowdhury, Aditya Gopalan

We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.

Bayesian Optimization

Combinatorial Bandits with Relative Feedback

no code implementations NeurIPS 2019 Aadirupa Saha, Aditya Gopalan

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

no code implementations ICML 2020 Aadirupa Saha, Aditya Gopalan

In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, $\Delta_i$ being the Plackett-Luce parameter gap between the best and the $i$-th best item, and $\theta_{[k]}$ being the sum of the Plackett-Luce parameters of the top-$k$ items.

PAC learning

Active Ranking with Subset-wise Preferences

no code implementations 23 Oct 2018 Aadirupa Saha, Aditya Gopalan

When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case.

PAC Battling Bandits in the Plackett-Luce Model

no code implementations 12 Aug 2018 Aadirupa Saha, Aditya Gopalan

We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model--an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms, and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
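To illustrate the PL feedback model itself, the snippet below samples winner (top-1) or top-$m$ feedback from a chosen subset by perturbing each item's log-parameter with independent Gumbel noise, a standard equivalent view of Plackett-Luce sampling. It models only the feedback, not the paper's learning algorithm.

```python
import numpy as np

def pl_subset_feedback(theta, subset, m=1, rng=None):
    """Sample top-m feedback from `subset` under the Plackett-Luce model with
    parameters `theta`, via the Gumbel-max trick (m=1 gives winner feedback)."""
    rng = rng or np.random.default_rng()
    subset = np.asarray(subset)
    noisy = np.log(theta[subset]) + rng.gumbel(size=subset.size)
    order = subset[np.argsort(-noisy)]  # items ranked by perturbed utility
    return order[:m]
```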

Online Learning in Kernelized Markov Decision Processes

no code implementations 21 May 2018 Sayak Ray Chowdhury, Aditya Gopalan

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.

Online Learning for Structured Loss Spaces

no code implementations 13 Jun 2017 Siddharth Barman, Aditya Gopalan, Aadirupa Saha

We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.

Misspecified Linear Bandits

no code implementations 23 Apr 2017 Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.

Learning-To-Rank
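For reference, here is a minimal sketch of one round of an OFUL-style optimistic rule under the standard linear-reward assumption: a ridge estimate of the parameter plus an ellipsoidal exploration bonus. The constant `beta` is a placeholder; the paper studies how such guarantees behave when the linearity assumption fails.

```python
import numpy as np

def oful_step(V, b, arms, beta=1.0):
    """One optimistic arm selection: ridge estimate theta_hat = V^{-1} b and
    UCB = x^T theta_hat + beta * sqrt(x^T V^{-1} x) for each arm feature x."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    ucb = np.array([x @ theta_hat + beta * np.sqrt(x @ V_inv @ x) for x in arms])
    return int(np.argmax(ucb))

# After playing arm x and observing reward r, update the statistics:
#   V += np.outer(x, x);  b += r * x   (V initialised to lam * np.eye(d), b to zeros)
```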

On Kernelized Multi-armed Bandits

no code implementations ICML 2017 Sayak Ray Chowdhury, Aditya Gopalan

We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.

Multi-Armed Bandits

Bandit algorithms to emulate human decision making using probabilistic distortions

no code implementations 30 Nov 2016 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.

Decision Making Multi-Armed Bandits

Low-rank Bandits with Latent Mixtures

no code implementations 6 Sep 2016 Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$.

Recommendation Systems

Collaborative Learning of Stochastic Bandits over a Social Network

no code implementations 29 Feb 2016 Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan

A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret.

Thompson Sampling for Learning Parameterized Markov Decision Processes

no code implementations 29 Jun 2014 Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

reinforcement-learning Reinforcement Learning (RL) +1

Thompson Sampling for Complex Bandit Problems

no code implementations 3 Nov 2013 Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Thompson Sampling
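A simplified sketch of the Thompson-sampling idea for complex actions, assuming Gaussian posteriors over the basic arms' mean rewards and complex actions that are subsets of arms whose rewards add up. This is an illustrative special case, not the paper's general construction.

```python
import numpy as np

def ts_complex_round(mu_hat, counts, complex_actions, sigma=1.0, rng=None):
    """One Thompson-sampling round over complex actions: sample a plausible mean
    for each basic arm from its Gaussian posterior, then play the complex action
    (a subset of arm indices) whose sampled total reward is largest."""
    rng = rng or np.random.default_rng()
    sampled = rng.normal(mu_hat, sigma / np.sqrt(np.maximum(counts, 1)))
    scores = [sampled[list(a)].sum() for a in complex_actions]
    return complex_actions[int(np.argmax(scores))]
```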

Thompson Sampling for Online Learning with Linear Experts

no code implementations 3 Nov 2013 Aditya Gopalan

In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005.

Thompson Sampling
