Search Results for author: Aditya Gopalan

Found 39 papers, 2 papers with code

Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs

no code implementations 13 Oct 2023 Debangshu Banerjee, Aditya Gopalan

Parametric, feature-based reward models are employed by a variety of algorithms in decision-making settings such as bandits and Markov decision processes (MDPs).

Decision Making Multi-Armed Bandits +1

A Unified Framework for Discovering Discrete Symmetries

no code implementations 6 Sep 2023 Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P.

At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner.

On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

no code implementations 9 Jan 2023 Debangshu Banerjee, Aditya Gopalan

As noted in \cite{lattimore2020bandit}, it is an open problem to characterize the minimax regret of linear bandits in a wide variety of action spaces.

Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

no code implementations 23 Jul 2022 Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time.

Clustering Model Selection

Actor-Critic based Improper Reinforcement Learning

no code implementations 19 Jul 2022 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.

reinforcement-learning Reinforcement Learning (RL)

Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint

no code implementations 31 Mar 2022 Dipayan Sen, L. A. Prashanth, Aditya Gopalan

We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.
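As a rough illustration of the sequential-estimation setup (not the paper's algorithm), the toy baseline below observes, in each round, the $m$ coordinates whose current estimated contribution to the MSE looks largest and updates running means and variances. The oracle `sample_entries` is hypothetical: it returns the requested entries of a fresh sample of the unknown Gaussian vector.

```python
import numpy as np

def adaptive_mean_estimation(sample_entries, K, m, n_rounds, rng=None):
    """Toy heuristic: each round, observe the m coordinates whose estimated
    per-coordinate MSE (sample variance / #observations) is largest.
    Illustrative only; not the algorithm analysed in the paper."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(K)
    means = np.zeros(K)
    m2 = np.zeros(K)  # running sum of squared deviations (Welford's method)
    for _ in range(n_rounds):
        var_hat = np.where(counts > 1, m2 / np.maximum(counts - 1, 1), np.inf)
        score = var_hat / np.maximum(counts, 1)   # estimated MSE contribution
        chosen = np.argsort(-score)[:m]           # observe the m "worst" coordinates
        obs = sample_entries(chosen)              # hypothetical bandit-feedback oracle
        for i, x in zip(chosen, obs):
            counts[i] += 1
            delta = x - means[i]
            means[i] += delta / counts[i]
            m2[i] += delta * (x - means[i])
    return means
```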

Bregman Deviations of Generic Exponential Families

no code implementations 18 Jan 2022 Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan

For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.

On Slowly-varying Non-stationary Bandits

no code implementations 25 Oct 2021 Ramakrishnan Krishnamurthy, Aditya Gopalan

We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is essentially minimax optimal.

Bandit Quickest Changepoint Detection

no code implementations NeurIPS 2021 Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan

Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns.
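For background on the non-bandit version of this problem, the sketch below runs a classical CUSUM statistic for detecting a shift in the mean of a single Gaussian stream; the paper studies the harder bandit setting in which the learner must also choose which sensor to probe each round. The parameters (pre- and post-change means, threshold) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def cusum_changepoint(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=8.0):
    """Classical single-sensor CUSUM for a mean shift from mu0 to mu1.
    Returns the first time the statistic crosses the threshold, or None."""
    s = 0.0
    for t, x in enumerate(stream):
        llr = norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)
        s = max(0.0, s + llr)  # reset at zero, accumulate evidence of a change
        if s > threshold:
            return t
    return None
```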

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations 16 Feb 2021 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

reinforcement-learning Reinforcement Learning (RL)
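A minimal sketch of the improper-learning idea, assuming a softmax mixture over the $M$ base controllers whose weights are tuned by a REINFORCE-style gradient on episode return. The environment and controller interfaces (`env.reset`, `env.step`, controllers as state-to-action maps) are hypothetical, and this is not the paper's exact algorithm.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def improper_mixture_rl(env, base_controllers, n_episodes=500, lr=0.05, rng=None):
    """Learn softmax mixture weights over fixed base controllers via a
    REINFORCE-style update on the return of each episode (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    M = len(base_controllers)
    logits = np.zeros(M)
    for _ in range(n_episodes):
        probs = softmax(logits)
        k = rng.choice(M, p=probs)          # pick one base controller for the episode
        state, done, ep_return = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(base_controllers[k](state))
            ep_return += reward
        grad = -probs
        grad[k] += 1.0                      # gradient of log pi(k) w.r.t. the logits
        logits += lr * ep_return * grad     # REINFORCE update on the mixture weights
    return softmax(logits)
```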

Stochastic Linear Bandits with Protected Subspace

no code implementations 2 Nov 2020 Advait Parulekar, Soumya Basu, Aditya Gopalan, Karthikeyan Shanmugam, Sanjay Shakkottai

We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}), given only zero-order stochastic oracle access to both the objective itself and the protected subspace.

No-regret Algorithms for Multi-task Bayesian Optimization

no code implementations 20 Aug 2020 Sayak Ray Chowdhury, Aditya Gopalan

We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives.

Bayesian Optimization

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

no code implementations 13 Jun 2020 Mohammadi Zaki, Avi Mohan, Aditya Gopalan

We study the problem of best arm identification in linearly parameterised multi-armed bandits.

Multi-Armed Bandits

How Reliable are Test Numbers for Revealing the COVID-19 Ground Truth and Applying Interventions?

1 code implementation 24 Apr 2020 Aditya Gopalan, Himanshu Tyagi

We use the simulation framework to compare the performance of three testing policies: Random Symptomatic Testing (RST), Contact Tracing (CT), and a new Location Based Testing policy (LBT).

Regret Minimization in Stochastic Contextual Dueling Bandits

no code implementations 20 Feb 2020 Aadirupa Saha, Aditya Gopalan

We consider the problem of stochastic $K$-armed dueling bandits in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set.

Decision Making Information Retrieval +2
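To make the dueling-feedback setup concrete, here is a toy preference oracle under a common linear-utility assumption: item $i$ beats item $j$ with probability given by a sigmoid of the score difference $\theta^\top(x_i - x_j)$. This is an illustrative model, not necessarily the exact feedback model analysed in the paper.

```python
import numpy as np

def duel_winner(x_i, x_j, theta, rng=None):
    """Sample the winner of a contextual duel under a logistic preference model.
    Returns 0 if item i wins, 1 if item j wins (illustrative assumption)."""
    rng = rng or np.random.default_rng()
    p_i_wins = 1.0 / (1.0 + np.exp(-theta @ (x_i - x_j)))
    return 0 if rng.random() < p_i_wins else 1
```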

Best-item Learning in Random Utility Models with Subset Choices

no code implementations 19 Feb 2020 Aadirupa Saha, Aditya Gopalan

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.

PAC learning

Sequential Mode Estimation with Oracle Queries

no code implementations 19 Nov 2019 Dhruti Shah, Tuhinangshu Choudhury, Nikhil Karamchandani, Aditya Gopalan

We consider the problem of adaptively PAC-learning a probability distribution $\mathcal{P}$'s mode by querying an oracle for information about a sequence of i.i.d. samples.

PAC learning

Towards Optimal and Efficient Best Arm Identification in Linear Bandits

no code implementations 5 Nov 2019 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan

We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting.

On Online Learning in Kernelized Markov Decision Processes

no code implementations 4 Nov 2019 Sayak Ray Chowdhury, Aditya Gopalan

We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques.

Thompson Sampling

On Batch Bayesian Optimization

no code implementations 4 Nov 2019 Sayak Ray Chowdhury, Aditya Gopalan

We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.

Bayesian Optimization Thompson Sampling
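A minimal sketch of batch selection with a Gaussian-process upper confidence bound, assuming the common "hallucinated observation" heuristic: each point added to the batch is treated as if it returned the posterior mean before the next point is chosen. The kernel, noise level, and $\beta$ are placeholder choices; the paper's algorithms and guarantees may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_batch(X_obs, y_obs, candidates, batch_size=5, beta=2.0):
    """Select a batch of query points by GP-UCB with hallucinated observations."""
    X, y = list(X_obs), list(y_obs)
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
        gp.fit(np.array(X), np.array(y))
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
        batch.append(x_next)
        X.append(x_next)
        y.append(gp.predict(x_next.reshape(1, -1))[0])  # hallucinate the outcome
    return np.array(batch)
```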

On Adaptivity in Information-constrained Online Learning

no code implementations 19 Oct 2019 Siddharth Mitra, Aditya Gopalan

We then consider revealing-action partial monitoring games -- a version of label-efficient prediction with additive information costs, which in general are known to lie in the \textit{hard} class of games having minimax regret of order $T^{\frac{2}{3}}$.

Bayesian Optimization under Heavy-tailed Payoffs

1 code implementation NeurIPS 2019 Sayak Ray Chowdhury, Aditya Gopalan

We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel.

Bayesian Optimization

Combinatorial Bandits with Relative Feedback

no code implementations NeurIPS 2019 Aadirupa Saha, Aditya Gopalan

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

no code implementations ICML 2020 Aadirupa Saha, Aditya Gopalan

In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k}{\delta}\Big(\ln \frac{1}{\Delta_i}\Big)\bigg)$, $\Delta_i$ being the Plackett-Luce parameter gap between the best and the $i$-th best item, and $\theta_{[k]}$ being the sum of the Plackett-Luce parameters of the top-$k$ items.

PAC learning

Active Ranking with Subset-wise Preferences

no code implementations 23 Oct 2018 Aadirupa Saha, Aditya Gopalan

When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case.

PAC Battling Bandits in the Plackett-Luce Model

no code implementations 12 Aug 2018 Aadirupa Saha, Aditya Gopalan

We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model--an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms, and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items.
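To illustrate the PL feedback model itself, the snippet below samples winner (top-1) or top-$m$ feedback from a chosen subset by perturbing each item's log-parameter with independent Gumbel noise, a standard equivalent view of Plackett-Luce sampling. It models only the feedback, not the paper's learning algorithm.

```python
import numpy as np

def pl_subset_feedback(theta, subset, m=1, rng=None):
    """Sample top-m feedback from `subset` under the Plackett-Luce model with
    parameters `theta`, via the Gumbel-max trick (m=1 gives winner feedback)."""
    rng = rng or np.random.default_rng()
    subset = np.asarray(subset)
    noisy = np.log(theta[subset]) + rng.gumbel(size=subset.size)
    order = subset[np.argsort(-noisy)]  # items ranked by perturbed utility
    return order[:m]
```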

Online Learning in Kernelized Markov Decision Processes

no code implementations 21 May 2018 Sayak Ray Chowdhury, Aditya Gopalan

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.

Online Learning for Structured Loss Spaces

no code implementations 13 Jun 2017 Siddharth Barman, Aditya Gopalan, Aadirupa Saha

We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls.

Misspecified Linear Bandits

no code implementations 23 Apr 2017 Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features.

Learning-To-Rank
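For reference, here is a minimal sketch of one round of an OFUL-style optimistic rule under the standard linear-reward assumption: a ridge estimate of the parameter plus an ellipsoidal exploration bonus. The constant `beta` is a placeholder; the paper studies how such guarantees behave when the linearity assumption fails.

```python
import numpy as np

def oful_step(V, b, arms, beta=1.0):
    """One optimistic arm selection: ridge estimate theta_hat = V^{-1} b and
    UCB = x^T theta_hat + beta * sqrt(x^T V^{-1} x) for each arm feature x."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    ucb = np.array([x @ theta_hat + beta * np.sqrt(x @ V_inv @ x) for x in arms])
    return int(np.argmax(ucb))

# After playing arm x and observing reward r, update the statistics:
#   V += np.outer(x, x);  b += r * x   (V initialised to lam * np.eye(d), b to zeros)
```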

On Kernelized Multi-armed Bandits

no code implementations ICML 2017 Sayak Ray Chowdhury, Aditya Gopalan

We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.

Multi-Armed Bandits

Bandit algorithms to emulate human decision making using probabilistic distortions

no code implementations 30 Nov 2016 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.

Decision Making Multi-Armed Bandits

Low-rank Bandits with Latent Mixtures

no code implementations 6 Sep 2016 Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$.

Recommendation Systems

Collaborative Learning of Stochastic Bandits over a Social Network

no code implementations 29 Feb 2016 Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan

A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret.

Thompson Sampling for Learning Parameterized Markov Decision Processes

no code implementations 29 Jun 2014 Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

reinforcement-learning Reinforcement Learning (RL) +1

Thompson Sampling for Complex Bandit Problems

no code implementations 3 Nov 2013 Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Thompson Sampling
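A simplified sketch of the Thompson-sampling idea for complex actions, assuming Gaussian posteriors over the basic arms' mean rewards and complex actions that are subsets of arms whose rewards add up. This is an illustrative special case, not the paper's general construction.

```python
import numpy as np

def ts_complex_round(mu_hat, counts, complex_actions, sigma=1.0, rng=None):
    """One Thompson-sampling round over complex actions: sample a plausible mean
    for each basic arm from its Gaussian posterior, then play the complex action
    (a subset of arm indices) whose sampled total reward is largest."""
    rng = rng or np.random.default_rng()
    sampled = rng.normal(mu_hat, sigma / np.sqrt(np.maximum(counts, 1)))
    scores = [sampled[list(a)].sum() for a in complex_actions]
    return complex_actions[int(np.argmax(scores))]
```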

Thompson Sampling for Online Learning with Linear Experts

no code implementations 3 Nov 2013 Aditya Gopalan

In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005.

Thompson Sampling
