no code implementations • 15 Jun 2023 • Abishek Sankararaman, Balakrishnan Narayanaswamy
We derive guarantees on worst-case, finite-sample false-positive rate (FPR) over the family of all distributions with bounded second moment.
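A distribution-free guarantee of this kind can be illustrated with Chebyshev's inequality: for any distribution with bounded second moment, a threshold can be chosen so the false-positive rate is bounded no matter the distribution's shape. This is only a coarse stand-in for the paper's finite-sample guarantee; the function name and interface below are illustrative, not from the paper.

```python
import math

def chebyshev_threshold(mean, second_moment, target_fpr):
    """Return a detection threshold t such that, for ANY distribution with
    the given mean and second moment E[X^2], Chebyshev's inequality gives
    P(X >= t) <= P(|X - mean| >= t - mean) <= var / (t - mean)^2 = target_fpr."""
    variance = second_moment - mean ** 2
    return mean + math.sqrt(variance / target_fpr)
```

For example, with mean 0 and unit second moment, a target FPR of 4% yields a threshold of 5 standard deviations, valid for every distribution in the family.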
no code implementations • 2 Dec 2022 • Kaustubh Sridhar, Vikramank Singh, Balakrishnan Narayanaswamy, Abishek Sankararaman
PnC jointly trains a prediction model and a terminal Q function that approximates cost-to-go over a long horizon, by back-propagating the cost of decisions through the optimization problem \emph{and from the future}.
no code implementations • 31 May 2022 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms.
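The single-agent version of successive elimination, which \texttt{DNCB}'s per-agent routine builds on, can be sketched as follows. This is a generic stationary sketch under assumed Gaussian rewards, not the paper's decentralized, non-stationary algorithm.

```python
import math
import random

def successive_elimination(arm_means, horizon, seed=0):
    """Round-robin sample every surviving arm, then eliminate any arm whose
    upper confidence bound falls below the best lower confidence bound."""
    rng = random.Random(seed)
    active = list(range(len(arm_means)))
    counts = [0] * len(arm_means)
    sums = [0.0] * len(arm_means)
    pulls = 0
    while pulls < horizon and len(active) > 1:
        for a in active:
            sums[a] += rng.gauss(arm_means[a], 1.0)  # noisy reward sample
            counts[a] += 1
            pulls += 1
        def radius(a):
            # confidence radius shrinking as ~ sqrt(log(horizon) / n)
            return math.sqrt(2.0 * math.log(max(horizon, 2)) / counts[a])
        best_lcb = max(sums[a] / counts[a] - radius(a) for a in active)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= best_lcb]
    return active
```

In the competing-bandits setting, each agent runs such a routine restricted to the arms it can actually win, which is what makes the elimination "restrictive."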
no code implementations • 19 May 2022 • Avishek Ghosh, Abishek Sankararaman
The (poly) logarithmic regret of \texttt{LR-SCB} stems from two crucial facts: (a) the application of a norm adaptive algorithm to exploit the parameter estimation and (b) an analysis of the shifted linear contextual bandit algorithm, showing that shifting results in increasing regret.
no code implementations • 7 Jul 2021 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption.
no code implementations • 15 Jun 2021 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a `well separated' cluster, or as $\mathcal{O}(T^{\frac{1}{2} + \varepsilon}/N^{\frac{1}{2} - \varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to $0$.
no code implementations • 12 Mar 2021 • Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman
We design decentralized algorithms for regret minimization in the two-sided matching market with one-sided bandit feedback that significantly improve upon prior works (Liu et al. 2020a, 2020b; Sankararaman et al. 2020).
no code implementations • 2 Jul 2020 • Ronshee Chawla, Abishek Sankararaman, Sanjay Shakkottai
We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $\theta^* \in \mathbb{R}^d$.
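A single agent's interaction with such a linear bandit is commonly handled with an optimism-based (LinUCB-style) step: maintain a ridge estimate of the unknown $\theta^*$ and add an exploration bonus per arm. The sketch below is this generic single-agent step under noiseless rewards, not the paper's multi-agent algorithm with side information.

```python
import numpy as np

def linucb_choose(arms, A, b, alpha=1.0):
    """Pick the arm maximizing (ridge estimate of reward) + optimism bonus."""
    theta_hat = np.linalg.solve(A, b)  # ridge estimate of theta*
    scores = [float(x @ theta_hat)
              + alpha * float(np.sqrt(x @ np.linalg.solve(A, x)))
              for x in arms]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the Gram matrix and the reward-weighted feature sum."""
    return A + np.outer(x, x), b + reward * x
```

Starting from $A = I$, $b = 0$, repeated application of these two steps concentrates the estimate around $\theta^*$ along the directions of the played arms.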
no code implementations • 26 Jun 2020 • Abishek Sankararaman, Soumya Basu, Karthik Abinav Sankararaman
Online learning in a two-sided matching market, where demand-side agents continuously compete to be matched with supply-side agents (arms), abstracts the complex interactions under partial information on matching platforms (e.g., UpWork, TaskRabbit).
no code implementations • 4 Jun 2020 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
This is the first algorithm that achieves such model selection guarantees.
no code implementations • 15 Jan 2020 • Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai
Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play.
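The idea of spreading only arm-IDs (never reward samples) can be sketched as a toy gossip process: each agent keeps a small active set, asks a random neighbour for its best-looking arm-ID, and swaps out its own worst arm. All names here are illustrative, and using the true means to rank arms is a deliberate simplification of what each agent would estimate from its own samples; this is not the paper's protocol.

```python
import random

def gossip_best_arm_ids(agents_arms, true_means, rounds, seed=0):
    """Toy gossip over arm-IDs: every round, each agent adopts the best
    arm-ID recommended by a uniformly random neighbour and drops its own
    worst arm, keeping its active set small."""
    rng = random.Random(seed)
    n = len(agents_arms)
    sets = [set(s) for s in agents_arms]
    for _ in range(rounds):
        new_sets = []
        for i in range(n):
            j = rng.randrange(n)  # uniformly random neighbour
            rec = max(sets[j], key=lambda a: true_means[a])  # recommended arm-ID
            keep = sorted(sets[i], key=lambda a: true_means[a], reverse=True)
            new_sets.append(set(keep[: max(1, len(keep) - 1)]) | {rec})
        sets = new_sets
    return sets
```

Because only integer arm-IDs cross the communication medium, the per-message cost is independent of how many samples each agent has collected.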
no code implementations • 27 Nov 2019 • Abishek Sankararaman, Haris Vikalo, François Baccelli
Results: We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph.
no code implementations • 4 Oct 2019 • Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai
Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of $K$ armed MAB to minimize the average cumulative regret over all agents.