no code implementations • 1 Nov 2022 • Nouman Khan, Mehrdad Moharrami, Vijay Subramanian
In this work, we propose a tunable piece-selection policy that minimizes this (undesirable) requisite by combining the (work-conserving but not stabilizing) rarest-first protocol with only an appropriate share of the (non-work-conserving but stabilizing) mode-suppression protocol.
1 code implementation • 6 Mar 2022 • Rayan El Helou, Kiyeob Lee, Dongqi Wu, Le Xie, Srinivas Shakkottai, Vijay Subramanian
This paper presents OpenGridGym, an open-source Python-based package that allows for seamless integration of distribution market simulation with state-of-the-art artificial intelligence (AI) decision-making algorithms.
no code implementations • 4 Feb 2022 • Saghar Adler, Mehrdad Moharrami, Vijay Subramanian
In our problem, certainty-equivalent control switches between an always-admit policy (always explore) and a never-admit policy (immediately terminate learning), a behavior distinct from that seen in the adaptive control literature.
no code implementations • 1 Nov 2021 • Hsu Kao, Chen-Yu Wei, Vijay Subramanian
For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of $\widetilde{\mathcal{O}}(\sqrt{ABT})$ and a near-optimal gap-dependent regret of $\mathcal{O}(\log(T))$, where $A$ and $B$ are the numbers of actions of the leader and the follower, respectively, and $T$ is the number of steps.
no code implementations • 25 Oct 2021 • Hsu Kao, Vijay Subramanian
Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard, with the complexity growing doubly exponentially in the horizon length.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 18 Feb 2020 • Daniel Vial, Vijay Subramanian
We devise and analyze algorithms for the empirical policy evaluation problem in reinforcement learning.
1 code implementation • 4 Jun 2017 • Daniel Vial, Vijay Subramanian
We then show that the common underlying graph can be leveraged to efficiently and jointly estimate Personalized PageRank (PPR) for many pairs, rather than treating each pair separately using the primitive algorithm.
Social and Information Networks