no code implementations • 31 Jul 2024 • S. Rasoul Etesami, R. Srikant
Our proposed game-theoretic framework bridges the discrete problem of learning stable matchings with the problem of learning Nash equilibria (NE) in continuous-action games.
no code implementations • 30 May 2024 • Yashaswini Murthy, Isaac Grosof, Siva Theja Maguluri, R. Srikant
We consider policy optimization methods in reinforcement learning settings where the state space is arbitrarily large, or even countably infinite.
no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor
We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
The algorithm is based on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function.
no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant
In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.
no code implementations • 28 Jan 2024 • R. Srikant
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov chains.
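To give a sense of the Poisson-equation step (the notation below is ours, added purely for illustration): for a function $f$ with zero stationary mean, one solves $g - Pg = f$ and decomposes the partial sums into a martingale plus boundary terms,

$$\sum_{k=1}^{n} f(X_k) \;=\; \underbrace{\sum_{k=1}^{n} \big( g(X_k) - (Pg)(X_{k-1}) \big)}_{\text{martingale}} \;+\; (Pg)(X_0) - (Pg)(X_n),$$

so a martingale central limit theorem, such as the one obtained via Stein's method, applies to the dominant first term.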
no code implementations • 17 Jan 2024 • Yihan Du, R. Srikant, Wei Chen
In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.
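To fix ideas, here is a minimal simulation of the cascade feedback loop with a UCB-style selection rule; the sizes, constants, and index are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: L items with unknown attraction probabilities,
# and the agent recommends an item list of length K each timestep.
L, K, T = 10, 3, 5000
attraction = rng.uniform(0.05, 0.4, size=L)  # unknown to the agent

clicks = np.zeros(L)  # observed clicks per item
pulls = np.zeros(L)   # number of times each item was examined

for t in range(T):
    # UCB-style index (a common choice in cascading bandits, shown only
    # as an illustrative selection rule)
    ucb = clicks / np.maximum(pulls, 1) \
        + np.sqrt(1.5 * np.log(t + 1) / np.maximum(pulls, 1))
    item_list = np.argsort(-ucb)[:K]  # recommend the K highest-index items

    # Cascade feedback: the user scans the list top-down and clicks the
    # first attractive item; items after the click are never examined.
    for i in item_list:
        pulls[i] += 1
        if rng.random() < attraction[i]:
            clicks[i] += 1
            break  # items below the clicked one remain unobserved
```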
no code implementations • 19 Sep 2023 • Ameya Anjarlekar, Rasoul Etesami, R. Srikant
We investigate the problem of performing logistic regression on data collected from privacy-sensitive sellers.
no code implementations • 30 May 2023 • Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant
The study of collaborative multi-agent bandits has attracted significant attention recently.
no code implementations • 17 Mar 2023 • Anna Winnicki, R. Srikant
We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs.
no code implementations • 14 Feb 2023 • Saptarshi Mandal, Seo Taek Kong, Dimitrios Katselis, R. Srikant
The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses.
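As a quick illustration, the sketch below generates worker responses under a one-coin special case of the Dawid-Skene model (each worker is summarized by a single accuracy instead of a full confusion matrix) and decodes with majority vote; all sizes and parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: n tasks, w workers, binary labels {0, 1}.
n_tasks, n_workers = 200, 10
true_labels = rng.integers(0, 2, size=n_tasks)     # latent ground truth

# Under the full Dawid-Skene model, worker j has a confusion matrix
# C[j, y, z] = P(worker j reports z | true label y); here we use the
# one-coin special case where each worker is correct w.p. accuracy[j].
accuracy = rng.uniform(0.6, 0.95, size=n_workers)
responses = np.empty((n_tasks, n_workers), dtype=int)
for j in range(n_workers):
    correct = rng.random(n_tasks) < accuracy[j]
    responses[:, j] = np.where(correct, true_labels, 1 - true_labels)

# A simple baseline estimator: majority vote per task. Spectral and
# EM-style methods refine this by estimating worker reliabilities.
estimates = (responses.mean(axis=1) > 0.5).astype(int)
print("majority-vote accuracy:", (estimates == true_labels).mean())
```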
no code implementations • 8 Feb 2023 • Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
Since the exponential-cost formulation involves a multiplicative Bellman equation, our main contribution is a convergence proof that differs substantially from existing results for discounted and risk-neutral average-cost problems, as well as from risk-sensitive value and policy iteration approaches.
no code implementations • 23 Jan 2023 • Anna Winnicki, R. Srikant
A common technique in reinforcement learning is to evaluate the value function of a given policy from Monte Carlo simulations, and then to obtain a new policy that is greedy with respect to the estimated value function.
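A minimal sketch of this Monte Carlo flavor of approximate policy iteration on a made-up MDP; for brevity the greedy step uses the true model, whereas in the purely simulation-based setting it too would be estimated from rollouts.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny illustrative MDP (transition and reward tensors are made up).
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a dist. over s'
R = rng.uniform(0, 1, size=(nS, nA))

def mc_evaluate(policy, episodes=2000, horizon=50):
    """Estimate V^pi by averaging truncated Monte Carlo returns."""
    V, counts = np.zeros(nS), np.zeros(nS)
    for _ in range(episodes):
        s0 = rng.integers(nS)
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            a = policy[s]
            ret += disc * R[s, a]
            disc *= gamma
            s = rng.choice(nS, p=P[s, a])
        V[s0] += ret
        counts[s0] += 1
    return V / np.maximum(counts, 1)

policy = np.zeros(nS, dtype=int)
for _ in range(5):                  # approximate policy iteration loop
    V = mc_evaluate(policy)         # Monte Carlo policy evaluation
    Q = R + gamma * P @ V           # one-step values under the estimate
    policy = Q.argmax(axis=1)       # greedy improvement
```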
no code implementations • 13 Oct 2022 • Anna Winnicki, R. Srikant
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that combines stochastic approximation algorithms with state-of-the-art techniques for very large MDPs, including lookahead, function approximation, and gradient descent.
no code implementations • 2 Sep 2022 • Zixian Yang, R. Srikant, Lei Ying
We prove that, under our algorithm, the asymptotic average queue length is bounded by the reciprocal of the traffic slackness, which is order-wise optimal.
no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.
no code implementations • 23 Mar 2022 • Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant
Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.
no code implementations • 28 Feb 2022 • Daniel Vial, Sanjay Shakkottai, R. Srikant
Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).
no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.
no code implementations • 8 Feb 2022 • Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant
We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies.
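For intuition, trajectory-based gradient methods for exponential cost criteria typically rest on a likelihood-ratio identity of the following form (the risk parameter $\beta$, trajectory cost $C(\tau)$, and trajectory density $p_\theta$ are our notation, added for illustration):

$$\nabla_\theta\, \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta C(\tau)} \right] \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta C(\tau)}\, \nabla_\theta \log p_\theta(\tau) \right],$$

which can be estimated from sampled trajectories in REINFORCE fashion.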
no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.
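Concretely (notation ours): given a function-approximation estimate $\tilde{V}$, an $m$-step rollout evaluates a policy by simulating $m$ steps and truncating with $\tilde{V}$,

$$\hat{V}(s) \;=\; \sum_{t=0}^{m-1} \gamma^{t} r_t \;+\; \gamma^{m}\, \tilde{V}(s_m), \qquad s_0 = s,$$

while lookahead improves the policy by acting greedily with respect to this estimate; one-step lookahead takes $\pi^{+}(s) \in \arg\max_{a}\, \mathbb{E}\big[ r(s,a) + \gamma \hat{V}(s_1) \big]$, and $H$-step lookahead replaces the single maximization with an $H$-horizon optimal control problem whose terminal value is $\hat{V}$.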
no code implementations • 12 Sep 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
(P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance.
no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant
Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
no code implementations • 4 May 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
We propose an algorithm that uses linear function approximation (LFA) for the stochastic shortest path (SSP) problem.
no code implementations • 24 Apr 2021 • Shiyu Liang, Ruoyu Sun, R. Srikant
Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.
no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant
In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
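For concreteness, a minimal semi-gradient TD(0) loop with a small neural value network on a toy one-dimensional chain; the dynamics, reward, and network below are illustrative assumptions, not the setting analyzed in the paper.

```python
import numpy as np
import torch

torch.manual_seed(0)
rng = np.random.default_rng(3)

# Toy 1-D state space with made-up dynamics and reward.
value_net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.SGD(value_net.parameters(), lr=1e-3)
gamma = 0.9

s = rng.uniform(-1, 1)
for _ in range(10_000):
    s_next = float(np.clip(s + rng.normal(0, 0.3), -1, 1))  # toy Markov step
    r = -abs(s)                                             # toy reward

    v = value_net(torch.tensor([[s]], dtype=torch.float32))
    with torch.no_grad():  # semi-gradient: no grad through the bootstrap target
        target = r + gamma * value_net(
            torch.tensor([[s_next]], dtype=torch.float32))

    loss = 0.5 * (target - v).pow(2).mean()  # TD(0) semi-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    s = s_next
```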
no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant
We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.
no code implementations • 4 Dec 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant
We consider a variant of the traditional multi-armed bandit problem in which each arm can provide only one bit of feedback per pull, computed from its past history of rewards.
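A sketch of the interaction protocol under one hypothetical bit rule (the arm reports whether its latest reward exceeded its own historical average); the paper's actual feedback rule and algorithm may differ.

```python
import numpy as np

rng = np.random.default_rng(7)

K, T = 5, 10_000
means = rng.uniform(0, 1, size=K)          # unknown arm means
history = [[] for _ in range(K)]           # each arm's private reward history
bits, pulls = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    # UCB computed on the observed bits rather than on raw rewards
    idx = bits / np.maximum(pulls, 1) \
        + np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
    a = int(idx.argmax())
    r = rng.normal(means[a], 1.0)          # raw reward, never shown to the agent
    past_avg = np.mean(history[a]) if history[a] else 0.0
    bit = float(r > past_avg)              # the only feedback the agent sees
    history[a].append(r)
    pulls[a] += 1
    bits[a] += bit
```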
1 code implementation • 17 Nov 2020 • Joseph Lubars, Harsh Gupta, Sandeep Chinchali, Liyun Li, Adnan Raja, R. Srikant, Xinzhou Wu
We consider the problem of designing an algorithm to allow a car to autonomously merge on to a highway from an on-ramp.
no code implementations • 17 Oct 2020 • Xiaotian Xie, Dimitrios Katselis, Carolyn L. Beck, R. Srikant
Incoming edges to a node in the graph indicate that the state of the node at a particular time instant is influenced by the states of the corresponding parental nodes at the previous time instant.
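A minimal simulation of such a network, assuming (purely for illustration) that each node fires with probability equal to a baseline plus a weighted sum of its parents' previous states:

```python
import numpy as np

rng = np.random.default_rng(8)

# p nodes, each with in-degree 2; the parameterization below is an
# illustrative assumption, not the paper's exact BAR specification.
p, T = 6, 1000
parents = [rng.choice(p, size=2, replace=False) for _ in range(p)]
weights = np.full(2, 0.3)   # influence of each parent
baseline = 0.1              # firing probability when all parents are off

X = np.zeros((T, p), dtype=int)
X[0] = rng.integers(0, 2, size=p)
for t in range(1, T):
    for i in range(p):
        prob = baseline + weights @ X[t - 1, parents[i]]  # stays within [0, 1]
        X[t, i] = rng.random() < prob
```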
no code implementations • 14 Sep 2020 • Arghyadip Roy, Sanjay Shakkottai, R. Srikant
I.i.d. rewards are a special case of Markov rewards, and it is difficult to design an algorithm that works well regardless of whether the underlying model is truly Markovian or i.i.d.
1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.
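For reference, the tabular Double Q-learning update being compared is sketched below on a made-up MDP: two estimators are maintained, and each update bootstraps one estimator with the other's value at the argmax action.

```python
import numpy as np

rng = np.random.default_rng(4)

nS, nA, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # made-up transitions
R = rng.uniform(0, 1, size=(nS, nA))           # made-up rewards

QA, QB = np.zeros((nS, nA)), np.zeros((nS, nA))
s = 0
for _ in range(50_000):
    Q = QA + QB                                # act on the combined estimate
    a = rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())
    r = R[s, a]
    s_next = rng.choice(nS, p=P[s, a])
    if rng.random() < 0.5:   # update one estimator using the other's value
        a_star = QA[s_next].argmax()
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = QB[s_next].argmax()
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])
    s = s_next
```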
no code implementations • 7 Jul 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant
Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret.
no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant
Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.
no code implementations • 30 Jun 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant
Time-constrained decision processes are ubiquitous in fundamental applications across physics, biology, and computer science.
no code implementations • 29 Feb 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant
We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
no code implementations • 31 Dec 2019 • Shiyu Liang, Ruoyu Sun, R. Srikant
More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.
1 code implementation • NeurIPS 2019 • Harsh Gupta, R. Srikant, Lei Ying
We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.
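As an example of the two time-scale structure, here is a sketch of the TDC update (one of the algorithms named above) on a toy chain with random linear features: the main parameter $\theta$ moves on the slow timescale while the correction vector $w$ moves on the fast one. The chain, features, and step sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(5)

nS, d, gamma = 20, 4, 0.9
phi = rng.normal(size=(nS, d))            # fixed linear feature map
P = rng.dirichlet(np.ones(nS), size=nS)   # transitions under a fixed policy
R = rng.uniform(0, 1, size=nS)

theta, w = np.zeros(d), np.zeros(d)
alpha, beta = 0.01, 0.05                  # slow (theta) / fast (w) step sizes
s = 0
for _ in range(100_000):
    s_next = rng.choice(nS, p=P[s])
    f, f_next = phi[s], phi[s_next]
    delta = R[s] + gamma * theta @ f_next - theta @ f   # TD error
    theta += alpha * (delta * f - gamma * f_next * (f @ w))
    w += beta * (delta - f @ w) * f
    s = s_next
```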
no code implementations • 3 Feb 2019 • R. Srikant, Lei Ying
We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., the deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).
no code implementations • 25 Jan 2019 • Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang
In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm.
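For orientation, vanilla Boltzmann (softmax) exploration is sketched below with a common logarithmic cooling schedule; the paper's doubling-trick-style modification to the schedule is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)

K, T = 5, 10_000
means = rng.uniform(0, 1, size=K)      # unknown arm means
est, pulls = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    temp = 1.0 / np.log(t + 1)         # an illustrative cooling schedule
    logits = est / temp
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    a = rng.choice(K, p=p)
    r = float(rng.random() < means[a])       # Bernoulli reward
    pulls[a] += 1
    est[a] += (r - est[a]) / pulls[a]        # running mean of arm a
```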
no code implementations • NeurIPS 2018 • Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant
One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.
1 code implementation • 10 Apr 2018 • Siddhartha Satpathi, Supratim Deb, R. Srikant, He Yan
One of the main contributions of the paper is a novel mapping of our problem which transforms it into a problem of topic discovery in documents.
no code implementations • ICML 2018 • Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant
Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.
8 code implementations • ICLR 2018 • Shiyu Liang, Yixuan Li, R. Srikant
We show in a series of experiments that ODIN is compatible with diverse network architectures and datasets.
no code implementations • 19 Dec 2016 • Dimitrios Katselis, Carolyn L. Beck, R. Srikant
For a network with $p$ nodes, where each node has in-degree at most $d$ and corresponds to a scalar Bernoulli process generated by a BAR, we provide a greedy algorithm that can efficiently learn the structure of the underlying directed graph with a sample complexity proportional to the mixing time of the BAR process.
no code implementations • 13 Oct 2016 • Shiyu Liang, R. Srikant
We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.
no code implementations • 9 Jun 2016 • Kobi Cohen, Angelia Nedic, R. Srikant
The problem of least squares regression of a $d$-dimensional unknown parameter is considered.
no code implementations • NeurIPS 2015 • Huasen Wu, R. Srikant, Xin Liu, Chong Jiang
To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits.
no code implementations • 16 Feb 2015 • Rui Wu, Jiaming Xu, R. Srikant, Laurent Massoulié, Marc Lelarge, Bruce Hajek
We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$.
no code implementations • 6 Mar 2014 • Kai Zhu, Rui Wu, Lei Ying, R. Srikant
In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered. Further, we assume that some users rate many items (information-rich users) while others rate only a few (information-sparse users).
no code implementations • 1 Oct 2013 • Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying
In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.
no code implementations • 25 Apr 2012 • Rui Wu, R. Srikant, Jian Ni
We consider the structure learning problem for graphical models that we call loosely connected Markov random fields, in which the number of short paths between any pair of nodes is small, and we present a new algorithm based on conditional independence tests for learning the underlying graph structure.