Search Results for author: R. Srikant

Found 44 papers, 5 papers with code

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

no code implementations17 Mar 2023 Anna Winnicki, R. Srikant

We further show that lookahead can be implemented efficiently in linear Markov games, which are the counterpart of linear MDPs and have received much attention recently.

Model-based Reinforcement Learning · Multi-agent Reinforcement Learning +2

A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

no code implementations14 Feb 2023 Seo Taek Kong, Saptarshi Mandal, Dimitrios Katselis, R. Srikant

After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values.

Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs

no code implementations8 Feb 2023 Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Although policy iteration and value iteration have been well studied in the context of risk sensitive MDPs, modified policy iteration is relatively unexplored.

Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

no code implementations2 Feb 2023 Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are both performed approximately.

Reinforcement Learning (RL)

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

no code implementations23 Jan 2023 Anna Winnicki, R. Srikant

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function.
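The loop described in this abstract, estimating a value function from Monte Carlo rollouts and then switching to the greedy policy with respect to the estimate, can be sketched on a toy MDP. Everything below (the three-state chain, the 10% action slip, and all constants) is invented for illustration and is not from the paper:

```python
import random

random.seed(0)
N_STATES, GAMMA = 3, 0.9

def step(s, a):
    """One environment step; 10% chance the chosen action slips (illustrative)."""
    if s == 2:
        return 2, 0.0                    # state 2 is an absorbing goal
    if random.random() < 0.1:
        a = 1 - a
    s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 2 else 0.0)

def mc_evaluate(policy, episodes=200, horizon=30):
    """Estimate V^pi(s) by averaging discounted returns of rollouts from s."""
    V = [0.0] * N_STATES
    for s0 in range(N_STATES):
        total = 0.0
        for _ in range(episodes):
            s, ret, disc = s0, 0.0, 1.0
            for _ in range(horizon):
                s, r = step(s, policy[s])
                ret += disc * r
                disc *= GAMMA
            total += ret
        V[s0] = total / episodes
    return V

def greedy_improve(V, samples=200):
    """Greedy policy w.r.t. the estimated V, via sampled one-step lookahead."""
    policy = []
    for s in range(N_STATES):
        q = []
        for a in (0, 1):
            tot = 0.0
            for _ in range(samples):
                s2, r = step(s, a)
                tot += r + GAMMA * V[s2]
            q.append(tot / samples)
        policy.append(0 if q[0] >= q[1] else 1)
    return policy

policy = [0] * N_STATES          # start from "always move left"
for _ in range(3):               # a few policy-iteration sweeps
    V = mc_evaluate(policy)
    policy = greedy_improve(V)
```

After a few sweeps the greedy policy moves right toward the goal from every transient state, illustrating the improvement step the abstract refers to.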

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

no code implementations13 Oct 2022 Anna Winnicki, R. Srikant

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent.

Reinforcement Learning (RL)

MaxWeight With Discounted UCB: A Provably Stable Scheduling Policy for Nonstationary Multi-Server Systems With Unknown Statistics

no code implementations2 Sep 2022 Zixian Yang, R. Srikant, Lei Ying

Simulation results confirm that the proposed algorithm can stabilize the queues and that it outperforms MaxWeight with empirical mean and MaxWeight with discounted empirical mean.
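A toy single-server sketch in the spirit of MaxWeight combined with discounted-UCB estimates of unknown service rates. The arrival and service rates, discount factor, and bonus form are invented for illustration and are not the paper's exact algorithm:

```python
import math, random

random.seed(1)
GAMMA_D = 0.99                   # discount factor for the UCB statistics
N_QUEUES, T = 3, 5000
ARRIVAL = [0.2, 0.2, 0.2]        # Bernoulli arrival rates (invented)
MU = [0.9, 0.7, 0.8]             # unknown Bernoulli service rates (invented)

Q = [0] * N_QUEUES               # queue lengths
S = [0.0] * N_QUEUES             # discounted sum of observed service outcomes
N = [1e-6] * N_QUEUES            # discounted service-attempt counts

for t in range(1, T + 1):
    for i in range(N_QUEUES):    # Bernoulli arrivals
        Q[i] += random.random() < ARRIVAL[i]
    for i in range(N_QUEUES):    # discount old observations
        S[i] *= GAMMA_D
        N[i] *= GAMMA_D
    # Discounted-UCB estimate of each service rate, capped at 1.
    ucb = [min(S[i] / N[i] + math.sqrt(2 * math.log(t) / N[i]), 1.0)
           for i in range(N_QUEUES)]
    # MaxWeight: serve the queue maximizing (queue length) x (estimated rate).
    i = max(range(N_QUEUES), key=lambda j: Q[j] * ucb[j])
    served = random.random() < MU[i]
    if served and Q[i] > 0:
        Q[i] -= 1
    S[i] += served               # record the Bernoulli service outcome
    N[i] += 1
```

Since the total arrival rate (0.6) is below every service rate, the queues stay short even though the scheduler never observes the true rates directly.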


Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

no code implementations2 Jun 2022 Semih Cayci, Niao He, R. Srikant

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.

Minimax Regret for Cascading Bandits

no code implementations23 Mar 2022 Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.
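The cascade click model this entry refers to, in which a user scans a ranked list top-down and clicks the first attractive item, can be sketched together with a generic UCB-style learner. The item probabilities, bonus constant, and update rule below are illustrative assumptions, not the paper's algorithm:

```python
import math, random

random.seed(0)
ATTRACT = [0.1, 0.5, 0.2, 0.05, 0.4]   # per-item click probabilities (invented)
K, LIST_LEN, T = len(ATTRACT), 2, 10000

def cascade_feedback(ranking):
    """User scans top-down and clicks the first attractive item (cascade model).
    Returns the clicked position, or None if no item is clicked."""
    for pos, item in enumerate(ranking):
        if random.random() < ATTRACT[item]:
            return pos
    return None

# Generic UCB-style learner on attraction probabilities: rank the items by
# their UCB indices, then update the items the user is assumed to have seen.
n = [1e-9] * K                   # observation counts
s = [0.0] * K                    # click counts
for t in range(1, T + 1):
    ucb = [s[i] / n[i] + math.sqrt(1.5 * math.log(t) / n[i]) for i in range(K)]
    ranking = sorted(range(K), key=lambda i: -ucb[i])[:LIST_LEN]
    click = cascade_feedback(ranking)
    # Items at and above the click were examined; items below a click carry
    # no information under the cascade assumption.
    last = click if click is not None else LIST_LEN - 1
    for pos in range(last + 1):
        item = ranking[pos]
        n[item] += 1
        s[item] += 1.0 if pos == click else 0.0
```

Over time the learner shows the two most attractive items (indices 1 and 4 here) in almost every round.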


Robust Multi-Agent Bandits Over Undirected Graphs

no code implementations28 Feb 2022 Daniel Vial, Sanjay Shakkottai, R. Srikant

Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

Learning to Control Partially Observed Systems with Finite Memory

no code implementations20 Feb 2022 Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

no code implementations8 Feb 2022 Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant

We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies.

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

no code implementations28 Sep 2021 Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant

Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.

Improved Algorithms for Misspecified Linear Markov Decision Processes

no code implementations12 Sep 2021 Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

(P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance.

Multi-Armed Bandits

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

no code implementations8 Jun 2021 Semih Cayci, Niao He, R. Srikant

Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

no code implementations4 May 2021 Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP).

Achieving Small Test Error in Mildly Overparameterized Neural Networks

no code implementations24 Apr 2021 Shiyu Liang, Ruoyu Sun, R. Srikant

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.

Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

no code implementations2 Mar 2021 Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

no code implementations29 Jan 2021 Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant

We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.

One-bit feedback is sufficient for upper confidence bound policies

no code implementations4 Dec 2020 Daniel Vial, Sanjay Shakkottai, R. Srikant

We consider a variant of the traditional multi-armed bandit problem in which each arm is only able to provide one-bit feedback during each pull based on its past history of rewards.

On the Consistency of Maximum Likelihood Estimators for Causal Network Identification

no code implementations17 Oct 2020 Xiaotian Xie, Dimitrios Katselis, Carolyn L. Beck, R. Srikant

Incoming edges to a node in the graph indicate that the state of the node at a particular time instant is influenced by the states of the corresponding parental nodes in the previous time instant.

Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

no code implementations14 Sep 2020 Arghyadip Roy, Sanjay Shakkottai, R. Srikant

I.i.d. rewards are a special case of Markov rewards, and it is difficult to design an algorithm that works well regardless of whether the underlying model is truly Markovian or i.i.d.

The Mean-Squared Error of Double Q-Learning

1 code implementation NeurIPS 2020 Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.
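The estimation-bias contrast behind this comparison can be illustrated with the two-sample ("double") estimator of a maximum: taking the argmax and the value from the same empirical means is biased upward, while decoupling them removes the bias. This toy experiment is illustrative only, not the paper's analysis:

```python
import random, statistics

random.seed(3)
N_ACTIONS, RUNS, SAMPLES = 10, 200, 100

q_max, dq_max = [], []
for _ in range(RUNS):
    # Two independent batches of rewards per action; every action has true
    # mean 0, so the true max-of-means is exactly 0.
    A = [[random.gauss(0.0, 1.0) for _ in range(SAMPLES)] for _ in range(N_ACTIONS)]
    B = [[random.gauss(0.0, 1.0) for _ in range(SAMPLES)] for _ in range(N_ACTIONS)]
    mean_a = [statistics.fmean(x) for x in A]
    mean_b = [statistics.fmean(x) for x in B]
    # Q-learning-style estimator: argmax and evaluation use the SAME
    # estimates, which biases the result upward.
    q_max.append(max(mean_a))
    # Double-Q-style estimator: argmax on one batch, evaluate on the other,
    # which removes the upward bias (at the cost of extra variance).
    a_star = max(range(N_ACTIONS), key=lambda a: mean_a[a])
    dq_max.append(mean_b[a_star])
```

The single-sample maximum concentrates well above zero, while the double estimator averages near zero; the paper's contribution is the finer mean-squared-error comparison between the two.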


Robust Multi-Agent Multi-Armed Bandits

no code implementations7 Jul 2020 Daniel Vial, Sanjay Shakkottai, R. Srikant

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret.

Distributed Computing · Multi-Armed Bandits +1
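A minimal sketch of the collaboration idea in this entry: if agents pool their observations, each arm is effectively sampled faster, so exploration cost is paid once rather than once per agent. Full communication is a simplifying assumption here, and all constants are invented:

```python
import math, random

random.seed(2)
MU = [0.3, 0.5, 0.7]             # Bernoulli arm means (invented)
M_AGENTS, T = 4, 2000

# Pooled statistics shared by all agents (simplifying assumption).
n = [0] * len(MU)
s = [0.0] * len(MU)

for t in range(1, T + 1):
    for _ in range(M_AGENTS):    # each agent acts once per round
        if 0 in n:
            a = n.index(0)       # play each arm once first
        else:
            # UCB1 index on the pooled counts.
            a = max(range(len(MU)),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        n[a] += 1
        s[a] += random.random() < MU[a]
```

With pooled counts, the total number of suboptimal pulls is shared across the group, so per-agent regret shrinks relative to each agent exploring alone.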

The Global Landscape of Neural Networks: An Overview

no code implementations2 Jul 2020 Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Continuous-Time Multi-Armed Bandits with Controlled Restarts

no code implementations30 Jun 2020 Semih Cayci, Atilla Eryilmaz, R. Srikant

Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science.

Multi-Armed Bandits

Budget-Constrained Bandits over General Cost and Reward Distributions

no code implementations29 Feb 2020 Semih Cayci, Atilla Eryilmaz, R. Srikant

We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

no code implementations31 Dec 2019 Shiyu Liang, Ruoyu Sun, R. Srikant

More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

1 code implementation NeurIPS 2019 Harsh Gupta, R. Srikant, Lei Ying

We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.

Reinforcement Learning (RL)

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

no code implementations3 Feb 2019 R. Srikant, Lei Ying

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).
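A minimal sketch of linear stochastic approximation driven by Markovian noise, in the special case of TD(0) with one-hot features on a two-state chain. The chain, step sizes, and run length are invented; with one-hot features the ODE equilibrium is the true value function V = (I - γP)^{-1}R, here approximately (7.57, 4.86):

```python
import random

random.seed(4)
P = [[0.9, 0.1], [0.2, 0.8]]     # transition probabilities (invented)
R = [1.0, 0.0]                   # deterministic per-state rewards
GAMMA = 0.9

def step(s):
    """Sample the next state of the Markov chain (the Markovian noise)."""
    return 0 if random.random() < P[s][0] else 1

theta = [0.0, 0.0]               # weights; one-hot features => theta[s] = V(s)
s = 0
for t in range(1, 200001):
    s2 = step(s)
    alpha = 50.0 / (t + 1000)    # diminishing step size
    td_error = R[s] + GAMMA * theta[s2] - theta[s]
    theta[s] += alpha * td_error # only the visited state's weight moves
    s = s2
```

The iterate tracks the ODE solution and settles near the fixed point, which is the kind of behavior the finite-time bounds in this paper quantify.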

Almost Boltzmann Exploration

no code implementations25 Jan 2019 Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang

In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm.

Multi-Armed Bandits
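Plain Boltzmann (softmax) exploration, the baseline this paper modifies, can be sketched as follows. The arm means, the forced initial pulls, and the log t inverse-temperature schedule are illustrative choices, not the paper's modified rule:

```python
import math, random

random.seed(5)
MU = [0.2, 0.8]                  # Bernoulli arm means (invented)
n = [0, 0]                       # pull counts
s = [0.0, 0.0]                   # reward sums

for i in range(2):               # a few forced initial pulls per arm
    for _ in range(20):
        n[i] += 1
        s[i] += random.random() < MU[i]

for t in range(1, 5001):
    means = [s[i] / n[i] for i in range(2)]
    beta = math.log(t + 1)       # inverse temperature grows over time
    w = [math.exp(beta * m) for m in means]
    # Sample an arm with probability proportional to exp(beta * mean).
    a = 0 if random.random() < w[0] / (w[0] + w[1]) else 1
    n[a] += 1
    s[a] += random.random() < MU[a]
```

As the temperature drops, play concentrates on the empirically better arm; the paper's point is that a careful modification of this schedule (via a doubling-trick variant) is needed for near-logarithmic regret guarantees.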

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations NeurIPS 2018 Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

General Classification

Learning Latent Events from Network Message Logs

1 code implementation10 Apr 2018 Siddhartha Satpathi, Supratim Deb, R. Srikant, He Yan

One of the main contributions of the paper is a novel mapping of our problem which transforms it into a problem of topic discovery in documents.

Change Point Detection

Understanding the Loss Surface of Neural Networks for Binary Classification

no code implementations ICML 2018 Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant

Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.

Classification · General Classification

Mixing Times and Structural Inference for Bernoulli Autoregressive Processes

no code implementations19 Dec 2016 Dimitrios Katselis, Carolyn L. Beck, R. Srikant

For a network with $p$ nodes, where each node has in-degree at most $d$ and corresponds to a scalar Bernoulli process generated by a BAR, we provide a greedy algorithm that can efficiently learn the structure of the underlying directed graph with a sample complexity proportional to the mixing time of the BAR process.

Time Series Analysis

Why Deep Neural Networks for Function Approximation?

no code implementations13 Oct 2016 Shiyu Liang, R. Srikant

We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

no code implementations NeurIPS 2015 Huasen Wu, R. Srikant, Xin Liu, Chong Jiang

To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits.

Multi-Armed Bandits

Clustering and Inference From Pairwise Comparisons

no code implementations16 Feb 2015 Rui Wu, Jiaming Xu, R. Srikant, Laurent Massoulié, Marc Lelarge, Bruce Hajek

We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$.

Collaborative Filtering with Information-Rich and Information-Sparse Entities

no code implementations6 Mar 2014 Kai Zhu, Rui Wu, Lei Ying, R. Srikant

In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users).

Collaborative Filtering · Recommendation Systems

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

no code implementations1 Oct 2013 Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.

Learning Loosely Connected Markov Random Fields

no code implementations25 Apr 2012 Rui Wu, R. Srikant, Jian Ni

We consider the structure learning problem for graphical models that we call loosely connected Markov random fields, in which the number of short paths between any pair of nodes is small, and present a new conditional independence test based algorithm for learning the underlying graph structure.
