Search Results for author: R. Srikant

Found 50 papers, 5 papers with code

Learning Loosely Connected Markov Random Fields

no code implementations • 25 Apr 2012 • Rui Wu, R. Srikant, Jian Ni

We consider the structure learning problem for graphical models that we call loosely connected Markov random fields, in which the number of short paths between any pair of nodes is small, and present a new conditional independence test based algorithm for learning the underlying graph structure.

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

no code implementations • 1 Oct 2013 • Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.

Clustering
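
Not the paper's algorithm, but a quick way to see the row-and-column cluster structure it studies: a generic spectral co-clustering baseline (scikit-learn's SpectralCoclustering) on a synthetic binary matrix with a planted 2x2 block model. All parameters below are illustrative assumptions.

```python
# A minimal, generic co-clustering sketch -- NOT the paper's algorithm.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Synthetic binary matrix with planted row and column clusters.
row_labels = rng.integers(0, 2, size=200)   # hidden row clusters
col_labels = rng.integers(0, 2, size=100)   # hidden column clusters
p = np.where(row_labels[:, None] == col_labels[None, :], 0.8, 0.2)
A = (rng.random((200, 100)) < p).astype(int)

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(A)
print("recovered row clusters:   ", model.row_labels_[:10])
print("recovered column clusters:", model.column_labels_[:10])
```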

Collaborative Filtering with Information-Rich and Information-Sparse Entities

no code implementations • 6 Mar 2014 • Kai Zhu, Rui Wu, Lei Ying, R. Srikant

In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users).

Clustering • Collaborative Filtering • +1

Clustering and Inference From Pairwise Comparisons

no code implementations • 16 Feb 2015 • Rui Wu, Jiaming Xu, R. Srikant, Laurent Massoulié, Marc Lelarge, Bruce Hajek

We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$.

Clustering

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

no code implementations • NeurIPS 2015 • Huasen Wu, R. Srikant, Xin Liu, Chong Jiang

To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits.

Multi-Armed Bandits

Why Deep Neural Networks for Function Approximation?

no code implementations • 13 Oct 2016 • Shiyu Liang, R. Srikant

We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.

Mixing Times and Structural Inference for Bernoulli Autoregressive Processes

no code implementations • 19 Dec 2016 • Dimitrios Katselis, Carolyn L. Beck, R. Srikant

For a network with $p$ nodes, where each node has in-degree at most $d$ and corresponds to a scalar Bernoulli process generated by a BAR, we provide a greedy algorithm that can efficiently learn the structure of the underlying directed graph with a sample complexity proportional to the mixing time of the BAR process.

Time Series Analysis

Understanding the Loss Surface of Neural Networks for Binary Classification

no code implementations • ICML 2018 • Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant

Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.

Binary Classification • Classification • +1

Learning Latent Events from Network Message Logs

1 code implementation • 10 Apr 2018 • Siddhartha Satpathi, Supratim Deb, R. Srikant, He Yan

One of the main contributions of the paper is a novel mapping of our problem which transforms it into a problem of topic discovery in documents.

Change Point Detection
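
The entry above describes a reduction of log analysis to topic discovery. As a hedged illustration only (the paper's actual mapping and model differ), here is a generic topic-discovery step on toy log messages using scikit-learn's LatentDirichletAllocation:

```python
# Generic topic discovery on toy log messages, sketching the kind of
# reduction the paper describes (the actual mapping differs).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

logs = [
    "link down on interface eth0",
    "link up on interface eth0",
    "cpu utilization high on server 12",
    "memory utilization high on server 12",
    "link down on interface eth1",
]

counts = CountVectorizer().fit(logs)
X = counts.transform(logs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-4:][::-1]     # highest-weight words per topic
    print(f"latent event {k}:", [terms[i] for i in top])
```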

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations • NeurIPS 2018 • Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

Binary Classification • General Classification

Almost Boltzmann Exploration

no code implementations • 25 Jan 2019 • Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang

In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm.

Multi-Armed Bandits
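
For context, a minimal sketch of vanilla Boltzmann (softmax) exploration on a Bernoulli bandit; the paper's "almost Boltzmann" modification via the doubling trick is not reproduced here, and the temperature schedule below is an assumption:

```python
import numpy as np

def boltzmann_bandit(means, T, eta=lambda t: np.sqrt(np.log(t + 2) / (t + 1))):
    """Vanilla Boltzmann exploration; the paper modifies this baseline."""
    K = len(means)
    rng = np.random.default_rng(0)
    counts = np.zeros(K)
    est = np.zeros(K)
    total = 0.0
    for t in range(T):
        # softmax over empirical means with a decaying temperature
        logits = est / max(eta(t), 1e-8)
        logits -= logits.max()              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(K, p=probs)
        r = float(rng.random() < means[a])  # Bernoulli reward
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]  # running mean
        total += r
    return total

print(boltzmann_bandit([0.3, 0.5, 0.7], T=5000))
```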

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

no code implementations • 3 Feb 2019 • R. Srikant, Lei Ying

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).
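
A concrete instance of such a linear stochastic approximation driven by Markovian noise is TD(0) with linear features. A minimal sketch on an assumed toy Markov reward process (the transition matrix, rewards, and step size are illustrative):

```python
import numpy as np

# TD(0) with linear function approximation on a toy 3-state Markov reward
# process: one instance of the linear SA scheme the paper's bounds cover.
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])        # toy transition matrix (assumed)
r = np.array([1.0, 0.0, 2.0])          # per-state rewards (assumed)
phi = np.eye(3)                        # tabular features for simplicity
gamma, alpha = 0.9, 0.05

theta = np.zeros(3)
s = 0
for t in range(20000):
    s_next = rng.choice(3, p=P[s])
    td_err = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_err * phi[s]   # linear SA update
    s = s_next

# Compare against the exact value function (I - gamma P)^{-1} r.
print("TD estimate:", theta)
print("exact V:    ", np.linalg.solve(np.eye(3) - gamma * P, r))
```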

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

1 code implementation • NeurIPS 2019 • Harsh Gupta, R. Srikant, Lei Ying

We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.

reinforcement-learning • Reinforcement Learning (RL)
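
A hedged sketch of one such two time-scale scheme, TDC (Sutton et al.): the main iterate moves on a slow step size while an auxiliary iterate runs on a faster one. The toy chain, features, and step sizes are assumptions:

```python
import numpy as np

# TDC as a two time-scale linear stochastic approximation.
rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])     # toy on-policy chain (assumed)
r = np.array([0.0, 1.0])
phi = np.array([[1.0, 0.0], [0.5, 1.0]])   # arbitrary toy features
gamma, alpha, beta = 0.9, 0.01, 0.1        # beta >> alpha: two time scales

theta = np.zeros(2)                        # slow iterate
w = np.zeros(2)                            # fast auxiliary iterate
s = 0
for t in range(50000):
    s2 = rng.choice(2, p=P[s])
    f, f2 = phi[s], phi[s2]
    delta = r[s] + gamma * f2 @ theta - f @ theta
    theta += alpha * (delta * f - gamma * f2 * (f @ w))  # slow update
    w += beta * (delta - f @ w) * f                      # fast update
    s = s2
print("theta:", theta)
```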

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

no code implementations • 31 Dec 2019 • Shiyu Liang, Ruoyu Sun, R. Srikant

More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.

Budget-Constrained Bandits over General Cost and Reward Distributions

no code implementations • 29 Feb 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

Continuous-Time Multi-Armed Bandits with Controlled Restarts

no code implementations • 30 Jun 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

Time-constrained decision processes are ubiquitous in many fundamental applications in physics, biology, and computer science.

Multi-Armed Bandits

The Global Landscape of Neural Networks: An Overview

no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Robust Multi-Agent Multi-Armed Bandits

no code implementations • 7 Jul 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret.

Distributed Computing • Multi-Armed Bandits • +1

The Mean-Squared Error of Double Q-Learning

1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.

Q-Learning
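
For reference, a minimal tabular Double Q-learning sketch: two estimators, one chosen at random to update while the other evaluates the greedy action. The toy MDP and constants are assumptions, not from the paper:

```python
import numpy as np

# Tabular Double Q-learning on an assumed random 2-state, 2-action MDP.
rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 2, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # random transitions
R = rng.random((nS, nA))                        # random mean rewards

Qa = np.zeros((nS, nA))
Qb = np.zeros((nS, nA))
s = 0
for t in range(100000):
    # epsilon-greedy behavior on the averaged tables
    a = rng.integers(nA) if rng.random() < 0.1 else int((Qa + Qb)[s].argmax())
    s2 = rng.choice(nS, p=P[s, a])
    r = R[s, a] + 0.1 * rng.standard_normal()   # noisy reward
    if rng.random() < 0.5:
        a_star = int(Qa[s2].argmax())           # select with Qa ...
        Qa[s, a] += alpha * (r + gamma * Qb[s2, a_star] - Qa[s, a])  # ... evaluate with Qb
    else:
        a_star = int(Qb[s2].argmax())           # and symmetrically
        Qb[s, a] += alpha * (r + gamma * Qa[s2, a_star] - Qb[s, a])
    s = s2
print((Qa + Qb) / 2)
```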

Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

no code implementations • 14 Sep 2020 • Arghyadip Roy, Sanjay Shakkottai, R. Srikant

I.i.d. rewards are a special case of Markov rewards, and it is difficult to design an algorithm that works well regardless of whether the underlying model is truly Markovian or i.i.d.

On the Consistency of Maximum Likelihood Estimators for Causal Network Identification

no code implementations • 17 Oct 2020 • Xiaotian Xie, Dimitrios Katselis, Carolyn L. Beck, R. Srikant

Incoming edges to a node in the graph indicate that the state of the node at a particular time instant is influenced by the states of the corresponding parental nodes in the previous time instant.

One-bit feedback is sufficient for upper confidence bound policies

no code implementations • 4 Dec 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant

We consider a variant of the traditional multi-armed bandit problem in which each arm is only able to provide one-bit feedback during each pull based on its past history of rewards.
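
For contrast with the one-bit-feedback variant studied here, a minimal sketch of the standard full-feedback UCB1 policy on a Bernoulli bandit (all parameters are illustrative):

```python
import numpy as np

# Standard UCB1 for a K-armed Bernoulli bandit -- the full-feedback
# baseline against which the one-bit-feedback variant is compared.
def ucb1(means, T, seed=0):
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    est = np.zeros(K)
    total = 0.0
    for t in range(T):
        if t < K:
            a = t                                # pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(est + bonus))      # optimism in the face of uncertainty
        r = float(rng.random() < means[a])
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]
        total += r
    return total

print(ucb1([0.2, 0.5, 0.8], T=10000))
```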

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant

We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.

Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
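
A minimal sketch in the spirit of this setting: semi-gradient TD(0) with a small neural network value function on an assumed two-state chain (PyTorch; the width, learning rate, and chain are illustrative assumptions, not the paper's analysis):

```python
import torch

# Semi-gradient TD(0) with a small network value function (toy instance).
torch.manual_seed(0)
P = torch.tensor([[0.7, 0.3], [0.4, 0.6]])   # assumed toy chain
r = torch.tensor([1.0, 0.0])
X = torch.eye(2)                             # one-hot state encodings
gamma = 0.9

V = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1))
opt = torch.optim.SGD(V.parameters(), lr=1e-2)

s = 0
for t in range(20000):
    s2 = int(torch.multinomial(P[s], 1))
    target = r[s] + gamma * V(X[s2]).detach()    # bootstrap target (no grad)
    loss = 0.5 * (V(X[s]) - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    s = s2

# Compare with the exact value function (I - gamma P)^{-1} r.
print(V(X).squeeze(), torch.linalg.solve(torch.eye(2) - gamma * P, r))
```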

Achieving Small Test Error in Mildly Overparameterized Neural Networks

no code implementations • 24 Apr 2021 • Shiyu Liang, Ruoyu Sun, R. Srikant

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.

Binary Classification

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

no code implementations • 4 May 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP).

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant

Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
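
In the tabular softmax case, entropy-regularized NPG is often written in the closed multiplicative form below (a commonly cited form in this literature; the notation is our assumption, and the paper itself analyzes the linear function approximation setting):

```latex
% Tabular softmax form of the entropy-regularized NPG update
% (step size $\eta$, regularization strength $\tau$):
\[
\pi_{k+1}(a \mid s) \;\propto\;
\pi_k(a \mid s)^{\,1 - \eta\tau}\,
\exp\!\big(\eta\, Q_\tau^{\pi_k}(s, a)\big),
\]
% where $Q_\tau^{\pi_k}$ is the entropy-regularized (soft) Q-function
% of $\pi_k$.
```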

Improved Algorithms for Misspecified Linear Markov Decision Processes

no code implementations • 12 Sep 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

(P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance.

Multi-Armed Bandits

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant

Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

no code implementations • 8 Feb 2022 • Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant

We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies.

Finite-Time Analysis of Natural Actor-Critic for POMDPs

no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.

Robust Multi-Agent Bandits Over Undirected Graphs

no code implementations • 28 Feb 2022 • Daniel Vial, Sanjay Shakkottai, R. Srikant

Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

Minimax Regret for Cascading Bandits

no code implementations • 23 Mar 2022 • Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.

Learning-To-Rank
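
A hedged sketch of a CascadeUCB-style policy for this model (in the style of Kveton et al., not this paper's algorithm): recommend the top-k items by UCB index, the user clicks the first attractive item, and only the examined items are updated. Constants and attraction probabilities are assumptions:

```python
import numpy as np

def cascade_ucb(attract, k, T, seed=0):
    """CascadeUCB-style sketch for Bernoulli click feedback."""
    rng = np.random.default_rng(seed)
    L = len(attract)
    counts = np.ones(L)                 # one fake pull to initialize (crude)
    est = rng.random(L)                 # arbitrary init (an assumption)
    clicks = 0
    for t in range(1, T + 1):
        ucb = est + np.sqrt(1.5 * np.log(t) / counts)
        ranked = np.argsort(-ucb)[:k]   # recommend the top-k list
        click_pos = None
        for pos, item in enumerate(ranked):
            if rng.random() < attract[item]:
                click_pos = pos         # first attractive item is clicked
                break
        last = click_pos if click_pos is not None else k - 1
        for pos in range(last + 1):     # feedback: examined items only
            item = ranked[pos]
            r = 1.0 if pos == click_pos else 0.0
            counts[item] += 1
            est[item] += (r - est[item]) / counts[item]
        clicks += click_pos is not None
    return clicks

print(cascade_ucb(np.array([0.9, 0.7, 0.3, 0.2, 0.1]), k=2, T=5000))
```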

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.

Learning While Scheduling in Multi-Server Systems with Unknown Statistics: MaxWeight with Discounted UCB

no code implementations • 2 Sep 2022 • Zixian Yang, R. Srikant, Lei Ying

We prove that under our algorithm the asymptotic average queue length is bounded by one divided by the traffic slackness, which is order-wise optimal.

Scheduling
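
A minimal single-server sketch of the MaxWeight-with-discounted-UCB idea: schedule the queue maximizing queue length times a discounted-UCB estimate of its unknown service rate. The discount factor, bonus constant, and rates below are assumptions, not the paper's exact algorithm:

```python
import numpy as np

# MaxWeight scheduling with discounted-UCB service-rate estimates (sketch).
rng = np.random.default_rng(0)
arrival = np.array([0.3, 0.3])      # Bernoulli arrival rates (assumed)
mu = np.array([0.9, 0.5])           # unknown Bernoulli service rates
gamma = 0.999                       # discount on past observations

q = np.zeros(2)
disc_n = np.ones(2) * 1e-3          # discounted pull counts
disc_s = np.zeros(2)                # discounted success sums
for t in range(1, 50000):
    q += rng.random(2) < arrival                    # new arrivals
    ucb = disc_s / disc_n + np.sqrt(2 * np.log(t) / disc_n)
    i = int(np.argmax(q * ucb))                     # MaxWeight with UCB rates
    served = float(rng.random() < mu[i]) if q[i] > 0 else 0.0
    q[i] = max(q[i] - served, 0.0)
    disc_n *= gamma; disc_s *= gamma                # discount all arms
    disc_n[i] += 1.0; disc_s[i] += served
print("final queue lengths:", q)
```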

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

no code implementations • 13 Oct 2022 • Anna Winnicki, R. Srikant

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent.

reinforcement-learning • Reinforcement Learning (RL)

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

no code implementations • 23 Jan 2023 • Anna Winnicki, R. Srikant

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function.
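
A minimal sketch of that loop on an assumed toy MDP: Monte Carlo rollouts estimate the value of the current policy, which is then replaced by a greedy policy with respect to the estimate. For brevity, the greedy step below uses the known model; a fully simulation-based variant would also estimate Q-values by rollouts:

```python
import numpy as np

# Policy iteration with Monte Carlo policy evaluation (toy sketch).
rng = np.random.default_rng(0)
nS, nA, gamma, H = 3, 2, 0.9, 60
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # assumed random MDP
R = rng.random((nS, nA))

def mc_value(policy, s0, n_rollouts=200):
    """Monte Carlo estimate of V^pi(s0) from truncated rollouts."""
    total = 0.0
    for _ in range(n_rollouts):
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(H):
            a = policy[s]
            ret += disc * R[s, a]
            disc *= gamma
            s = rng.choice(nS, p=P[s, a])
        total += ret
    return total / n_rollouts

policy = np.zeros(nS, dtype=int)
for k in range(5):                       # a few policy-iteration rounds
    V = np.array([mc_value(policy, s) for s in range(nS)])
    Q = R + gamma * P @ V                # greedy step (uses the model here)
    policy = Q.argmax(axis=1)
    print(f"round {k}: V_hat = {np.round(V, 3)}, policy = {policy}")
```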

On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes

no code implementations • 8 Feb 2023 • Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Since the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems, as well as from risk-sensitive value and policy iteration approaches.

Computational Efficiency
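
For orientation, the multiplicative Bellman equation referred to above can be written (in our notation, stated as an assumption about the standard form) as an eigenvalue-type fixed point:

```latex
% Multiplicative Bellman equation for the risk-sensitive exponential cost
% criterion (risk parameter $\beta > 0$; notation is ours):
\[
\rho\, V(s) \;=\; \min_{a}\; e^{\beta c(s, a)}
\sum_{s'} P(s' \mid s, a)\, V(s'),
\]
% with $V > 0$, $\rho > 0$, and $\log(\rho)/\beta$ the optimal
% risk-sensitive average cost.
```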

A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

no code implementations • 14 Feb 2023 • Seo Taek Kong, Saptarshi Mandal, Dimitrios Katselis, R. Srikant

After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values.

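A toy sketch of the two-stage recipe, with the type split given rather than learned and plain majority vote standing in for the Dawid-Skene aggregator (both simplifications are ours):

```python
import numpy as np

# Split tasks by type, then aggregate labels independently per type.
rng = np.random.default_rng(0)
n_workers, n_tasks = 15, 40
truth = rng.integers(0, 2, n_tasks)
is_hard = rng.random(n_tasks) < 0.5
acc = np.where(is_hard, 0.6, 0.9)               # per-task worker accuracy
labels = np.where(rng.random((n_workers, n_tasks)) < acc, truth, 1 - truth)

def aggregate(cols):
    """Stand-in for a Dawid-Skene algorithm: majority vote per task."""
    return (labels[:, cols].mean(axis=0) > 0.5).astype(int)

est = np.empty(n_tasks, dtype=int)
for mask in (is_hard, ~is_hard):                # apply independently per type
    idx = np.where(mask)[0]
    est[idx] = aggregate(idx)
print("accuracy:", (est == truth).mean())
```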

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

no code implementations • 17 Mar 2023 • Anna Winnicki, R. Srikant

We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs.

Model-based Reinforcement Learning • Multi-agent Reinforcement Learning • +2

Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

no code implementations • 30 May 2023 • Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant

The study of collaborative multi-agent bandits has attracted significant attention recently.

Multi-Armed Bandits

Cascading Reinforcement Learning

no code implementations • 17 Jan 2024 • Yihan Du, R. Srikant, Wei Chen

In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.

Recommendation Systems • reinforcement-learning

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

no code implementations • 28 Jan 2024 • R. Srikant

We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains.
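
For reference, Poisson's equation in the form typically used for such arguments (our notation): given the chain's kernel P, stationary distribution pi, and a function f, one solves for g so that centered sums of f(X_k) decompose into martingale differences plus a telescoping remainder:

```latex
% Poisson's equation for a Markov chain with kernel $P$ and stationary
% distribution $\pi$: given $f$, solve for $g$ with
\[
g(x) - (Pg)(x) \;=\; f(x) - \pi(f),
\qquad \pi(f) := \int f \, \mathrm{d}\pi,
\]
% so that $\sum_k \big(f(X_k) - \pi(f)\big)$ splits into a martingale
% plus a telescoping remainder, to which the martingale CLT applies.
```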

Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes

no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant

In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.

Reinforcement Learning (RL)

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).
