Search Results for author: R. Srikant

Found 50 papers, 5 papers with code

Learning Loosely Connected Markov Random Fields

no code implementations • 25 Apr 2012 • Rui Wu, R. Srikant, Jian Ni

We consider the structure learning problem for graphical models that we call loosely connected Markov random fields, in which the number of short paths between any pair of nodes is small, and present a new conditional independence test based algorithm for learning the underlying graph structure.

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

no code implementations • 1 Oct 2013 • Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.

Clustering
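
Not the paper's algorithm, but a quick way to see the row-and-column cluster structure it studies: a generic spectral co-clustering baseline (scikit-learn's SpectralCoclustering) on a synthetic binary matrix with a planted 2x2 block model. All parameters below are illustrative assumptions.

```python
# A minimal, generic co-clustering sketch -- NOT the paper's algorithm.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Synthetic binary matrix with planted row and column clusters.
row_labels = rng.integers(0, 2, size=200)   # hidden row clusters
col_labels = rng.integers(0, 2, size=100)   # hidden column clusters
p = np.where(row_labels[:, None] == col_labels[None, :], 0.8, 0.2)
A = (rng.random((200, 100)) < p).astype(int)

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(A)
print("recovered row clusters:   ", model.row_labels_[:10])
print("recovered column clusters:", model.column_labels_[:10])
```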

Collaborative Filtering with Information-Rich and Information-Sparse Entities

no code implementations • 6 Mar 2014 • Kai Zhu, Rui Wu, Lei Ying, R. Srikant

In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users).

Clustering • Collaborative Filtering • +1

Clustering and Inference From Pairwise Comparisons

no code implementations • 16 Feb 2015 • Rui Wu, Jiaming Xu, R. Srikant, Laurent Massoulié, Marc Lelarge, Bruce Hajek

We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$.

Clustering

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

no code implementations • NeurIPS 2015 • Huasen Wu, R. Srikant, Xin Liu, Chong Jiang

To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits.

Multi-Armed Bandits

Why Deep Neural Networks for Function Approximation?

no code implementations • 13 Oct 2016 • Shiyu Liang, R. Srikant

We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.

Mixing Times and Structural Inference for Bernoulli Autoregressive Processes

no code implementations • 19 Dec 2016 • Dimitrios Katselis, Carolyn L. Beck, R. Srikant

For a network with $p$ nodes, where each node has in-degree at most $d$ and corresponds to a scalar Bernoulli process generated by a BAR, we provide a greedy algorithm that can efficiently learn the structure of the underlying directed graph with a sample complexity proportional to the mixing time of the BAR process.

Time Series Analysis

Understanding the Loss Surface of Neural Networks for Binary Classification

no code implementations • ICML 2018 • Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant

Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.

Binary Classification • Classification • +1

Learning Latent Events from Network Message Logs

1 code implementation • 10 Apr 2018 • Siddhartha Satpathi, Supratim Deb, R. Srikant, He Yan

One of the main contributions of the paper is a novel mapping of our problem which transforms it into a problem of topic discovery in documents.

Change Point Detection
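
The entry above describes a reduction of log analysis to topic discovery. As a hedged illustration only (the paper's actual mapping and model differ), here is a generic topic-discovery step on toy log messages using scikit-learn's LatentDirichletAllocation:

```python
# Generic topic discovery on toy log messages, sketching the kind of
# reduction the paper describes (the actual mapping differs).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

logs = [
    "link down on interface eth0",
    "link up on interface eth0",
    "cpu utilization high on server 12",
    "memory utilization high on server 12",
    "link down on interface eth1",
]

counts = CountVectorizer().fit(logs)
X = counts.transform(logs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-4:][::-1]     # highest-weight words per topic
    print(f"latent event {k}:", [terms[i] for i in top])
```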

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations • NeurIPS 2018 • Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

Binary Classification • General Classification

Almost Boltzmann Exploration

no code implementations • 25 Jan 2019 • Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang

In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm.

Multi-Armed Bandits
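
For context, a minimal sketch of vanilla Boltzmann (softmax) exploration on a Bernoulli bandit; the paper's "almost Boltzmann" modification via the doubling trick is not reproduced here, and the temperature schedule below is an assumption:

```python
import numpy as np

def boltzmann_bandit(means, T, eta=lambda t: np.sqrt(np.log(t + 2) / (t + 1))):
    """Vanilla Boltzmann exploration; the paper modifies this baseline."""
    K = len(means)
    rng = np.random.default_rng(0)
    counts = np.zeros(K)
    est = np.zeros(K)
    total = 0.0
    for t in range(T):
        # softmax over empirical means with a decaying temperature
        logits = est / max(eta(t), 1e-8)
        logits -= logits.max()              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(K, p=probs)
        r = float(rng.random() < means[a])  # Bernoulli reward
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]  # running mean
        total += r
    return total

print(boltzmann_bandit([0.3, 0.5, 0.7], T=5000))
```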

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

no code implementations • 3 Feb 2019 • R. Srikant, Lei Ying

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).
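
A concrete instance of such a linear stochastic approximation driven by Markovian noise is TD(0) with linear features. A minimal sketch on an assumed toy Markov reward process (the transition matrix, rewards, and step size are illustrative):

```python
import numpy as np

# TD(0) with linear function approximation on a toy 3-state Markov reward
# process: one instance of the linear SA scheme the paper's bounds cover.
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])        # toy transition matrix (assumed)
r = np.array([1.0, 0.0, 2.0])          # per-state rewards (assumed)
phi = np.eye(3)                        # tabular features for simplicity
gamma, alpha = 0.9, 0.05

theta = np.zeros(3)
s = 0
for t in range(20000):
    s_next = rng.choice(3, p=P[s])
    td_err = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_err * phi[s]   # linear SA update
    s = s_next

# Compare against the exact value function (I - gamma P)^{-1} r.
print("TD estimate:", theta)
print("exact V:    ", np.linalg.solve(np.eye(3) - gamma * P, r))
```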

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

1 code implementation • NeurIPS 2019 • Harsh Gupta, R. Srikant, Lei Ying

We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.

reinforcement-learning • Reinforcement Learning (RL)
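
A hedged sketch of one such two time-scale scheme, TDC (Sutton et al.): the main iterate moves on a slow step size while an auxiliary iterate runs on a faster one. The toy chain, features, and step sizes are assumptions:

```python
import numpy as np

# TDC as a two time-scale linear stochastic approximation.
rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])     # toy on-policy chain (assumed)
r = np.array([0.0, 1.0])
phi = np.array([[1.0, 0.0], [0.5, 1.0]])   # arbitrary toy features
gamma, alpha, beta = 0.9, 0.01, 0.1        # beta >> alpha: two time scales

theta = np.zeros(2)                        # slow iterate
w = np.zeros(2)                            # fast auxiliary iterate
s = 0
for t in range(50000):
    s2 = rng.choice(2, p=P[s])
    f, f2 = phi[s], phi[s2]
    delta = r[s] + gamma * f2 @ theta - f @ theta
    theta += alpha * (delta * f - gamma * f2 * (f @ w))  # slow update
    w += beta * (delta - f @ w) * f                      # fast update
    s = s2
print("theta:", theta)
```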

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

no code implementations • 31 Dec 2019 • Shiyu Liang, Ruoyu Sun, R. Srikant

More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.

Budget-Constrained Bandits over General Cost and Reward Distributions

no code implementations • 29 Feb 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

Continuous-Time Multi-Armed Bandits with Controlled Restarts

no code implementations • 30 Jun 2020 • Semih Cayci, Atilla Eryilmaz, R. Srikant

Time-constrained decision processes are ubiquitous in many fundamental applications in physics, biology, and computer science.

Multi-Armed Bandits

The Global Landscape of Neural Networks: An Overview

no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Robust Multi-Agent Multi-Armed Bandits

no code implementations • 7 Jul 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret.

Distributed Computing • Multi-Armed Bandits • +1

The Mean-Squared Error of Double Q-Learning

1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.

Q-Learning
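
For reference, a minimal tabular Double Q-learning sketch: two estimators, one chosen at random to update while the other evaluates the greedy action. The toy MDP and constants are assumptions, not from the paper:

```python
import numpy as np

# Tabular Double Q-learning on an assumed random 2-state, 2-action MDP.
rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 2, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # random transitions
R = rng.random((nS, nA))                        # random mean rewards

Qa = np.zeros((nS, nA))
Qb = np.zeros((nS, nA))
s = 0
for t in range(100000):
    # epsilon-greedy behavior on the averaged tables
    a = rng.integers(nA) if rng.random() < 0.1 else int((Qa + Qb)[s].argmax())
    s2 = rng.choice(nS, p=P[s, a])
    r = R[s, a] + 0.1 * rng.standard_normal()   # noisy reward
    if rng.random() < 0.5:
        a_star = int(Qa[s2].argmax())           # select with Qa ...
        Qa[s, a] += alpha * (r + gamma * Qb[s2, a_star] - Qa[s, a])  # ... evaluate with Qb
    else:
        a_star = int(Qb[s2].argmax())           # and symmetrically
        Qb[s, a] += alpha * (r + gamma * Qa[s2, a_star] - Qb[s, a])
    s = s2
print((Qa + Qb) / 2)
```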

Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

no code implementations • 14 Sep 2020 • Arghyadip Roy, Sanjay Shakkottai, R. Srikant

I.i.d. rewards are a special case of Markov rewards, and it is difficult to design an algorithm that works well regardless of whether the underlying model is truly Markovian or i.i.d.

On the Consistency of Maximum Likelihood Estimators for Causal Network Identification

no code implementations • 17 Oct 2020 • Xiaotian Xie, Dimitrios Katselis, Carolyn L. Beck, R. Srikant

Incoming edges to a node in the graph indicate that the state of the node at a particular time instant is influenced by the states of the corresponding parental nodes in the previous time instant.

One-bit feedback is sufficient for upper confidence bound policies

no code implementations • 4 Dec 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant

We consider a variant of the traditional multi-armed bandit problem in which each arm is only able to provide one-bit feedback during each pull based on its past history of rewards.
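
For contrast with the one-bit-feedback variant studied here, a minimal sketch of the standard full-feedback UCB1 policy on a Bernoulli bandit (all parameters are illustrative):

```python
import numpy as np

# Standard UCB1 for a K-armed Bernoulli bandit -- the full-feedback
# baseline against which the one-bit-feedback variant is compared.
def ucb1(means, T, seed=0):
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    est = np.zeros(K)
    total = 0.0
    for t in range(T):
        if t < K:
            a = t                                # pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(est + bonus))      # optimism in the face of uncertainty
        r = float(rng.random() < means[a])
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]
        total += r
    return total

print(ucb1([0.2, 0.5, 0.8], T=10000))
```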

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant

We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.

Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
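
A minimal sketch in the spirit of this setting: semi-gradient TD(0) with a small neural network value function on an assumed two-state chain (PyTorch; the width, learning rate, and chain are illustrative assumptions, not the paper's analysis):

```python
import torch

# Semi-gradient TD(0) with a small network value function (toy instance).
torch.manual_seed(0)
P = torch.tensor([[0.7, 0.3], [0.4, 0.6]])   # assumed toy chain
r = torch.tensor([1.0, 0.0])
X = torch.eye(2)                             # one-hot state encodings
gamma = 0.9

V = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1))
opt = torch.optim.SGD(V.parameters(), lr=1e-2)

s = 0
for t in range(20000):
    s2 = int(torch.multinomial(P[s], 1))
    target = r[s] + gamma * V(X[s2]).detach()    # bootstrap target (no grad)
    loss = 0.5 * (V(X[s]) - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    s = s2

# Compare with the exact value function (I - gamma P)^{-1} r.
print(V(X).squeeze(), torch.linalg.solve(torch.eye(2) - gamma * P, r))
```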

Achieving Small Test Error in Mildly Overparameterized Neural Networks

no code implementations • 24 Apr 2021 • Shiyu Liang, Ruoyu Sun, R. Srikant

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.

Binary Classification

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

no code implementations • 4 May 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP).

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant

Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
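
In the tabular softmax case, entropy-regularized NPG is often written in the closed multiplicative form below (a commonly cited form in this literature; the notation is our assumption, and the paper itself analyzes the linear function approximation setting):

```latex
% Tabular softmax form of the entropy-regularized NPG update
% (step size $\eta$, regularization strength $\tau$):
\[
\pi_{k+1}(a \mid s) \;\propto\;
\pi_k(a \mid s)^{\,1 - \eta\tau}\,
\exp\!\big(\eta\, Q_\tau^{\pi_k}(s, a)\big),
\]
% where $Q_\tau^{\pi_k}$ is the entropy-regularized (soft) Q-function
% of $\pi_k$.
```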

Improved Algorithms for Misspecified Linear Markov Decision Processes

no code implementations • 12 Sep 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

(P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance.

Multi-Armed Bandits

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant

Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

no code implementations • 8 Feb 2022 • Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant

We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies.

Finite-Time Analysis of Natural Actor-Critic for POMDPs

no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.

Robust Multi-Agent Bandits Over Undirected Graphs

no code implementations • 28 Feb 2022 • Daniel Vial, Sanjay Shakkottai, R. Srikant

Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

Minimax Regret for Cascading Bandits

no code implementations • 23 Mar 2022 • Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.

Learning-To-Rank
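
A hedged sketch of a CascadeUCB-style policy for this model (in the style of Kveton et al., not this paper's algorithm): recommend the top-k items by UCB index, the user clicks the first attractive item, and only the examined items are updated. Constants and attraction probabilities are assumptions:

```python
import numpy as np

def cascade_ucb(attract, k, T, seed=0):
    """CascadeUCB-style sketch for Bernoulli click feedback."""
    rng = np.random.default_rng(seed)
    L = len(attract)
    counts = np.ones(L)                 # one fake pull to initialize (crude)
    est = rng.random(L)                 # arbitrary init (an assumption)
    clicks = 0
    for t in range(1, T + 1):
        ucb = est + np.sqrt(1.5 * np.log(t) / counts)
        ranked = np.argsort(-ucb)[:k]   # recommend the top-k list
        click_pos = None
        for pos, item in enumerate(ranked):
            if rng.random() < attract[item]:
                click_pos = pos         # first attractive item is clicked
                break
        last = click_pos if click_pos is not None else k - 1
        for pos in range(last + 1):     # feedback: examined items only
            item = ranked[pos]
            r = 1.0 if pos == click_pos else 0.0
            counts[item] += 1
            est[item] += (r - est[item]) / counts[item]
        clicks += click_pos is not None
    return clicks

print(cascade_ucb(np.array([0.9, 0.7, 0.3, 0.2, 0.1]), k=2, T=5000))
```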

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.

Learning While Scheduling in Multi-Server Systems with Unknown Statistics: MaxWeight with Discounted UCB

no code implementations • 2 Sep 2022 • Zixian Yang, R. Srikant, Lei Ying

We prove that under our algorithm the asymptotic average queue length is bounded by one divided by the traffic slackness, which is order-wise optimal.

Scheduling
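
A minimal single-server sketch of the MaxWeight-with-discounted-UCB idea: schedule the queue maximizing queue length times a discounted-UCB estimate of its unknown service rate. The discount factor, bonus constant, and rates below are assumptions, not the paper's exact algorithm:

```python
import numpy as np

# MaxWeight scheduling with discounted-UCB service-rate estimates (sketch).
rng = np.random.default_rng(0)
arrival = np.array([0.3, 0.3])      # Bernoulli arrival rates (assumed)
mu = np.array([0.9, 0.5])           # unknown Bernoulli service rates
gamma = 0.999                       # discount on past observations

q = np.zeros(2)
disc_n = np.ones(2) * 1e-3          # discounted pull counts
disc_s = np.zeros(2)                # discounted success sums
for t in range(1, 50000):
    q += rng.random(2) < arrival                    # new arrivals
    ucb = disc_s / disc_n + np.sqrt(2 * np.log(t) / disc_n)
    i = int(np.argmax(q * ucb))                     # MaxWeight with UCB rates
    served = float(rng.random() < mu[i]) if q[i] > 0 else 0.0
    q[i] = max(q[i] - served, 0.0)
    disc_n *= gamma; disc_s *= gamma                # discount all arms
    disc_n[i] += 1.0; disc_s[i] += served
print("final queue lengths:", q)
```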

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

no code implementations • 13 Oct 2022 • Anna Winnicki, R. Srikant

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent.

reinforcement-learning • Reinforcement Learning (RL)

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

no code implementations • 23 Jan 2023 • Anna Winnicki, R. Srikant

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function.
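
A minimal sketch of that loop on an assumed toy MDP: Monte Carlo rollouts estimate the value of the current policy, which is then replaced by a greedy policy with respect to the estimate. For brevity, the greedy step below uses the known model; a fully simulation-based variant would also estimate Q-values by rollouts:

```python
import numpy as np

# Policy iteration with Monte Carlo policy evaluation (toy sketch).
rng = np.random.default_rng(0)
nS, nA, gamma, H = 3, 2, 0.9, 60
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # assumed random MDP
R = rng.random((nS, nA))

def mc_value(policy, s0, n_rollouts=200):
    """Monte Carlo estimate of V^pi(s0) from truncated rollouts."""
    total = 0.0
    for _ in range(n_rollouts):
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(H):
            a = policy[s]
            ret += disc * R[s, a]
            disc *= gamma
            s = rng.choice(nS, p=P[s, a])
        total += ret
    return total / n_rollouts

policy = np.zeros(nS, dtype=int)
for k in range(5):                       # a few policy-iteration rounds
    V = np.array([mc_value(policy, s) for s in range(nS)])
    Q = R + gamma * P @ V                # greedy step (uses the model here)
    policy = Q.argmax(axis=1)
    print(f"round {k}: V_hat = {np.round(V, 3)}, policy = {policy}")
```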

On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes

no code implementations • 8 Feb 2023 • Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Since the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems, as well as from risk-sensitive value and policy iteration approaches.

Computational Efficiency
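
For orientation, the multiplicative Bellman equation referred to above can be written (in our notation, stated as an assumption about the standard form) as an eigenvalue-type fixed point:

```latex
% Multiplicative Bellman equation for the risk-sensitive exponential cost
% criterion (risk parameter $\beta > 0$; notation is ours):
\[
\rho\, V(s) \;=\; \min_{a}\; e^{\beta c(s, a)}
\sum_{s'} P(s' \mid s, a)\, V(s'),
\]
% with $V > 0$, $\rho > 0$, and $\log(\rho)/\beta$ the optimal
% risk-sensitive average cost.
```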

A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

no code implementations • 14 Feb 2023 • Seo Taek Kong, Saptarshi Mandal, Dimitrios Katselis, R. Srikant

After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values.

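A toy sketch of the two-stage recipe, with the type split given rather than learned and plain majority vote standing in for the Dawid-Skene aggregator (both simplifications are ours):

```python
import numpy as np

# Split tasks by type, then aggregate labels independently per type.
rng = np.random.default_rng(0)
n_workers, n_tasks = 15, 40
truth = rng.integers(0, 2, n_tasks)
is_hard = rng.random(n_tasks) < 0.5
acc = np.where(is_hard, 0.6, 0.9)               # per-task worker accuracy
labels = np.where(rng.random((n_workers, n_tasks)) < acc, truth, 1 - truth)

def aggregate(cols):
    """Stand-in for a Dawid-Skene algorithm: majority vote per task."""
    return (labels[:, cols].mean(axis=0) > 0.5).astype(int)

est = np.empty(n_tasks, dtype=int)
for mask in (is_hard, ~is_hard):                # apply independently per type
    idx = np.where(mask)[0]
    est[idx] = aggregate(idx)
print("accuracy:", (est == truth).mean())
```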

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

no code implementations • 17 Mar 2023 • Anna Winnicki, R. Srikant

We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs.

Model-based Reinforcement Learning • Multi-agent Reinforcement Learning • +2

Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

no code implementations • 30 May 2023 • Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant

The study of collaborative multi-agent bandits has attracted significant attention recently.

Multi-Armed Bandits

Cascading Reinforcement Learning

no code implementations • 17 Jan 2024 • Yihan Du, R. Srikant, Wei Chen

In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.

Recommendation Systems • reinforcement-learning

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

no code implementations • 28 Jan 2024 • R. Srikant

We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains.
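
For reference, Poisson's equation in the form typically used for such arguments (our notation): given the chain's kernel P, stationary distribution pi, and a function f, one solves for g so that centered sums of f(X_k) decompose into martingale differences plus a telescoping remainder:

```latex
% Poisson's equation for a Markov chain with kernel $P$ and stationary
% distribution $\pi$: given $f$, solve for $g$ with
\[
g(x) - (Pg)(x) \;=\; f(x) - \pi(f),
\qquad \pi(f) := \int f \, \mathrm{d}\pi,
\]
% so that $\sum_k \big(f(X_k) - \pi(f)\big)$ splits into a martingale
% plus a telescoping remainder, to which the martingale CLT applies.
```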

Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes

no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant

In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.

Reinforcement Learning (RL)

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).
