no code implementations • 3 Aug 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Furong Huang, Mengdi Wang
To mathematically encapsulate the problem of aligning RL policy optimization with such externalities, we consider a bilevel optimization problem and connect it to a principal-agent framework, where the principal specifies the broader goals and constraints of the system at the upper level and the agent solves a Markov Decision Process (MDP) at the lower level.
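A schematic way to write such a bilevel program (a hedged sketch in illustrative notation, not the paper's exact formulation; $\theta$ denotes the principal's design variable, e.g., a reward parameterization, and $\pi$ the agent's policy):

$$
\max_{\theta}\; U_{\text{principal}}\big(\pi^{*}(\theta)\big)
\quad \text{s.t.} \quad
\pi^{*}(\theta) \in \arg\max_{\pi}\; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t}\, r_{\theta}(s_t, a_t)\Big],
$$

where the upper level encodes the principal's broader goals and constraints and the lower level is the agent's MDP under the reward $r_{\theta}$ specified by the principal.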
no code implementations • 18 Jun 2023 • Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal
To this end, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L).
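As a rough illustration of what a 2-layer critic parametrization can look like, here is a minimal PyTorch sketch (the architecture details such as hidden width and activation are placeholder assumptions, not the configuration used in the paper):

```python
import torch
import torch.nn as nn

class TwoLayerCritic(nn.Module):
    """Q-value critic with one hidden layer, i.e., two weight layers."""
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),  # first layer
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),                        # second layer -> scalar Q-value
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)
```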
1 code implementation • 9 Jun 2023 • Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha
Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations.
no code implementations • 9 Jun 2023 • Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha
Reinforcement learning methods, while effective for learning robotic navigation strategies, are known to be highly sample inefficient.
no code implementations • 10 Apr 2023 • Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
Our work focuses on the challenge of detecting outputs generated by Large Language Models (LLMs) to distinguish them from those generated by humans.
no code implementations • 14 Mar 2023 • Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha
Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures.
no code implementations • 28 Jan 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse.
Model-based Reinforcement Learning
Reinforcement Learning (RL)
no code implementations • 28 Jan 2023 • Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha
Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.
1 code implementation • 25 Oct 2022 • Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang
Furthermore, we provide theoretical results for IID and non-IID settings without the bounded-delay assumption on slow clients that other asynchronous decentralized FL algorithms require.
no code implementations • 7 Sep 2022 • Aakriti Agrawal, Senthil Hariharan, Amrit Singh Bedi, Dinesh Manocha
At the higher level, we solve the task-allocation problem by formulating it in terms of Markov Decision Processes and choosing appropriate rewards to minimize the Total Travel Delay (TTD).
no code implementations • 22 Jun 2022 • Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha
In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program.
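One way to express such a constrained program (a hedged sketch in illustrative notation; $w$ is the global model, $w_i^{\star}$ client $i$'s best local model, $F_i$ its objective, and $\gamma_i$ a tolerance level):

$$
\min_{w}\; \frac{1}{N}\sum_{i=1}^{N} F_i(w)
\quad \text{s.t.} \quad
F_i(w) \le F_i(w_i^{\star}) + \gamma_i, \qquad i = 1, \dots, N,
$$

i.e., the global model is trained for average performance while each client's loss is constrained to stay within $\gamma_i$ of what its best local model can achieve.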
no code implementations • 12 Jun 2022 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha
In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
no code implementations • 12 Jun 2022 • Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) that attains zero constraint violation while achieving state-of-the-art convergence results for the objective value function.
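The conservative idea behind such zero-violation guarantees can be sketched as tightening the constraint during learning (illustrative notation, not the paper's exact formulation): rather than requiring the expected constraint cost $J_c$ to stay below the budget $b$, the algorithm targets the stricter level $b - \kappa$ for a slack $\kappa > 0$,

$$
\max_{\pi}\; J_r(\pi)
\quad \text{s.t.} \quad
J_c(\pi) \le b - \kappa,
$$

so that estimation error does not translate into actual violations, with $\kappa$ shrunk at a rate that keeps the conservative solution near-optimal for the original CMDP.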
no code implementations • 2 Jun 2022 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha
Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting in which the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded with time.
no code implementations • 28 Jan 2022 • Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel
Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space.
no code implementations • 22 Oct 2021 • Zeeshan Akhtar, Amrit Singh Bedi, Srujan Teja Thomdapu, Ketan Rajawat
The proposed $\textbf{S}$tochastic $\textbf{C}$ompositional $\textbf{F}$rank-$\textbf{W}$olfe ($\textbf{SCFW}$) algorithm is shown to achieve a sample complexity of $\mathcal{O}(\epsilon^{-2})$ for convex objectives and $\mathcal{O}(\epsilon^{-3})$ for non-convex objectives, on par with the state-of-the-art sample complexities of projection-free algorithms for single-level problems.
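For reference, the problem class addressed here is stochastic compositional optimization over a convex set, sketched below in illustrative notation:

$$
\min_{x \in \mathcal{X}}\; f\big(\mathbb{E}_{\xi}[\, g(x; \xi) \,]\big),
$$

and the projection-free (Frank-Wolfe) step replaces projection onto $\mathcal{X}$ with a linear minimization oracle, $s_t \in \arg\min_{s \in \mathcal{X}} \langle \widehat{\nabla} F(x_t), s \rangle$, followed by the convex update $x_{t+1} = (1 - \eta_t)\, x_t + \eta_t\, s_t$.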
no code implementations • 22 Oct 2021 • Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal
We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization.
Multi-agent Reinforcement Learning
Reinforcement Learning (RL)
no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
To this end, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve an $\epsilon$-optimal cumulative reward with zero constraint violations.
no code implementations • 26 Jul 2021 • Michael E. Kepler, Alec Koppel, Amrit Singh Bedi, Daniel J. Stilwell
Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data.
no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions with tail-index parameter $\alpha$, which increases the likelihood of jumping in state space.
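A minimal sketch of the contrast between a light-tailed Gaussian policy and a heavier-tailed alternative is given below; the Student-t family is used purely for illustration, and the tail-index parameterization of the paper is not reproduced exactly:

```python
import torch
from torch.distributions import Normal, StudentT

def sample_action(mean, scale, heavy_tailed: bool, df: float = 1.5):
    """Sample a continuous action from a Gaussian or a heavier-tailed policy.

    Smaller `df` (degrees of freedom) gives heavier tails, so large deviations
    from the mean -- and hence larger jumps in state space -- are more likely.
    """
    if heavy_tailed:
        dist = StudentT(df=torch.tensor(df), loc=mean, scale=scale)
    else:
        dist = Normal(loc=mean, scale=scale)
    action = dist.sample()
    return action, dist.log_prob(action)  # log-prob feeds the policy-gradient estimator

# Same mean and scale, different tail behavior.
mean, scale = torch.zeros(2), torch.ones(2)
a_gauss, _ = sample_action(mean, scale, heavy_tailed=False)
a_heavy, _ = sample_action(mean, scale, heavy_tailed=True)
```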
no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".
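In symbols, the "shadow reward" step can be sketched as follows (illustrative notation): with $\hat{\lambda}^{\pi}$ an agent's estimated local occupancy measure and $f$ its local utility, the critic forms

$$
r_{\text{shadow}}(s, a) \;=\; \left.\frac{\partial f(\lambda)}{\partial \lambda(s, a)}\right|_{\lambda = \hat{\lambda}^{\pi}},
$$

which the subsequent actor step can then treat as if it were an ordinary per-step reward.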
no code implementations • 13 Aug 2020 • Zeeshan Akhtar, Amrit Singh Bedi, Ketan Rajawat
In this work, we propose the FW-CSOA algorithm that is not only projection-free but also achieves zero constraint violation with $\mathcal{O}\left(T^{-\frac{1}{4}}\right)$ decay of the optimality gap.
no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang
Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
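Schematically, and in illustrative notation rather than the paper's exact statement: writing $J(\theta) = F(\lambda(\theta))$ for a concave utility $F$ of the occupancy measure $\lambda(\theta)$ induced by the policy $\pi_\theta$, Fenchel duality gives

$$
J(\theta) \;=\; \inf_{z}\; \Big\{ \langle \lambda(\theta),\, z \rangle \;-\; F_{*}(z) \Big\},
$$

where $\langle \lambda(\theta), z \rangle$ is simply the cumulative return of $\pi_\theta$ under the surrogate reward $z$; the policy gradient is then obtained from the saddle point of this problem, which is what lets standard cumulative-reward machinery be reused inside the inner problem.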
no code implementations • L4DC 2020 • Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Alec Koppel
Experimentally, we observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms on various hyperparameter tuning tasks, suggesting the merits of managing the complexity of GPs in bandit settings.
no code implementations • 23 Mar 2020 • Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel
Doing so permits us to precisely characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions depending on the compression parameter $\epsilon$ for both discrete and continuous action sets.
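A minimal sketch of the kind of $\epsilon$-thresholded posterior compression this refers to (the admission rule and names below are illustrative assumptions, not the paper's exact procedure): a candidate point is retained in the GP's dictionary only if its posterior variance given the current dictionary exceeds $\epsilon$, so the dictionary size, and hence the posterior's complexity, is governed by $\epsilon$:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def admit(x_new, dictionary, eps, noise=1e-6):
    """Keep x_new only if its posterior variance given the dictionary exceeds eps."""
    x_new = np.atleast_2d(x_new)
    if not dictionary:
        return True
    D = np.vstack(dictionary)
    K = rbf(D, D) + noise * np.eye(len(D))
    k = rbf(D, x_new)
    var = rbf(x_new, x_new) - k.T @ np.linalg.solve(K, k)
    return var.item() > eps

# Larger eps -> smaller dictionary -> cheaper posterior (at some cost in regret).
dictionary = []
for x in np.random.rand(200, 2):
    if admit(x, dictionary, eps=1e-2):
        dictionary.append(x)
```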
no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
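In the linear-programming view of RL, the dual variable is the state-action occupancy measure $\lambda$; the caution-modified dual objective can be sketched in illustrative notation as

$$
\max_{\lambda \in \Lambda}\; \langle \lambda,\, r \rangle \;-\; c\,\rho(\lambda),
$$

where $\Lambda$ is the set of occupancy measures consistent with the dynamics, $r$ is the reward, $\rho$ is the caution penalty (e.g., a measure of return variability), and $c \ge 0$ trades off reward against risk.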
no code implementations • 25 Sep 2019 • Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler
Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models.
no code implementations • 12 Sep 2019 • Amrit Singh Bedi, Alec Koppel, Ketan Rajawat, Brian M. Sadler
Prior works control dynamic regret growth only for linear models.
no code implementations • 1 Aug 2019 • Hrusikesha Pradhan, Amrit Singh Bedi, Alec Koppel, Ketan Rajawat
We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while only having access to their local data streams.
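In symbols, the setting described is roughly the following networked stochastic program (illustrative notation):

$$
\min_{x}\; \sum_{i=1}^{N} \mathbb{E}_{\xi_i}\big[ f_i(x; \xi_i) \big],
$$

where agent $i$ only observes samples $\xi_i$ from its own local stream and must coordinate with its neighbors to approach the network-wide minimizer.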
no code implementations • 16 May 2019 • Rishabh Dixit, Amrit Singh Bedi, Ketan Rajawat
The empirical performance of the proposed algorithm is tested on the distributed dynamic sparse recovery problem, where it is shown to incur a dynamic regret that is close to that of the centralized algorithm.