no code implementations • 17 Apr 2024 • Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia
We provide a theoretical guarantee that optimizing CyclER yields policies that satisfy the LTL constraint with near-optimal probability.
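As a rough illustration of the setting above, the sketch below runs a hand-coded Büchi-style automaton alongside an environment and rewards visits to its accepting state; the automaton, labels, and rewards are assumptions for illustration, not the paper's CyclER reward machinery.

```python
# Minimal sketch (not CyclER itself): track an LTL-style task with a small
# automaton and reward accepting visits. The automaton encodes "eventually
# reach `goal`; afterwards, never hit `unsafe`" purely for illustration.

def automaton_step(q, labels):
    """Advance the automaton on the atomic propositions that hold now."""
    if q == 0 and "goal" in labels:
        return 1          # progress: goal reached (accepting state)
    if q == 1 and "unsafe" in labels:
        return 2          # violation: absorbing trap state
    return q

def shaped_reward(q_next):
    """Reward the accepting state, punish the trap, else give nothing."""
    return {1: 1.0, 2: -1.0}.get(q_next, 0.0)

# Usage inside a rollout: any RL algorithm can train on (state, q) pairs.
q = 0
for labels in [set(), {"goal"}, set(), {"unsafe"}]:   # a toy label trace
    q = automaton_step(q, labels)
    print(q, shaped_reward(q))
```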
1 code implementation • NeurIPS 2023 • Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas A. Henzinger
We also derive a tighter lower bound than previous work on the probability of reach-avoidance implied by a RASM; this tighter bound is needed to find a compositional policy that meets an acceptable probabilistic threshold on complex tasks with multiple edge policies.
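For context, the sketch below shows the classical (looser) supermartingale bound such certificates yield: if a nonnegative V decreases in expectation outside the target set and V ≥ λ on the unsafe set, then the reach-avoid probability from x0 is at least 1 − V(x0)/λ. The candidate function and values are assumptions; the paper's tighter bound is not reproduced here.

```python
# Classical RASM-style bound (illustrative; not the paper's tighter bound):
# with V >= 0, V decreasing in expectation outside the target, and V >= lam
# on unsafe states, reach-avoidance from x0 holds w.p. >= 1 - V(x0)/lam.

def bound_from_rasm(V, x0, lam):
    """Lower bound on the reach-avoid probability implied by a RASM."""
    return max(0.0, 1.0 - V(x0) / lam)

# Toy 1-D example: candidate V(x) = |x|, unsafe region |x| >= 4, so lam = 4.
print(bound_from_rasm(abs, x0=1.0, lam=4.0))   # 0.75
```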
no code implementations • 3 Mar 2023 • Cameron Voloshin, Abhinav Verma, Yisong Yue
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions.
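To make this concrete, consider the spec "eventually reach the goal and never enter an unsafe region", written F goal ∧ G ¬unsafe in LTL. The sketch below evaluates that formula on a finite labeled trace (finite-trace semantics is an assumption here; LTL proper is defined over infinite traces).

```python
# Illustrative only: checking "F goal AND G not unsafe" on a finite trace,
# where each step is the set of atomic propositions that hold there.

def satisfies(trace):
    eventually_goal = any("goal" in labels for labels in trace)
    always_safe = all("unsafe" not in labels for labels in trace)
    return eventually_goal and always_safe

print(satisfies([set(), {"goal"}]))        # True
print(satisfies([{"unsafe"}, {"goal"}]))   # False
```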
1 code implementation • NeurIPS 2020 • Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces.
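A generic shield conveys the flavor of safe exploration, though it is not Revel's verified machinery: use the learned action only when a conservative check certifies it, and otherwise fall back to a safe controller. All function names below are placeholders.

```python
# Minimal shielding sketch for safe exploration (a generic shield, not
# Revel's partially neural, verified framework).

def shielded_action(state, neural_action, safe_action, stays_safe):
    """Use the learned action only when the safety check certifies it."""
    a = neural_action(state)
    return a if stays_safe(state, a) else safe_action(state)

# Toy 1-D usage: stay inside [-1, 1] under the dynamics x' = x + a.
act = shielded_action(
    0.9,
    neural_action=lambda s: 0.5,            # aggressive learned proposal
    safe_action=lambda s: -0.1,             # verified fallback controller
    stays_safe=lambda s, a: abs(s + a) <= 1.0,
)
print(act)   # -0.1: the shield overrides the unsafe proposal
```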
1 code implementation • NeurIPS 2020 • Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri
This relaxed program is differentiable and can be trained end-to-end, and the resulting training loss is an approximately admissible heuristic that can guide the combinatorial search.
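The search side of this idea can be sketched as ordinary best-first search, where a partial program's priority stands in for the training loss of its neural relaxation (a simplification of the paper's setup; the toy "programs" below are just strings).

```python
# Heuristic-guided search over partial programs (simplified sketch; in the
# paper the heuristic is the training loss of a differentiable relaxation).

import heapq

def best_first_search(start, expand, heuristic, is_complete):
    """Expand partial programs in order of their heuristic score; the first
    complete program popped is optimal if the heuristic never overestimates
    the cost of a partial program's best completion (admissibility)."""
    frontier = [(heuristic(start), start)]
    while frontier:
        _, prog = heapq.heappop(frontier)
        if is_complete(prog):
            return prog
        for child in expand(prog):
            heapq.heappush(frontier, (heuristic(child), child))
    return None

# Toy usage: find a length-3 string over {a, b} with the fewest b's. The
# heuristic counts b's so far, which never overestimates: admissible.
best = best_first_search(
    start="",
    expand=lambda p: [p + c for c in "ab"],
    heuristic=lambda p: p.count("b"),
    is_complete=lambda p: len(p) == 3,
)
print(best)   # aaa
```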
no code implementations • NeurIPS 2019 • Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri
First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation. We solve this constrained optimization problem with a form of mirror descent: take a gradient step in the unconstrained policy space, then project back onto the constrained space.
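A schematic of that loop follows; it is a sketch, not the paper's exact procedure, and `lift`, `gradient_step`, and `project` are placeholders for neural-policy initialization, unconstrained RL improvement, and imitation-based projection.

```python
# Mirror-descent-flavored loop: improve in the unconstrained (neural)
# policy space, then project back onto programmatic policies by imitation.

def programmatic_mirror_descent(program, lift, gradient_step, project,
                                iters=10):
    for _ in range(iters):
        neural = lift(program)           # embed the program in neural space
        neural = gradient_step(neural)   # unconstrained improvement step
        program = project(neural)        # fit a program to the new policy
    return program

# Toy usage: "programs" are multiples of 0.5; the neural space is all reals.
print(programmatic_mirror_descent(
    program=0.0,
    lift=lambda p: p,
    gradient_step=lambda w: w + 0.6 * (2.0 - w),   # pull toward optimum 2.0
    project=lambda w: round(w * 2) / 2,            # snap to the program grid
))   # 2.0
```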
1 code implementation • 14 May 2019 • Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick
We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off.
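The core blend behind functional regularization can be sketched as a convex combination of the learned action and a prior controller's action; the mixing form below is the standard one, and the weights shown are illustrative rather than the paper's adaptive tuning strategy.

```python
# Functional regularization toward a prior controller (illustrative; the
# adaptive choice of lam is the paper's contribution and is not shown).

def blended_action(a_learned, a_prior, lam):
    """Convex blend: lam -> 0 trusts the learner, lam -> inf the prior.
    Heavier weight on the prior lowers variance but biases the policy."""
    return (a_learned + lam * a_prior) / (1.0 + lam)

print(blended_action(1.0, 0.0, lam=4.0))    # 0.2: mostly the prior
print(blended_action(1.0, 0.0, lam=0.25))   # 0.8: mostly the learner
```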
no code implementations • ICLR 2019 • Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Swarat Chaudhuri, Ankit B. Patel
We study the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language.
no code implementations • 27 Feb 2019 • Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel
We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language.
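For a concrete sense of the ground truth such representations are compared against, here is the minimal DFA for the regular language "binary strings with an even number of 1s"; probing RNN hidden states against these DFA states is beyond this sketch.

```python
# Minimal DFA for "even number of 1s": two states, '1' toggles, '0' loops.
DELTA = {("even", "1"): "odd", ("odd", "1"): "even"}

def dfa_states(string):
    """Return the DFA state after each input symbol."""
    q, trace = "even", []
    for ch in string:
        q = DELTA.get((q, ch), q)   # unlisted symbols are self-loops
        trace.append(q)
    return trace

print(dfa_states("1011"))   # ['odd', 'odd', 'even', 'odd']
```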
no code implementations • ICML 2018 • Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri
Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language.
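For a flavor of what a programmatic policy looks like, here is a toy interpretable PID-style controller with explicit, inspectable parameters; the structure and constants are illustrative, not PIRL's DSL.

```python
# Toy programmatic policy: a PID-style controller whose behavior is fully
# determined by three readable parameters (illustrative, not PIRL's DSL).

def make_pid_policy(kp, ki, kd):
    state = {"integral": 0.0, "prev": 0.0}
    def policy(error):
        state["integral"] += error
        derivative = error - state["prev"]
        state["prev"] = error
        return kp * error + ki * state["integral"] + kd * derivative
    return policy

steer = make_pid_policy(kp=0.9, ki=0.01, kd=0.2)
print(steer(0.5), steer(0.3))   # actions for successive tracking errors
```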