no code implementations • 17 Apr 2024 • Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia
In our work, we consider the setting where the task is specified by an LTL objective and there is an additional scalar reward that we need to optimize.
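The combination described here — an LTL objective tracked alongside a scalar reward — can be sketched with a product-style construction (a toy illustration with a hand-coded DFA for "eventually reach goal", not the paper's algorithm; the class name and bonus scheme are hypothetical):

```python
# Toy sketch: track an LTL objective via its automaton while also
# accumulating a scalar environment reward. The DFA below hand-encodes
# "F goal" (eventually goal) purely for illustration.

class ProductTracker:
    """Pairs environment observations with a DFA state so a policy can
    optimize scalar reward while progressing toward the LTL objective."""

    def __init__(self, dfa_transitions, accepting, init_state=0):
        self.delta = dfa_transitions      # {(q, label): q'}
        self.accepting = accepting        # set of accepting DFA states
        self.q = init_state

    def step(self, label, env_reward, bonus=1.0):
        prev_q = self.q
        self.q = self.delta.get((self.q, label), self.q)
        # Shaped reward: environment reward plus a one-time bonus when the
        # automaton first enters an accepting state.
        newly_accepted = self.q in self.accepting and prev_q not in self.accepting
        return self.q, env_reward + (bonus if newly_accepted else 0.0)

# DFA for "F goal": state 0 = not yet satisfied, state 1 = satisfied (absorbing).
delta = {(0, "goal"): 1, (0, "other"): 0, (1, "goal"): 1, (1, "other"): 1}
tracker = ProductTracker(delta, accepting={1})
```

In practice the automaton would be compiled from the LTL formula by a translator rather than written by hand.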
1 code implementation • NeurIPS 2023 • Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas A. Henzinger
Compared to previous work, we also derive a tighter lower bound on the probability of reach-avoidance implied by a RASM; such a bound is required to find a compositional policy that meets an acceptable probabilistic threshold on complex tasks with multiple edge policies.
no code implementations • 3 Mar 2023 • Cameron Voloshin, Abhinav Verma, Yisong Yue
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions.
1 code implementation • NeurIPS 2020 • Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces.
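The safe-exploration idea can be illustrated with a simple shielding scheme (an illustration of the general concept, not Revel's actual algorithm; the function names and the toy 1-D system are hypothetical):

```python
# Simplified shielding sketch: a learned policy proposes an action, and a
# verified fallback controller overrides it whenever the resulting next
# state would leave a known safe region.

def shielded_action(state, learned_policy, safe_fallback, dynamics, is_safe):
    proposed = learned_policy(state)
    if is_safe(dynamics(state, proposed)):
        return proposed            # learned action is safe to execute
    return safe_fallback(state)    # otherwise defer to the verified fallback

# Toy 1-D system: stay within |x| <= 1.
dynamics = lambda x, a: x + a
is_safe = lambda x: abs(x) <= 1.0
learned_policy = lambda x: 0.9         # aggressive learned action
safe_fallback = lambda x: -0.1 * x     # conservative contraction toward 0
```

With this split, exploration happens in the learned policy while safety rests entirely on the verified fallback and the safety check.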
1 code implementation • NeurIPS 2020 • Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri
This relaxed program is differentiable and can be trained end-to-end, and the resulting training loss is an approximately admissible heuristic that can guide the combinatorial search.
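The role of a relaxation as an admissible heuristic can be sketched in miniature (a toy stand-in, not the paper's system: the "program" is a ±1 sign pattern, and the relaxation lets unassigned slots take any value in [-1, 1]):

```python
# Best-first search over discrete choices where the heuristic for a partial
# assignment is the cost of its best *relaxed* completion. Because the
# relaxation can only do at least as well as any discrete completion, the
# heuristic never overestimates, so the search is admissible (A*-style).
import heapq

def search(targets):
    """Find the +/-1 assignment minimizing squared error to `targets`."""

    def lower_bound(prefix):
        fixed = sum((s - t) ** 2 for s, t in zip(prefix, targets))
        # Relaxed completion: remaining slots may take any value in [-1, 1],
        # so their best-case cost is 0 unless the target lies outside that box.
        rest = sum(max(0.0, abs(t) - 1.0) ** 2 for t in targets[len(prefix):])
        return fixed + rest

    frontier = [(lower_bound(()), ())]
    while frontier:
        bound, prefix = heapq.heappop(frontier)
        if len(prefix) == len(targets):
            return list(prefix), bound   # first complete pop is optimal
        for s in (-1, 1):
            cand = prefix + (s,)
            heapq.heappush(frontier, (lower_bound(cand), cand))
```

A tighter relaxation (here, a smaller box) prunes more of the search, which mirrors why a well-trained differentiable relaxation makes a useful guide.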
no code implementations • NeurIPS 2019 • Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri
First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation. We solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space.
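The structure of that iteration — an unconstrained gradient step followed by a projection — can be made concrete on a toy problem (illustrative only: the paper projects onto programmatic policies via imitation, whereas here the "programmatic" set is just a linear subspace with equal components):

```python
# Update-then-project sketch: gradient step in the full parameter space,
# then projection back onto a constrained set.
import numpy as np

def update_then_project(theta, grad, lr, project):
    unconstrained = theta - lr * grad(theta)   # step in the full policy space
    return project(unconstrained)              # back onto the constrained set

# Constrained set: parameter pairs with equal components, {(a, a)}.
def project_equal(theta):
    return np.full(2, theta.mean())

# Objective ||theta - (3, 1)||^2; its constrained minimizer is (2, 2).
grad = lambda th: 2 * (th - np.array([3.0, 1.0]))

theta = np.zeros(2)
for _ in range(100):
    theta = update_then_project(theta, grad, lr=0.1, project=project_equal)
```

The iterate stays on the constrained set after every step while still using gradients computed in the unconstrained space.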
1 code implementation • 14 May 2019 • Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick
We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off.
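The bias-variance trade-off from functional regularization can be seen in a minimal supervised analogue (a hypothetical sketch, not the paper's algorithm: a ridge-style fit penalized toward a hand-designed prior, with `lam` playing the role of the regularization weight):

```python
# Functional-regularization analogue: penalize deviation of the learned
# weights from a prior controller's weights. Large lam biases the solution
# toward the prior (lower variance, higher bias); lam = 0 recovers the
# unregularized least-squares fit.
import numpy as np

def fit_toward_prior(X, y, prior_w, lam):
    """Closed-form solution of min_w ||Xw - y||^2 + lam * ||w - prior_w||^2."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * prior_w
    return np.linalg.solve(A, b)
```

An adaptive strategy in this picture would adjust `lam` based on how much the data is trusted relative to the prior.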
no code implementations • ICLR 2019 • Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Swarat Chaudhuri, Ankit B. Patel
We study the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language.
no code implementations • 27 Feb 2019 • Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel
We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language.
no code implementations • ICML 2018 • Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri
Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language.