1 code implementation • 20 Nov 2024 • Atharva Gundawar, Karthik Valmeekam, Mudit Verma, Subbarao Kambhampati
In this framework, an LLM is paired with a complete set of sound verifiers that validate its output, re-prompting it if it fails.
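The generate-and-verify loop described in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_with_verifiers`, the `(ok, critique)` verifier signature, the way critiques are appended to the prompt, and the `max_rounds` cutoff are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a verifier returns (ok, critique) for a candidate output.
Verifier = Callable[[str], Tuple[bool, str]]


def generate_with_verifiers(
    generate: Callable[[str], str],
    verifiers: List[Verifier],
    prompt: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Query the generator until every verifier accepts, or give up."""
    current_prompt = prompt
    for _ in range(max_rounds):
        candidate = generate(current_prompt)
        # Collect critiques from every verifier that rejects the candidate.
        critiques = [msg for ok, msg in (v(candidate) for v in verifiers) if not ok]
        if not critiques:
            return candidate  # all verifiers accepted
        # Re-prompt with the failed attempt and the critiques (assumed format).
        current_prompt = (
            prompt
            + "\nPrevious attempt: " + candidate
            + "\nProblems: " + "; ".join(critiques)
        )
    return None  # no accepted output within the round budget
```

In this sketch, `generate` stands in for any LLM call. Any output the loop returns has passed every verifier, so the guarantee comes from the verifiers being sound rather than from the generator.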
no code implementations • 31 May 2024 • Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati
As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies.
no code implementations • 22 May 2024 • Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
In this paper, we examine these claims that ReAct-based prompting improves agentic LLMs for sequential decision-making.
no code implementations • 12 Apr 2024 • Mudit Verma, Katherine Metcalf
Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks.
no code implementations • 2 Feb 2024 • Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is acting as mere translators of the problem specification from one syntactic format to another, shipping the problem off to external symbolic solvers.
no code implementations • 10 Jan 2024 • Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
In this work, we explore the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to a human observer.
no code implementations • 21 Dec 2023 • Siddhant Bhambri, Mudit Verma, Upasana Biswas, Anil Murthy, Subbarao Kambhampati
To this end, we perform the first investigation of multi-agent PbRL by extending single-agent PbRL to the two-agent teaming setting and formulating it as a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences on the joint team behavior.
no code implementations • 28 Feb 2023 • Tung Thai, Ming Shen, Mayank Garg, Ayush Kalani, Nakul Vaidya, Utkarsh Soni, Mudit Verma, Sriram Gopalakrishnan, Neeraj Varshney, Chitta Baral, Subbarao Kambhampati, Jivko Sinapov, Matthias Scheutz
Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance.
no code implementations • 17 Feb 2023 • Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
Preference Based Reinforcement Learning has shown much promise for utilizing human binary feedback on queried trajectory pairs to recover the underlying reward model of the Human in the Loop (HiL).
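As a rough illustration of how binary feedback on a trajectory pair can drive reward learning, the sketch below uses the Bradley-Terry preference model that is common in the PbRL literature. The excerpt does not say which model this paper uses, so the model choice and the names `preference_probability` and `preference_loss` are assumptions.

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Bradley-Terry probability that trajectory A is preferred over B.

    Each argument holds the per-step rewards predicted by the current
    reward model for one trajectory (assumed inputs).
    """
    return_a, return_b = sum(rewards_a), sum(rewards_b)
    # Equivalent to exp(return_a) / (exp(return_a) + exp(return_b)).
    return 1.0 / (1.0 + math.exp(return_b - return_a))


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float], label: int) -> float:
    """Cross-entropy between the human label (1 if A preferred, else 0) and the model."""
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many queried pairs pushes the reward model to assign a higher return to the trajectories the human preferred.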
no code implementations • 17 Feb 2023 • Mudit Verma, Subbarao Kambhampati
Reinforcement Learning has suffered from poor reward specification and from reward hacking, even in simple enough domains.
no code implementations • 17 Feb 2023 • Mudit Verma, Subbarao Kambhampati
We propose a data-driven reward initialization method that adds no cost to the human in the loop and negligible cost to the PbRL agent. We show that it ensures the predicted rewards of the initialized reward model are uniform in the state space, which reduces the variability in performance across multiple runs and improves overall performance compared to other initialization methods.
no code implementations • 27 Oct 2022 • Utkarsh Soni, Nupur Thakur, Sarath Sreedharan, Lin Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati
If the relevant concept is not in the shared vocabulary, then it is learned.
no code implementations • 17 Oct 2022 • Mudit Verma, Katherine Metcalf
Specifying rewards for reinforcement learning (RL) agents is challenging.
no code implementations • 7 Oct 2022 • Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati
Through our experiments, we show that our method can provide an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice.
no code implementations • 21 Sep 2021 • Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan
The jury is still out on whether AI systems will need to use symbols in their internal reasoning to achieve general intelligence capabilities.
1 code implementation • 15 Sep 2021 • Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati
We present a framework that models the human agent's behavior with respect to state uncertainty and can be used to compute MDP policies that account for these problems.
no code implementations • 3 May 2021 • Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati
The problem of trust management is particularly challenging in mixed human-robot teams where the human and the robot may have different models about the task at hand and thus may have different expectations regarding the current course of action, thereby forcing the robot to focus on the costly explicable behavior.
no code implementations • 22 Feb 2021 • Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan
Neural networks have long been used to solve complex problems in the image domain, yet designing them requires manual expertise.
no code implementations • 12 Jul 2020 • Mudit Verma, Arun Balaji Buduru
Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis.
1 code implementation • NeurIPS 2021 • Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
We focus on the task of learning from feedback, in which the human trainer not only gives binary evaluative "good" or "bad" feedback for queried state-action pairs, but also provides a visual explanation by annotating relevant features in images.
no code implementations • ICLR 2022 • Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati
As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions.
no code implementations • 6 Dec 2019 • Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru
Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart environment solutions for specific user requirements.