no code implementations • 3 Mar 2025 • Gokul Swamy, Sanjiban Choudhury, Wen Sun, Zhiwei Steven Wu, J. Andrew Bagnell
From a first-principles perspective, it may seem odd that the strongest results in foundation model fine-tuning (FT) are achieved via a relatively complex, two-stage training procedure.
1 code implementation • 27 Feb 2025 • Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush, Wenting Zhao, Sanjiban Choudhury
We provide an analysis of the design choices of the reward models and policy, and show the efficacy of $\mu$Code at utilizing execution feedback.
1 code implementation • 14 Feb 2025 • Sanjiban Choudhury
We introduce Agent Process Reward Models (AgentPRM), a simple and scalable framework for training LLM agents to continually improve through interactions.
1 code implementation • 8 Feb 2025 • William Huey, Huaxiaoyue Wang, Anne Wu, Yoav Artzi, Sanjiban Choudhury
Existing approaches treat imitation as a distribution-matching problem, aligning individual frames between the agent and the demonstration.
1 code implementation • 6 Feb 2025 • Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury
Effective asynchronous planning, or the ability to efficiently reason and plan over states and actions that must happen in parallel or sequentially, is essential for agents that must account for time delays, reason over diverse long-horizon tasks, and collaborate with other agents.
no code implementations • 13 Jan 2025 • Juntao Ren, Priya Sundaresan, Dorsa Sadigh, Sanjiban Choudhury, Jeannette Bohg
We instantiate an IL policy called Motion Track Policy (MT-pi) which receives image observations and outputs motion tracks as actions.
1 code implementation • 1 Jan 2025 • David Wu, Sanjiban Choudhury
Aligning large language models (LLMs) to human preferences is challenging in domains where preference data is unavailable.
1 code implementation • 9 Dec 2024 • Gonzalo Gonzalez-Pumariega, Wayne Chen, Kushal Kedia, Sanjiban Choudhury
The first uses LLMs as a heuristic within a search-based planner to select promising nodes to expand and propose promising actions.
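As a rough illustration of that first approach, the sketch below uses an LLM score as the heuristic in a best-first search; the `llm` callable, prompt format, and `expand` interface are assumptions for illustration, not the paper's implementation.

```python
import heapq
import itertools

def llm_score(llm, state, goal):
    # Hypothetical helper: ask the LLM how promising `state` looks for reaching
    # `goal` (higher = more promising). The prompt format is an assumption.
    prompt = (
        "Rate from 0 to 10 how close this state is to the goal.\n"
        f"State: {state}\nGoal: {goal}\nScore:"
    )
    return float(llm(prompt))

def llm_guided_search(start, goal, expand, llm, max_expansions=100):
    # Best-first search where the LLM score decides which frontier node to expand next.
    counter = itertools.count()  # tie-breaker so the heap never compares states directly
    frontier = [(-llm_score(llm, start, goal), next(counter), start, [])]
    while frontier and max_expansions > 0:
        _, _, state, plan = heapq.heappop(frontier)
        max_expansions -= 1
        if state == goal:
            return plan
        for action, next_state in expand(state):
            priority = -llm_score(llm, next_state, goal)
            heapq.heappush(frontier, (priority, next(counter), next_state, plan + [action]))
    return None  # no plan found within the expansion budget
```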
1 code implementation • 11 Nov 2024 • Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
no code implementations • 7 Oct 2024 • Sanjiban Choudhury, Paloma Sodhi
Our key insight is to equip the expert teachers with a privileged state -- information that is available during training but hidden at test time.
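A minimal sketch of that insight, assuming numpy-array actions and hypothetical `teacher`/`student` callables: the teacher conditions on the privileged state during training, and the student is trained to reproduce its actions from test-time observations alone.

```python
import numpy as np

def privileged_distillation_loss(student, teacher, batch):
    # `batch` holds (observation, privileged_state) pairs collected during training.
    losses = []
    for obs, privileged_state in batch:
        target_action = teacher(privileged_state)   # teacher sees privileged information
        predicted_action = student(obs)             # student sees only the test-time observation
        losses.append(np.mean((predicted_action - target_action) ** 2))
    return float(np.mean(losses))
```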
no code implementations • 10 Sep 2024 • Kushal Kedia, Prithwish Dan, Angela Chao, Maximus Adrian Pace, Sanjiban Choudhury
Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks.
no code implementations • 25 Aug 2024 • David Durst, Feng Xie, Vishnu Sarukkai, Brennan Shacklett, Iuri Frosio, Chen Tessler, Joohwan Kim, Carly Taylor, Gilbert Bernstein, Sanjiban Choudhury, Pat Hanrahan, Kayvon Fatahalian
We curate a team movement dataset comprising 123 hours of professional game play traces, and use this dataset to train a transformer-based movement model that generates human-like team movement for all players in a "Retakes" round of the game.
1 code implementation • 13 Feb 2024 • Juntao Ren, Gokul Swamy, Zhiwei Steven Wu, J. Andrew Bagnell, Sanjiban Choudhury
In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
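A minimal sketch of the data side of that idea (the 50/50 split and buffer format are illustrative assumptions, not values from the paper): each off-policy update is computed on a minibatch drawn from both the expert dataset and the agent's own replay buffer.

```python
import random

def sample_hybrid_batch(expert_buffer, online_buffer, batch_size, expert_frac=0.5):
    # Mix expert transitions with online transitions so the critic/policy update
    # never has to rediscover states the expert already covers.
    n_expert = min(int(batch_size * expert_frac), len(expert_buffer))
    batch = random.sample(expert_buffer, n_expert)
    n_online = min(batch_size - n_expert, len(online_buffer))
    batch += random.sample(online_buffer, n_online)
    random.shuffle(batch)
    return batch
```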
no code implementations • 4 Feb 2024 • David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury
Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations.
no code implementations • 4 Feb 2024 • David Wu, Sanjiban Choudhury
Existing inverse reinforcement learning methods (e.g., MaxEntIRL, $f$-IRL) search over candidate reward functions and solve a reinforcement learning problem in the inner loop.
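The nested structure those methods share looks roughly like the sketch below; `solve_rl` and `update_reward` are placeholders for whichever inner-loop solver and reward update a particular method (MaxEntIRL, $f$-IRL, ...) uses.

```python
def irl_nested_loop(env, expert_trajs, init_reward, solve_rl, update_reward, outer_iters=50):
    reward = init_reward
    policy = None
    for _ in range(outer_iters):
        # Inner loop: fully solve an RL problem under the current reward (the expensive part).
        policy = solve_rl(env, reward)
        learner_trajs = [policy.rollout(env) for _ in range(len(expert_trajs))]
        # Outer loop: nudge the reward so expert trajectories score higher than learner ones.
        reward = update_reward(reward, expert_trajs, learner_trajs)
    return reward, policy
```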
no code implementations • 21 Nov 2023 • Kushal Kedia, Atiksh Bhardwaj, Prithwish Dan, Sanjiban Choudhury
In collaborative human-robot manipulation, a robot must predict human intents and adapt its actions accordingly to smoothly execute tasks.
no code implementations • 14 Nov 2023 • Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr
To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning.
no code implementations • 20 Oct 2023 • Kushal Kedia, Prithwish Dan, Atiksh Bhardwaj, Sanjiban Choudhury
Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion.
1 code implementation • NeurIPS 2023 • Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu
Regardless of the particular task we want them to perform in an environment, there are often shared safety constraints we want our agents to respect.
1 code implementation • 11 Aug 2023 • Kushal Kedia, Prithwish Dan, Sanjiban Choudhury
On the other hand, planning for worst-case motions leads to overly conservative behavior and a "frozen robot".
1 code implementation • 26 Mar 2023 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory.
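A rough sketch of how an expert state distribution can stand in for global exploration (the `reset_to` interface is an assumption about the simulator, and this illustrates the general reset-to-expert-states idea rather than the paper's exact algorithm): episodes in the RL subroutine start from states the expert actually visited.

```python
import random

def rollout_from_expert_reset(env, policy, expert_states, horizon):
    # Start the episode from a state sampled from the expert's visitation
    # distribution instead of the environment's default initial state.
    obs = env.reset_to(random.choice(expert_states))
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward))
        if done:
            break
    return trajectory
```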
1 code implementation • 1 Mar 2023 • Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.
no code implementations • 19 Aug 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR).
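In our notation, a conditional moment restriction asks that a residual function have zero conditional mean given a conditioning variable $Z$:

$$\mathbb{E}\big[\,\rho(X;\theta)\,\big|\,Z\,\big] = 0 \quad \text{almost surely.}$$

Instrumental variable regression is the special case $\rho(X;\theta) = Y - g_\theta(T)$ with instrument $Z$, and Bellman residual minimization takes $\rho$ to be the temporal-difference error conditioned on the current state-action pair.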
1 code implementation • 3 Aug 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We consider imitation learning problems where the learner's ability to mimic the expert increases throughout the course of an episode as more information is revealed.
1 code implementation • 30 May 2022 • Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O} \left( \min \left( H^{3/2} / N, \; H / \sqrt{N} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work.
1 code implementation • 2 Feb 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions.
no code implementations • 10 Oct 2021 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa
If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly.
no code implementations • 5 Oct 2021 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of outperforming standard behavioral cloning.
no code implementations • 29 Sep 2021 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
Both approaches are able to find policies that match the result of a query to an unconfounded expert.
no code implementations • 2 Apr 2021 • Matthew Schmittle, Sanjiban Choudhury, Siddhartha S. Srinivasa
A key challenge in Imitation Learning (IL) is that optimal state-action demonstrations are difficult for the teacher to provide.
3 code implementations • 4 Mar 2021 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching.
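One representative instance of that lens, written in our own notation: imitation is cast as a game in which the learner minimizes the worst-case gap in expected moments between expert and learner,

$$\min_{\pi}\;\max_{f\in\mathcal{F}}\;\; \mathbb{E}_{\tau\sim\pi_E}\Big[\textstyle\sum_{t} f(s_t,a_t)\Big] \;-\; \mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_{t} f(s_t,a_t)\Big],$$

where, roughly, the choice of moment class $\mathcal{F}$ and whether the expectations are estimated on- or off-policy is what separates the algorithms in the family.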
no code implementations • 4 Feb 2021 • Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, J. Andrew Bagnell
The learner often comes to rely on features that are strongly predictive of decisions, but are subject to strong covariate shift.
no code implementations • ICLR 2021 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots
We further propose an algorithm that changes $\lambda$ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy of our approach on challenging high-dimensional manipulation tasks with biased models in simulation.
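One way to write such a blend, in our own notation rather than the paper's exact estimator, is a TD($\lambda$)-style geometric weighting of $k$-step model rollouts bootstrapped by the learned value $\hat{V}_\theta$:

$$\hat{Q}^{\lambda}(s,a) \;=\; (1-\lambda)\sum_{k=1}^{H-1}\lambda^{k-1}\,\hat{Q}_{k}(s,a)\;+\;\lambda^{H-1}\,\hat{Q}_{H}(s,a), \qquad \hat{Q}_{k}(s,a)=\sum_{t=0}^{k-1}\gamma^{t}\hat{r}_{t}+\gamma^{k}\hat{V}_\theta(s_{k}).$$

Here $\lambda \to 1$ leans entirely on the $H$-step MPC rollout while $\lambda \to 0$ trusts the learned value after a single model step, so annealing $\lambda$ downward shifts reliance from MPC to the value function as its estimates improve.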
no code implementations • 7 Feb 2020 • Gilwoo Lee, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa
We first obtain an ensemble of experts, one for each latent MDP, and fuse their advice to compute a baseline policy.
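A minimal sketch of that fusion step, assuming each expert exposes a hypothetical `q_values(state)` method and the agent maintains a belief vector over latent MDPs:

```python
import numpy as np

def fused_baseline_q(belief, experts, state):
    # Weight each latent-MDP expert's action-value estimates by the current
    # belief over which latent MDP the agent is actually in.
    q = np.zeros_like(np.asarray(experts[0].q_values(state), dtype=float))
    for b_i, expert in zip(belief, experts):
        q += b_i * np.asarray(expert.q_values(state), dtype=float)
    return q

# The baseline policy can then act greedily with respect to the fused estimate:
# action = int(np.argmax(fused_baseline_q(belief, experts, state)))
```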
no code implementations • 15 Oct 2019 • Rogerio Bonatti, Wenshan Wang, Cherie Ho, Aayush Ahuja, Mirko Gschwindt, Efe Camci, Erdal Kayacan, Sanjiban Choudhury, Sebastian Scherer
In this work, we address the problem in its entirety and propose a complete system for real-time aerial cinematography that for the first time combines: (1) vision-based target estimation; (2) 3D signed-distance mapping for occlusion estimation; (3) efficient trajectory optimization for long time-horizon camera motion; and (4) learning-based artistic shot selection.
no code implementations • 16 Jul 2019 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa
If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly.
no code implementations • 30 May 2019 • Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
We show that state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes.
no code implementations • 4 Apr 2019 • Rogerio Bonatti, Cherie Ho, Wenshan Wang, Sanjiban Choudhury, Sebastian Scherer
In this work, we overcome such limitations and propose a complete system for aerial cinematography that combines: (1) a vision-based algorithm for target localization; (2) a real-time incremental 3D signed-distance map algorithm for occlusion and safety computation; and (3) a real-time camera motion planner that optimizes smoothness, collisions, occlusions and artistic guidelines.
no code implementations • 6 Oct 2018 • Gilwoo Lee, Sanjiban Choudhury, Brian Hou, Siddhartha S. Srinivasa
To the best of our knowledge, we present the first PAC-optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces.
no code implementations • ICLR 2019 • Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa
Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world.
no code implementations • 28 Aug 2018 • Rogerio Bonatti, Yanfu Zhang, Sanjiban Choudhury, Wenshan Wang, Sebastian Scherer
Autonomous aerial cinematography has the potential to enable automatic capture of aesthetically pleasing videos without requiring human intervention, empowering individuals with the capability of high-end film studios.
no code implementations • 7 Apr 2018 • Sankalp Arora, Sanjiban Choudhury, Sebastian Scherer
The contribution of this paper helps identify the properties of a POMDP problem for which the use of MDP-based POMDP solvers is inappropriate, enabling better design choices.
no code implementations • 20 Nov 2017 • Sanjiban Choudhury, Siddhartha Srinivasa, Sebastian Scherer
We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort.
1 code implementation • 10 Nov 2017 • Shushman Choudhury, Oren Salzman, Sanjiban Choudhury, Christopher M. Dellin, Siddhartha S. Srinivasa
We propose an algorithmic framework for efficient anytime motion planning on large dense geometric roadmaps, in domains where collision checks and therefore edge evaluations are computationally expensive.
1 code implementation • 10 Jul 2017 • Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer
In this paper, we do so by training a heuristic policy that maps partial information from the search to a decision about which node of the search tree to expand.
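A minimal sketch of that selection rule, with `featurize` and `heuristic_policy` standing in for the paper's learned components:

```python
def select_node_to_expand(open_list, featurize, heuristic_policy):
    # Featurize the partial information around each frontier node (e.g. depth,
    # distance-to-goal estimates, local obstacle statistics) and let the trained
    # policy score which node the search should expand next.
    scored = [(heuristic_policy(featurize(node)), node) for node in open_list]
    return max(scored, key=lambda pair: pair[0])[1]
```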
1 code implementation • NeurIPS 2017 • Sanjiban Choudhury, Shervin Javdani, Siddhartha Srinivasa, Sebastian Scherer
By leveraging this property, we are able to significantly reduce computational complexity from exponential to linear in the number of edges.
no code implementations • 13 Nov 2016 • Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Debadeepta Dey
The budgeted information gathering problem -- where a robot with a fixed fuel budget is required to maximize the amount of information gathered from the world -- appears in practice across a wide range of applications in autonomous exploration and inspection with mobile robots.