no code implementations • 5 Mar 2025 • Arthur Zhang, Harshit Sikchi, Amy Zhang, Joydeep Biswas
We address the long-horizon mapless navigation problem: enabling robots to traverse novel environments without relying on high-definition maps or precise waypoints that specify exactly where to navigate.
no code implementations • 7 Dec 2024 • Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum
In this work, we show that we can achieve a zero-shot language-to-behavior policy by first grounding the imagined sequences in real observations of an unsupervised RL agent, and then using a closed-form solution to imitation learning that allows the RL agent to mimic the grounded observations.
no code implementations • 29 Nov 2024 • Siddhant Agarwal, Harshit Sikchi, Peter Stone, Amy Zhang
We present Proto Successor Measure: the basis set for all possible solutions of Reinforcement Learning in a dynamical system.
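For context, a sketch of the standard successor-measure definition this basis-set view builds on (the notation is the usual one, not taken verbatim from the paper):

```latex
% Successor measure M^pi: discounted occupancy of future states under pi.
% For a start state-action pair (s, a) and a measurable set of states S:
M^{\pi}(s, a, S) = \sum_{t=0}^{\infty} \gamma^{t} \, \Pr\!\left( s_t \in S \mid s_0 = s,\; a_0 = a,\; \pi \right)
```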
no code implementations • 13 Jun 2024 • Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum
DILO reduces the learning from observations problem to that of simply learning an actor and a critic, bearing similar complexity to vanilla offline RL.
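As a rough illustration of what "simply learning an actor and a critic" means in practice, here is a minimal off-policy actor-critic skeleton; the architectures, TD objective, and hyperparameters are generic placeholders, not DILO's actual losses.

```python
import torch
import torch.nn.functional as F

OBS_DIM, ACT_DIM, GAMMA = 17, 6, 0.99  # placeholder sizes

# The two learned components, as in vanilla offline actor-critic RL.
critic = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM + ACT_DIM, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 1))
actor = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, ACT_DIM), torch.nn.Tanh())

critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def update(batch):
    """One gradient step on an offline batch (obs, act, rew, next_obs)."""
    obs, act, rew, next_obs = batch
    with torch.no_grad():
        next_act = actor(next_obs)
        target = rew + GAMMA * critic(torch.cat([next_obs, next_act], -1))
    # Critic: standard TD regression toward the bootstrapped target.
    critic_loss = F.mse_loss(critic(torch.cat([obs, act], -1)), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's value of the actor's own actions.
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```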
no code implementations • 5 Jun 2024 • Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum
Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process.
no code implementations • 6 May 2024 • Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum
Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail.
no code implementations • 3 Nov 2023 • Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
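The "sparse reward" in this setting is typically just an indicator of goal achievement; a minimal sketch, where the distance metric, threshold, and 0/-1 convention are illustrative assumptions:

```python
import numpy as np

def sparse_goal_reward(achieved, goal, threshold=0.05):
    """Return 0 when the goal is reached, -1 otherwise (a common GCRL convention).

    `achieved` and `goal` are arrays in the same goal space; the Euclidean
    distance and the 0.05 threshold are placeholder choices.
    """
    return 0.0 if np.linalg.norm(achieved - goal) <= threshold else -1.0
```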
1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.
1 code implementation • 16 Feb 2023 • Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum
For offline RL, our analysis frames a recent offline RL method, XQL, in the dual framework, and we further propose a new method, f-DVL, that provides alternative choices to the Gumbel regression loss, fixing the known training instability issue of XQL.
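For reference, XQL's Gumbel regression loss is usually written in this LINEX form; a sketch with an assumed temperature `beta` and exponent clamp (f-DVL's alternative losses replace this and are not reproduced here):

```python
import torch

def gumbel_regression_loss(v_pred, q_target, beta=1.0):
    """LINEX-style loss exp(z) - z - 1 with z = (q - v) / beta.

    Minimizing it pushes v toward a soft maximum of q; large positive z is
    penalized exponentially, which is the source of the training instability
    in XQL that f-DVL's alternative losses address.
    """
    z = (q_target - v_pred) / beta
    z = torch.clamp(z, max=5.0)  # common stabilization trick, an assumption here
    return (torch.exp(z) - z - 1.0).mean()
```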
no code implementations • 7 Feb 2022 • Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
We propose a new framework for imitation learning, treating imitation as a two-player ranking-based game between a policy and a reward function.
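One way to read the ranking-based game: the reward player is trained to rank behaviors consistently with given preferences, while the policy player maximizes that reward. A minimal sketch of a pairwise ranking loss over trajectory returns (a Bradley-Terry-style choice for illustration, not necessarily the paper's exact objective):

```python
import torch

def ranking_loss(reward_net, traj_lo, traj_hi):
    """Pairwise ranking loss where traj_hi is preferred over traj_lo.

    Each trajectory is a (T, obs_dim) tensor; `reward_net` maps observations
    to per-step rewards and is the assumed interface of the reward player.
    """
    r_lo = reward_net(traj_lo).sum()
    r_hi = reward_net(traj_hi).sum()
    # -log sigmoid(R_hi - R_lo): minimized when the preferred trajectory
    # receives the higher cumulative reward.
    return -torch.nn.functional.logsigmoid(r_hi - r_lo)
```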
1 code implementation • 16 Mar 2021 • Harshit Sikchi, Wenxuan Zhou, David Held
Current RL agents explore the environment without considering these constraints, which can lead to damage to the hardware or even other agents in the environment.
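Such constraints are typically formalized as a constrained MDP; the textbook objective, for reference (not quoted from the paper):

```latex
% Constrained MDP: maximize return subject to a bound d on expected cost.
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t) \right] \le d
```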
1 code implementation • 9 Nov 2020 • Tianwei Ni, Harshit Sikchi, YuFei Wang, Tejus Gupta, Lisa Lee, Benjamin Eysenbach
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories on IRL benchmarks.
1 code implementation • 23 Aug 2020 • Harshit Sikchi, Wenxuan Zhou, David Held
In this work, we investigate a novel instantiation of H-step lookahead that combines a learned model with a terminal value function learned by a model-free off-policy algorithm; we name the resulting method Learning Off-Policy with Online Planning (LOOP).
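The core computation is easy to sketch: roll candidate action sequences through the learned model for H steps, then cap each rollout with the terminal value function. The model/value interfaces and the random-shooting optimizer below are assumptions for illustration, not LOOP's exact planner:

```python
import numpy as np

def h_step_lookahead(model, value_fn, state, act_dim, H=10,
                     n_candidates=256, gamma=0.99):
    """Score sampled H-step action sequences; return the best first action.

    `model(state, action) -> (next_state, reward)` and `value_fn(state) -> float`
    are assumed interfaces; random shooting stands in for the actual
    trajectory optimizer.
    """
    best_score, best_first = -np.inf, None
    for _ in range(n_candidates):
        s, score = state, 0.0
        actions = np.random.uniform(-1.0, 1.0, size=(H, act_dim))
        for h in range(H):
            s, r = model(s, actions[h])
            score += gamma ** h * r  # discounted model-predicted rewards
        score += gamma ** H * value_fn(s)  # terminal value caps the rollout
        if score > best_score:
            best_score, best_first = score, actions[0]
    return best_first
```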
no code implementations • 31 Jul 2020 • Shubhankar Agarwal, Harshit Sikchi, Cole Gulino, Eric Wilkinson, Shivam Gautam
A popular way for Autonomous Vehicles to plan trajectories in dynamic urban scenarios is to rely on explicitly specified, hand-crafted cost functions, coupled with random sampling in the trajectory space to find the minimum-cost trajectory.
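Stripped to its essentials, that pipeline samples candidate trajectories and keeps the cheapest one under the hand-crafted cost; a minimal sketch, where the sampler interface, the `traj` attributes, and the cost weights are all illustrative assumptions:

```python
def plan(sample_trajectory, cost_fn, n_samples=1000):
    """Random-sampling planner: draw candidate trajectories and return the
    minimum-cost one. `sample_trajectory()` and `cost_fn(traj)` stand in
    for the trajectory sampler and hand-crafted cost described above.
    """
    candidates = [sample_trajectory() for _ in range(n_samples)]
    return min(candidates, key=cost_fn)

def cost_fn(traj):
    # Example hand-crafted cost: a weighted sum of interpretable penalty
    # terms; the attributes and weights are hypothetical.
    return 1.0 * traj.collision_risk + 0.5 * traj.jerk + 0.1 * traj.lane_offset
```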