no code implementations • 8 Feb 2024 • Mohak Bhardwaj, Thomas Lampe, Michael Neunert, Francesco Romano, Abbas Abdolmaleki, Arunkumar Byravan, Markus Wulfmeier, Martin Riedmiller, Jonas Buchli
Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale.
no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
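To make the relative-pessimism idea in this entry concrete, here is a minimal toy sketch in the spirit of ARMOR: a policy is chosen to maximize its worst-case improvement over a reference policy, where the worst case is taken over a set of candidate dynamics models. The two-model "version space", the tabular MDP, and all names are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of ARMOR-style relative pessimism on a 3-state, 2-action MDP.
# The candidate-model set stands in for models consistent with offline data.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Hypothetical version space: candidate dynamics models (shape: s, a, s').
models = [rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
          for _ in range(2)]
reward = rng.random((n_states, n_actions))

def policy_value(P, pi):
    """Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum("sa,sap->sp", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, reward)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

pi_ref = np.full((n_states, n_actions), 1.0 / n_actions)  # reference policy

best_pi, best_gap = None, -np.inf
for a0 in range(n_actions):            # enumerate deterministic policies
    for a1 in range(n_actions):
        for a2 in range(n_actions):
            pi = np.zeros((n_states, n_actions))
            pi[[0, 1, 2], [a0, a1, a2]] = 1.0
            # Adversary picks the model minimizing V^pi - V^ref at state 0.
            gap = min(policy_value(P, pi)[0] - policy_value(P, pi_ref)[0]
                      for P in models)
            if gap > best_gap:
                best_pi, best_gap = pi, gap
print("worst-case improvement over reference:", best_gap)
```

Because the gap is measured against the reference policy under the worst candidate model, a non-negative result certifies that the learned policy does not underperform the reference, regardless of which candidate model is correct.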
no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.
no code implementations • 10 Oct 2021 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa
If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly.
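The following sketch illustrates the lazy-search setting this entry describes: the planner repeatedly finds a candidate shortest path under optimistic edge assumptions, and a policy decides which unevaluated edge on that path to check next. The `edge_score` function is a hypothetical stand-in for the learned selector, and edge validity is mocked; this is not the paper's algorithm verbatim.

```python
# Minimal sketch of lazy search with a learned edge-evaluation ordering.
import networkx as nx

def edge_score(u, v, attrs):
    # Hypothetical learned policy: higher score = evaluate this edge first.
    # Here a fixed prior that longer edges are more likely to be invalid.
    return attrs["length"]

def lazy_search(G, start, goal, is_valid):
    evaluated = set()
    while True:
        # Candidate path, optimistically treating unevaluated edges as valid.
        path = nx.shortest_path(G, start, goal, weight="length")
        unevaluated = [(u, v) for u, v in zip(path, path[1:])
                       if frozenset((u, v)) not in evaluated]
        if not unevaluated:
            return path  # every edge on the candidate path checked out
        # Learned ordering: evaluate the edge the policy deems most informative.
        u, v = max(unevaluated, key=lambda e: edge_score(*e, G.edges[e]))
        evaluated.add(frozenset((u, v)))
        if not is_valid(u, v):
            G.remove_edge(u, v)  # lazily discovered collision: drop the edge

# Toy usage: the direct edge s-g is in collision, forcing a detour via a.
G = nx.Graph()
G.add_edge("s", "g", length=1.0)
G.add_edge("s", "a", length=0.6)
G.add_edge("a", "g", length=0.6)
print(lazy_search(G, "s", "g", is_valid=lambda u, v: {u, v} != {"s", "g"}))
```

A better evaluation ordering checks likely-invalid edges early, pruning doomed candidate paths before wasting expensive collision checks on their remaining edges.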
no code implementations • ICLR 2021 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots
We further propose an algorithm that changes $\lambda$ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy of our approach on challenging high-dimensional manipulation tasks with biased models in simulation.
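As a rough illustration of the blending this entry describes, the sketch below computes a $\lambda$-weighted mix of $h$-step model-rollout returns and a learned value bootstrap, with $\lambda$ decayed across episodes. The dynamics model, value function, and decay schedule are stand-in assumptions, not the paper's implementation.

```python
# Sketch of a lambda-blended value target: short MPC-style model rollouts
# mixed with a learned value estimate, TD(lambda)-style.

def blended_value(state, value_fn, model, horizon, lam, gamma=0.99):
    """Lambda-weighted mix of h-step bootstrapped returns, h = 1..horizon."""
    s, disc_ret, blended = state, 0.0, 0.0
    for h in range(1, horizon + 1):
        s, r = model(s)                              # biased simulated step
        disc_ret += gamma ** (h - 1) * r
        G_h = disc_ret + gamma ** h * value_fn(s)    # h-step return + bootstrap
        # Weights sum to 1: lam near 1 leans on long model rollouts (MPC-like),
        # lam near 0 leans on the learned value function.
        w = (1 - lam) * lam ** (h - 1) if h < horizon else lam ** (horizon - 1)
        blended += w * G_h
    return blended

# Toy usage with stand-in model and value estimates.
model = lambda s: (s, 1.0)       # fake simulator: constant reward
value_fn = lambda s: 10.0        # fake learned value estimate
for episode in range(3):
    lam = 0.9 * 0.5 ** episode   # assumed schedule: decay reliance on MPC
    print(episode, blended_value(0, value_fn, model, horizon=5, lam=lam))
```

Decaying $\lambda$ shifts weight from long biased-model rollouts toward the one-step bootstrap, which matches the entry's motivation of trusting the value function more as its estimates improve.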
no code implementations • 31 Dec 2019 • Mohak Bhardwaj, Ankur Handa, Dieter Fox, Byron Boots
Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply, and model-based RL is effective when system dynamics can be modeled accurately.
no code implementations • 16 Jul 2019 • Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa
If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly.
1 code implementation • 10 Jul 2017 • Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer
In this paper, we do so by training a heuristic policy that maps the partial information available during the search to a decision about which node of the search tree to expand.
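The sketch below renders this idea as a best-first search in which a scorer over features of the partial search state picks the next node to expand. The feature set and the fixed linear scorer are illustrative assumptions standing in for the trained heuristic policy.

```python
# Minimal sketch of search guided by a learned expansion policy.
import heapq

def features(node, goal, depth):
    # Hypothetical partial-information features: depth so far and a
    # goal-distance proxy (Manhattan distance on a grid).
    return [depth, abs(node[0] - goal[0]) + abs(node[1] - goal[1])]

def learned_score(phi, weights=(0.1, 1.0)):
    # Stand-in for the trained heuristic policy: a fixed linear scorer.
    return sum(w * f for w, f in zip(weights, phi))

def guided_search(start, goal, neighbors):
    frontier = [(learned_score(features(start, goal, 0)), 0, start)]
    seen = {start}
    while frontier:
        _, depth, node = heapq.heappop(frontier)  # expand lowest-scoring node
        if node == goal:
            return depth
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier,
                               (learned_score(features(nxt, goal, depth + 1)),
                                depth + 1, nxt))
    return None

# Toy usage on a 5x5 grid; prints the path length found (8).
def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]
print(guided_search((0, 0), (4, 4), neighbors))
```

Swapping the hand-set weights for parameters trained on solved search problems is what lets the ordering transfer to new problems that resemble the training distribution.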