no code implementations • 4 Dec 2023 • Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White
As a result, no offline policy selection (OPS) method can be more sample-efficient than off-policy evaluation (OPE) in the worst case.
no code implementations • 16 Jul 2020 • Yasuhiro Fujita, Kota Uenishi, Avinash Ummadisingu, Prabhat Nagarajan, Shimpei Masuda, Mario Ynocente Castro
Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems.
1 code implementation • 1 Feb 2020 • Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst them through knowledge distillation.
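As a rough illustration of the general idea only (not the authors' implementation — the ensemble sizes, returns, and update rule below are all invented), periodic distillation in an ensemble of softmax policies can be sketched as the best-performing policy teaching the others:

```python
import numpy as np

# Toy sketch: an "ensemble" of tabular softmax policies. Periodically,
# the best-performing policy acts as teacher and the other policies are
# distilled toward it by gradient descent on the cross-entropy between
# action distributions.
rng = np.random.default_rng(0)
n_policies, n_states, n_actions = 3, 5, 4
logits = rng.normal(size=(n_policies, n_states, n_actions))

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill(logits, teacher_idx, lr=0.5, steps=500):
    """Pull every student's action distribution toward the teacher's."""
    teacher = softmax(logits[teacher_idx])
    for _ in range(steps):
        probs = softmax(logits)
        grad = probs - teacher[None]      # d(cross-entropy)/d(logits)
        grad[teacher_idx] = 0.0           # the teacher itself stays fixed
        logits = logits - lr * grad
    return logits

episodic_returns = np.array([1.0, 3.0, 2.0])   # made-up evaluation scores
teacher = int(np.argmax(episodic_returns))     # best policy teaches
logits = distill(logits, teacher)
```

After distillation every policy's action distribution matches the teacher's, while between distillation phases the policies would act (and diverge) independently.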
1 code implementation • 9 Dec 2019 • Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa
In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework.
no code implementations • 9 Dec 2019 • Aaron Havens, Yi Ouyang, Prabhat Nagarajan, Yasuhiro Fujita
The latent representation is learned exclusively from multi-step reward prediction which we show to be the only necessary information for successful planning.
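A toy, fully hypothetical illustration of that claim (not the paper's model): if reward depends on only part of the state, then an encoder trained purely on reward prediction concentrates on the reward-relevant coordinates.

```python
import numpy as np

# States are 4-D but the reward depends only on coordinate 0. A linear
# encoder (W) and reward head (v) are trained solely on mean-squared
# reward-prediction error; the learned effective weights W @ v should
# recover the reward-relevant direction [2, 0, 0, 0].
rng = np.random.default_rng(1)
n, d, k = 2000, 4, 1
S = rng.normal(size=(n, d))          # states
r = 2.0 * S[:, 0]                    # reward uses only dimension 0

W = rng.normal(size=(d, k)) * 0.1    # encoder weights (small init)
v = rng.normal(size=(k,)) * 0.1      # reward head
lr = 0.05
for _ in range(500):
    z = S @ W                        # latent representation
    err = z @ v - r                  # reward-prediction error
    # gradients of 0.5 * mean(err**2) w.r.t. v and W
    gv = (z * err[:, None]).mean(axis=0)
    gW = S.T @ (err[:, None] * v[None, :]) / n
    v -= lr * gv
    W -= lr * gW

w_eff = W @ v                        # effective state -> reward weights
```

The reward signal alone suffices here to identify which state coordinates matter, which is the spirit of the quoted claim.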
no code implementations • 25 Sep 2019 • Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
Reinforcement Learning (RL) has demonstrated promising results across several sequential decision-making tasks.
3 code implementations • 12 Apr 2019 • Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.
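The core idea of learning a reward from ranked demonstrations can be sketched with a synthetic, hypothetical example (the features, hidden reward, and learning rate below are all made up): fit reward weights with the Bradley-Terry pairwise ranking loss so that higher-ranked trajectories receive higher predicted return.

```python
import numpy as np

# A hidden linear reward ranks trajectories; we only observe the
# rankings and fit weights w by gradient ascent on the Bradley-Terry
# log-likelihood that the better trajectory of each pair gets the
# higher predicted return.
rng = np.random.default_rng(2)
n_traj, horizon, d = 30, 10, 3
trajs = rng.normal(size=(n_traj, horizon, d))   # per-step state features
true_w = np.array([1.0, -0.5, 0.2])             # hidden "true" reward
ranks = (trajs @ true_w).sum(axis=1)            # ground-truth returns

feats = trajs.sum(axis=1)                       # summed features per trajectory
pairs = [(i, j) for i in range(n_traj) for j in range(n_traj)
         if ranks[i] > ranks[j]]                # i is ranked above j

w = np.zeros(d)
lr = 0.1
for _ in range(200):
    for i, j in pairs:
        margin = feats[i] @ w - feats[j] @ w
        # P(model wrongly ranks j above i); clip avoids exp overflow
        p_wrong = 1.0 / (1.0 + np.exp(np.clip(margin, -30.0, 30.0)))
        # gradient ascent on log P(i ranked above j)
        w += lr * p_wrong * (feats[i] - feats[j])
```

Because the learned reward is trained on rankings rather than cloned from any single demonstration, it can in principle score unseen behavior above the best demonstration, which is the extrapolation property the paper targets.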
1 code implementation • 15 Sep 2018 • Prabhat Nagarajan, Garrett Warnell, Peter Stone
One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance.