no code implementations • 15 Feb 2021 • Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor
Multiagent reinforcement learning (MARL) has achieved remarkable success in solving various types of video games.
no code implementations • 29 Dec 2020 • Daniel Graves, Jun Jin, Jun Luo
Our approach facilitates the learning of new policies by (1) maximizing the target MDP reward with the help of the black-box option, and (2) returning the agent to states in the learned initiation set of the black-box option where it is already optimal.
no code implementations • 11 Nov 2020 • Jun Jin, Daniel Graves, Cameron Haigh, Jun Luo, Martin Jagersand
We consider real-world reinforcement learning (RL) of robotic manipulation tasks that involve both visuomotor skills and contact-rich skills.
no code implementations • 6 Nov 2020 • Daniel Graves
Hyperoctahedral homology is the homology theory associated to the hyperoctahedral crossed simplicial group.
Algebraic Topology 55N35, 13D03, 55U15, 55P47
no code implementations • 27 Oct 2020 • Daniel Graves, Johannes Günther, Jun Luo
General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment.
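As an illustration of the idea (a minimal tabular sketch, not the paper's implementation; the chain environment, cumulant values, and step size below are invented for the example): a GVF is learned exactly like a value function, except the TD(0) target uses an arbitrary cumulant signal and a state-dependent continuation factor in place of the reward and a fixed discount.

```python
def gvf_td0_update(v, s, s_next, cumulant, gamma_next, alpha=0.1):
    """One TD(0) update of a tabular GVF toward cumulant + gamma' * v(s').

    The cumulant replaces the reward and gamma_next is a per-transition
    continuation probability, so v predicts a long-term summary of any
    measurable signal, not just return.
    """
    old = v.get(s, 0.0)
    v[s] = old + alpha * (cumulant + gamma_next * v.get(s_next, 0.0) - old)


v = {}
# Toy 3-state chain: the cumulant fires (1.0) on the transition out of
# state 2, which also terminates the prediction (continuation 0.0).
trajectory = [(0, 1, 0.0, 0.9), (1, 2, 0.0, 0.9), (2, 2, 1.0, 0.0)]
for _ in range(300):
    for s, s_next, c, g in trajectory:
        gvf_td0_update(v, s, s_next, c, g)
# v converges to the discounted cumulant sum: v[2] ≈ 1.0, v[1] ≈ 0.9, v[0] ≈ 0.81
```

The same update rule, applied to cumulants like "distance to the nearest obstacle", is what turns a GVF into a predictive summary of an agent's future experience.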
1 code implementation • NeurIPS 2021 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.
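A minimal linear sketch of the input convention this describes (an illustration only; the paper's PeVFA is a learned neural approximator with a learned policy representation, and the weights and features below are invented): the approximator scores a (state, policy) pair, so one set of parameters can evaluate many policies.

```python
def pevfa_value(w, state_feats, policy_repr):
    """Linear value estimate over the concatenation of state features
    and an explicit policy representation."""
    x = list(state_feats) + list(policy_repr)
    return sum(wi * xi for wi, xi in zip(w, x))


# With the weights held fixed, changing only the policy representation
# changes the value estimate -- the approximator generalizes across policies.
w = [0.5, -1.0, 2.0]
v_pi1 = pevfa_value(w, state_feats=[1.0, 0.0], policy_repr=[0.3])
v_pi2 = pevfa_value(w, state_feats=[1.0, 0.0], policy_repr=[0.8])
```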
5 code implementations • 19 Oct 2020 • Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang
We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving.
no code implementations • 28 Sep 2020 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Wulong Liu, Yaodong Yang
The value function lies at the heart of Reinforcement Learning (RL), defining the long-term evaluation of a policy in a given state.
no code implementations • 26 Jun 2020 • Daniel Graves, Nhat M. Nguyen, Kimia Hassanzadeh, Jun Jin
Reinforcement learning with a novel predictive representation is applied to autonomous driving, specifically the task of driving between lane markings; substantial gains in performance and generalization are observed on unseen test roads, both in simulation and on a real Jackal robot.
no code implementations • 24 Jan 2020 • Daniel Graves, Kasra Rezaee, Sean Scheideman
We demonstrate perception as prediction by learning to predict an agent's front safety and rear safety with GVFs, which encapsulate anticipation of the behavior of the vehicle in front and in the rear, respectively.
no code implementations • 19 Nov 2019 • Borislav Mavrin, Daniel Graves, Alan Chan
Learning good representations is a long-standing problem in reinforcement learning (RL).
no code implementations • 8 Nov 2019 • Jun Jin, Nhat M. Nguyen, Nazmus Sakib, Daniel Graves, Hengshuai Yao, Martin Jagersand
We observe that our method demonstrates time-efficient path planning behavior with high success rate in mapless navigation tasks.
Robotics
no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps.
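A tabular sketch of the update this describes (assuming the standard fixed-horizon recursion; the self-loop environment and step size are invented for the example): one value table is kept per horizon h, and each horizon bootstraps from the horizon below it, with the zero-step values identically zero.

```python
def fhtd_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """One fixed-horizon TD update.

    V is a list of dicts; V[0] is the zero function and V[h][s]
    estimates the expected (discounted) sum of the next h rewards,
    so V[h] bootstraps from V[h-1] rather than from itself.
    """
    for h in range(1, len(V)):
        target = r + gamma * V[h - 1].get(s_next, 0.0)
        old = V[h].get(s, 0.0)
        V[h][s] = old + alpha * (target - old)


# Toy example: a single self-looping state with reward 1 per step,
# so the h-step return from it is exactly h.
H = 3
V = [dict() for _ in range(H + 1)]
for _ in range(500):
    fhtd_update(V, s=0, r=1.0, s_next=0)
# V[1][0] ≈ 1, V[2][0] ≈ 2, V[3][0] ≈ 3
```

Because each horizon bootstraps only from the one below, the fixed point is exact regardless of the infinite-horizon value, which is what distinguishes these targets from ordinary TD.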
2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White
Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
no code implementations • 27 Sep 2018 • Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White
We propose Importance Resampling (IR) for off-policy learning, which resamples experience from the replay buffer and then applies a standard on-policy update.
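The core mechanism can be sketched in a few lines (a simplification; the paper analyzes variants such as bias-corrected IR, and the transition names below are invented): instead of weighting each update by the importance ratio, transitions are drawn from the buffer with probability proportional to that ratio, and the update itself needs no correction.

```python
import random

def importance_resample(buffer, ratios, k, rng=random):
    """Draw k transitions with probability proportional to the
    target/behaviour policy ratio pi(a|s) / mu(a|s).

    The resampled batch is distributed as if collected on-policy,
    so a plain on-policy update can be applied to it.
    """
    return rng.choices(buffer, weights=ratios, k=k)


# A transition whose action the target policy would never take
# (ratio 0) is never resampled.
buffer = ["off_policy_transition", "on_policy_transition"]
batch = importance_resample(buffer, ratios=[0.0, 1.0], k=32)
```

Moving the correction into the sampling step rather than the update is what lets IR avoid the high-variance multiplicative weights of ordinary importance sampling.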