3 code implementations • 5 Nov 2018 • Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).
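The core idea behind quantile-based exploration is that different quantile levels of the learned return distribution induce different risk attitudes. A minimal sketch of that idea (not QUOTA's actual algorithm; the quantile estimates below are made-up numbers) shows how greedy actions change depending on which quantiles an "option" attends to:

```python
import numpy as np

# Hypothetical quantile estimates (rows = actions, columns = quantile
# levels); the numbers are invented for illustration.
quantiles = np.array([
    [-2.0, -1.0, 2.0, 4.0],   # action 0: high upside, risky downside
    [0.9, 1.0, 1.1, 1.2],     # action 1: safe, low spread
])

def act(q, k_low, k_high):
    """Greedy action w.r.t. the mean of quantiles k_low..k_high-1."""
    return int(np.argmax(q[:, k_low:k_high].mean(axis=1)))

mean_action = act(quantiles, 0, 4)   # greedy on the full mean
optimistic = act(quantiles, 2, 4)    # upper quantiles favour the risky action
pessimistic = act(quantiles, 0, 2)   # lower quantiles favour the safe action
```

Acting on upper quantiles yields optimistic (exploratory) behavior, while lower quantiles yield cautious behavior; choosing among such quantile-level policies is the exploration mechanism the paper builds on.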
1 code implementation • 6 Nov 2018 • Shangtong Zhang, Hao Chen, Hengshuai Yao
In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning.
1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
1 code implementation • 21 Jan 2021 • Shangtong Zhang, Hengshuai Yao, Shimon Whiteson
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
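The instability can be seen in a tiny toy example (an assumed illustration in the spirit of well-known counterexamples, not one from the paper): two states with features 1 and 2 share a single weight w, so the value estimates are w and 2w, and repeatedly applying an off-policy semi-gradient TD(0) update to the first transition makes w blow up whenever the discount exceeds 0.5:

```python
# Off-policy semi-gradient TD(0) with linear function approximation.
# Update for the s1 -> s2 transition with reward 0:
#   w += alpha * (gamma * v(s2) - v(s1)) * phi(s1)
#      = alpha * (gamma * 2w - w) * 1,
# which multiplies w by (1 + alpha * (2*gamma - 1)) > 1 when gamma > 0.5.
alpha, gamma = 0.1, 0.95
w = 1.0
for _ in range(100):
    w += alpha * (gamma * 2 * w - w)
```

All three ingredients are needed: on-policy sampling, tabular values, or Monte Carlo targets would each restore stability in this example.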
2 code implementations • 19 Jul 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations.
1 code implementation • 28 Sep 2020 • Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao
Prioritized Experience Replay (ER) has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.
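For context, the standard proportional form of prioritized ER that these analyses study can be sketched as follows (a minimal illustration with made-up TD errors, using the usual alpha/beta hyperparameters, not either paper's specific method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Absolute TD errors of four stored transitions (invented numbers).
td_errors = np.array([0.1, 2.0, 0.05, 1.0])
alpha, eps = 0.6, 1e-6                      # common PER settings
priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()

# Sample proportionally to priority; correct the sampling bias with
# importance-sampling weights.
idx = rng.choice(len(probs), size=2, p=probs)
beta = 0.4
weights = (len(probs) * probs[idx]) ** (-beta)
weights = weights / weights.max()           # keep update magnitudes bounded
```

Transitions with large TD error are replayed more often, which is exactly the heuristic whose theoretical justification these papers examine.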
1 code implementation • 20 May 2022 • Xing Chen, Dongcui Diao, Hechang Chen, Hengshuai Yao, Haiyin Piao, Zhixiao Sun, Zhiwei Yang, Randy Goebel, Bei Jiang, Yi Chang
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space.
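The clipping referred to here is PPO's standard surrogate objective, which caps how far the new policy's probability ratio can move the update. A minimal sketch (the standard objective, not this paper's modification):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

With a positive advantage the objective stops rewarding ratios above 1+eps, and with a negative advantage it stops rewarding ratios below 1-eps, so optimization stays inside the clipped policy space.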
1 code implementation • 25 Nov 2022 • Hengshuai Yao
Nonetheless, we found that the decision boundaries of predecessor models on the training data are reflective of the final model's generalization.
no code implementations • 27 Apr 2018 • Donglai Zhu, Hengshuai Yao, Bei Jiang, Peng Yu
In deep neural networks, the cross-entropy loss function is commonly used for classification.
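For reference, the softmax cross-entropy loss in question can be computed stably by shifting the logits before exponentiating (a generic sketch, not the paper's proposed alternative):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    shifted = logits - logits.max()          # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]
```

For two equal logits the predicted distribution is uniform, so the loss equals log 2 regardless of the label.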
no code implementations • 8 Feb 2018 • Donglai Zhu, Hao Chen, Hengshuai Yao, Masoud Nosrati, Peyman Yadmellat, Yunfei Zhang
Our major finding is that action tiling encoding is the most important factor leading to the remarkable performance of the CDNA model.
no code implementations • NeurIPS 2014 • Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function.
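The dot-product claim has a familiar linear-algebra shape: if a model stores discounted state occupancies, the return under any reward vector is a single matrix-vector product. A toy sketch of that general structure (an assumed 3-state chain for a fixed policy, not the paper's option-conditional UOM construction):

```python
import numpy as np

# Made-up 3-state transition matrix under some fixed policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
gamma = 0.9

# Discounted-occupancy model: M = sum_t (gamma P)^t = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Given any reward vector r, the expected return is one product with M.
r = np.array([1.0, 0.0, 2.0])
v = M @ r
```

The resulting v satisfies the Bellman equation v = r + gamma P v, so the same model M answers return queries for arbitrary reward functions without replanning, which is the reward-independence property the UOM generalizes to options.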
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend Dyna planning architecture for policy evaluation and control in two significant aspects.
no code implementations • 18 Mar 2019 • Borislav Mavrin, Hengshuai Yao, Linglong Kong
Further experiments on the losing games show that our decorrelation algorithms can outperform DQN and QR-DQN with a fine-tuned regularization factor.
no code implementations • 20 Mar 2019 • Nazmus Sakib, Hengshuai Yao, Hong Zhang, Shangling Jui
In this paper, we use reinforcement learning for safe driving in adversarial settings.
no code implementations • 13 May 2019 • Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yao-Liang Yu
In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties.
no code implementations • 18 Jun 2019 • Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White
In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function.
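The mechanism can be sketched in one dimension: repeatedly step along the gradient of the current value estimate, and the visited points form a trajectory toward high-value regions. This is an illustrative toy (a made-up quadratic value estimate, not the paper's experimental setup):

```python
def hill_climb_states(grad_fn, s0, step=0.1, n=20):
    """Follow the gradient of a value estimate to collect high-value states."""
    trajectory = [s0]
    s = s0
    for _ in range(n):
        s = s + step * grad_fn(s)
        trajectory.append(s)
    return trajectory

# Toy value estimate peaked at s = 3: v(s) = -(s - 3)^2, dv/ds = -2(s - 3).
dv = lambda s: -2.0 * (s - 3.0)
traj = hill_climb_states(dv, s0=0.0)
```

Starting from s = 0, the trajectory converges toward the maximizer s = 3; the intermediate states along the way are the kind of candidate states the method proposes to use.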
no code implementations • 3 Oct 2019 • Khurram Javed, Hengshuai Yao, Martha White
Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.
no code implementations • 4 Oct 2019 • Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.
no code implementations • 18 Dec 2019 • Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand
Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks.
no code implementations • 8 Nov 2019 • Jun Jin, Nhat M. Nguyen, Nazmus Sakib, Daniel Graves, Hengshuai Yao, Martin Jagersand
We observe that our method demonstrates time-efficient path planning behavior with high success rate in mapless navigation tasks.
no code implementations • 26 Jan 2020 • Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand
Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels.
no code implementations • 7 Jul 2020 • Vincent Liu, Adam White, Hengshuai Yao, Martha White
In this work, we provide a definition of interference for control in reinforcement learning.
no code implementations • 14 Sep 2020 • Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.
no code implementations • 29 Sep 2021 • Ke Sun, Yi Liu, Yingnan Zhao, Hengshuai Yao, Shangling Jui, Linglong Kong
In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
no code implementations • 20 Nov 2021 • Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel
There has been recent and growing interest in the development and deployment of autonomous vehicles, encouraged by the empirical successes of powerful artificial intelligence (AI) techniques, especially in the applications of deep learning and reinforcement learning.
no code implementations • 25 Sep 2019 • Chao Gao, Martin Mueller, Ryan Hayward, Hengshuai Yao, Shangling Jui
A three-head network architecture has recently been proposed that can learn a third action-value head on the same fixed dataset used for the two-head network.
no code implementations • 21 Dec 2021 • Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel
First, we provide a thorough overview of the state-of-the-art and emerging approaches for XAI-based autonomous driving.
no code implementations • 1 Apr 2022 • Hengshuai Yao
In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques.
no code implementations • 31 Oct 2022 • Dongcui Diao, Hengshuai Yao, Bei Jiang
Telling similar objects apart is hard even for human beings.
no code implementations • 29 Jul 2023 • Hengshuai Yao
Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate.
no code implementations • 18 Aug 2023 • Hengshuai Yao
This note aims to understand, in particular, why TDC is slow on this example, and provides a debugging analysis of this behavior.
no code implementations • 22 Aug 2023 • Xing Chen, Yijun Liu, Zhaogeng Liu, Hechang Chen, Hengshuai Yao, Yi Chang
In prior work, it has been shown that policy-based exploration is beneficial for continuous action spaces in deterministic-policy reinforcement learning (DPRL).