3 code implementations • 5 Nov 2018 • Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).
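The core idea behind quantile-based exploration is that different quantile levels of the learned return distribution induce different risk attitudes. A minimal sketch of that idea (not QUOTA's actual algorithm; the quantile estimates below are made-up numbers) shows how greedy actions change depending on which quantiles an "option" attends to:

```python
import numpy as np

# Hypothetical quantile estimates (rows = actions, columns = quantile
# levels); the numbers are invented for illustration.
quantiles = np.array([
    [-2.0, -1.0, 2.0, 4.0],   # action 0: high upside, risky downside
    [0.9, 1.0, 1.1, 1.2],     # action 1: safe, low spread
])

def act(q, k_low, k_high):
    """Greedy action w.r.t. the mean of quantiles k_low..k_high-1."""
    return int(np.argmax(q[:, k_low:k_high].mean(axis=1)))

mean_action = act(quantiles, 0, 4)   # greedy on the full mean
optimistic = act(quantiles, 2, 4)    # upper quantiles favour the risky action
pessimistic = act(quantiles, 0, 2)   # lower quantiles favour the safe action
```

Acting on upper quantiles yields optimistic (exploratory) behavior, while lower quantiles yield cautious behavior; choosing among such quantile-level policies is the exploration mechanism the paper builds on.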
1 code implementation • 6 Nov 2018 • Shangtong Zhang, Hao Chen, Hengshuai Yao
In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning.
1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
1 code implementation • 21 Jan 2021 • Shangtong Zhang, Hengshuai Yao, Shimon Whiteson
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
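The instability can be seen in a tiny toy example (an assumed illustration in the spirit of well-known counterexamples, not one from the paper): two states with features 1 and 2 share a single weight w, so the value estimates are w and 2w, and repeatedly applying an off-policy semi-gradient TD(0) update to the first transition makes w blow up whenever the discount exceeds 0.5:

```python
# Off-policy semi-gradient TD(0) with linear function approximation.
# Update for the s1 -> s2 transition with reward 0:
#   w += alpha * (gamma * v(s2) - v(s1)) * phi(s1)
#      = alpha * (gamma * 2w - w) * 1,
# which multiplies w by (1 + alpha * (2*gamma - 1)) > 1 when gamma > 0.5.
alpha, gamma = 0.1, 0.95
w = 1.0
for _ in range(100):
    w += alpha * (gamma * 2 * w - w)
```

All three ingredients are needed: on-policy sampling, tabular values, or Monte Carlo targets would each restore stability in this example.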
2 code implementations • 19 Jul 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations.
1 code implementation • 28 Sep 2020 • Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao
Prioritized Experience Replay (ER) has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.
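For context, the standard proportional form of prioritized ER that these analyses study can be sketched as follows (a minimal illustration with made-up TD errors, using the usual alpha/beta hyperparameters, not either paper's specific method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Absolute TD errors of four stored transitions (invented numbers).
td_errors = np.array([0.1, 2.0, 0.05, 1.0])
alpha, eps = 0.6, 1e-6                      # common PER settings
priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()

# Sample proportionally to priority; correct the sampling bias with
# importance-sampling weights.
idx = rng.choice(len(probs), size=2, p=probs)
beta = 0.4
weights = (len(probs) * probs[idx]) ** (-beta)
weights = weights / weights.max()           # keep update magnitudes bounded
```

Transitions with large TD error are replayed more often, which is exactly the heuristic whose theoretical justification these papers examine.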
1 code implementation • 20 May 2022 • Xing Chen, Dongcui Diao, Hechang Chen, Hengshuai Yao, Haiyin Piao, Zhixiao Sun, Zhiwei Yang, Randy Goebel, Bei Jiang, Yi Chang
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space.
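The clipping referred to here is PPO's standard surrogate objective, which caps how far the new policy's probability ratio can move the update. A minimal sketch (the standard objective, not this paper's modification):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

With a positive advantage the objective stops rewarding ratios above 1+eps, and with a negative advantage it stops rewarding ratios below 1-eps, so optimization stays inside the clipped policy space.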
1 code implementation • 25 Nov 2022 • Hengshuai Yao
Nonetheless, we found that the decision boundaries of predecessor models on the training data are reflective of the final model's generalization.
no code implementations • 27 Apr 2018 • Donglai Zhu, Hengshuai Yao, Bei Jiang, Peng Yu
In deep neural networks, the cross-entropy loss function is commonly used for classification.
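For reference, the softmax cross-entropy loss in question can be computed stably by shifting the logits before exponentiating (a generic sketch, not the paper's proposed alternative):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    shifted = logits - logits.max()          # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]
```

For two equal logits the predicted distribution is uniform, so the loss equals log 2 regardless of the label.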
no code implementations • 8 Feb 2018 • Donglai Zhu, Hao Chen, Hengshuai Yao, Masoud Nosrati, Peyman Yadmellat, Yunfei Zhang
Our major finding is that action tiling encoding is the most important factor leading to the remarkable performance of the CDNA model.
no code implementations • NeurIPS 2014 • Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function.
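The dot-product claim has a familiar linear-algebra shape: if a model stores discounted state occupancies, the return under any reward vector is a single matrix-vector product. A toy sketch of that general structure (an assumed 3-state chain for a fixed policy, not the paper's option-conditional UOM construction):

```python
import numpy as np

# Made-up 3-state transition matrix under some fixed policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
gamma = 0.9

# Discounted-occupancy model: M = sum_t (gamma P)^t = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Given any reward vector r, the expected return is one product with M.
r = np.array([1.0, 0.0, 2.0])
v = M @ r
```

The resulting v satisfies the Bellman equation v = r + gamma P v, so the same model M answers return queries for arbitrary reward functions without replanning, which is the reward-independence property the UOM generalizes to options.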
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend Dyna planning architecture for policy evaluation and control in two significant aspects.
no code implementations • 18 Mar 2019 • Borislav Mavrin, Hengshuai Yao, Linglong Kong
Further experiments on the losing games show that our decorrelation algorithms can outperform DQN and QR-DQN with a fine-tuned regularization factor.
no code implementations • 20 Mar 2019 • Nazmus Sakib, Hengshuai Yao, Hong Zhang, Shangling Jui
In this paper, we use reinforcement learning for safe driving in adversarial settings.
no code implementations • 13 May 2019 • Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yao-Liang Yu
In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties.
no code implementations • 18 Jun 2019 • Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White
In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function.
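The mechanism can be sketched in one dimension: repeatedly step along the gradient of the current value estimate, and the visited points form a trajectory toward high-value regions. This is an illustrative toy (a made-up quadratic value estimate, not the paper's experimental setup):

```python
def hill_climb_states(grad_fn, s0, step=0.1, n=20):
    """Follow the gradient of a value estimate to collect high-value states."""
    trajectory = [s0]
    s = s0
    for _ in range(n):
        s = s + step * grad_fn(s)
        trajectory.append(s)
    return trajectory

# Toy value estimate peaked at s = 3: v(s) = -(s - 3)^2, dv/ds = -2(s - 3).
dv = lambda s: -2.0 * (s - 3.0)
traj = hill_climb_states(dv, s0=0.0)
```

Starting from s = 0, the trajectory converges toward the maximizer s = 3; the intermediate states along the way are the kind of candidate states the method proposes to use.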
no code implementations • 3 Oct 2019 • Khurram Javed, Hengshuai Yao, Martha White
Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.
no code implementations • 4 Oct 2019 • Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.
no code implementations • 18 Dec 2019 • Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand
Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks.
no code implementations • 8 Nov 2019 • Jun Jin, Nhat M. Nguyen, Nazmus Sakib, Daniel Graves, Hengshuai Yao, Martin Jagersand
We observe that our method demonstrates time-efficient path planning behavior with high success rate in mapless navigation tasks.
no code implementations • 26 Jan 2020 • Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand
Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels.
no code implementations • 7 Jul 2020 • Vincent Liu, Adam White, Hengshuai Yao, Martha White
In this work, we provide a definition of interference for control in reinforcement learning.
no code implementations • 14 Sep 2020 • Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.
no code implementations • 29 Sep 2021 • Ke Sun, Yi Liu, Yingnan Zhao, Hengshuai Yao, Shangling Jui, Linglong Kong
In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
no code implementations • 20 Nov 2021 • Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel
There has been recent and growing interest in the development and deployment of autonomous vehicles, encouraged by the empirical successes of powerful artificial intelligence (AI) techniques, especially in the applications of deep learning and reinforcement learning.
no code implementations • 25 Sep 2019 • Chao Gao, Martin Mueller, Ryan Hayward, Hengshuai Yao, Shangling Jui
A three-head network architecture has recently been proposed that can learn a third action-value head on the same fixed dataset used for the two-head network.
no code implementations • 21 Dec 2021 • Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel
First, we provide a thorough overview of the state-of-the-art and emerging approaches for XAI-based autonomous driving.
no code implementations • 1 Apr 2022 • Hengshuai Yao
In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques.
no code implementations • 31 Oct 2022 • Dongcui Diao, Hengshuai Yao, Bei Jiang
Telling similar objects apart is hard even for human beings.
no code implementations • 29 Jul 2023 • Hengshuai Yao
Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate.
no code implementations • 18 Aug 2023 • Hengshuai Yao
This note aims to understand, in particular, why TDC is slow on this example, and provides a debugging analysis of this behavior.
no code implementations • 22 Aug 2023 • Xing Chen, Yijun Liu, Zhaogeng Liu, Hechang Chen, Hengshuai Yao, Yi Chang
In prior work, it has been shown that policy-based exploration is beneficial for continuous action spaces in deterministic-policy reinforcement learning (DPRL).