Search Results for author: Hengshuai Yao

Found 32 papers, 8 papers with code

Universal Option Models

no code implementations NeurIPS 2014 Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar

We prove that the UOM of an option can construct a traditional option model given a reward function, and that the option-conditional return can be computed directly by a single dot product of the UOM with the reward function.
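
The dot-product computation claimed above is simple to illustrate. Below is a minimal NumPy sketch, assuming the UOM of an option can be summarized as a vector of discounted expected state-occupancy weights (the paper's actual UOM is defined more generally):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
u_option = rng.random(n_states)   # stand-in for a learned UOM vector (assumption)
reward = rng.random(n_states)     # an arbitrary reward function over states

# The option-conditional return is a single dot product of the UOM
# with the reward function, as the abstract states.
option_return = u_option @ reward
print(option_return)
```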

Practical Issues of Action-conditioned Next Image Prediction

no code implementations 8 Feb 2018 Donglai Zhu, Hao Chen, Hengshuai Yao, Masoud Nosrati, Peyman Yadmellat, Yunfei Zhang

Our major finding is that action tiling encoding is the most important factor leading to the remarkable performance of the CDNA model.

SSIM
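
Action tiling, the encoding the abstract credits for the CDNA model's performance, broadcasts the action vector across the spatial dimensions of the convolutional features. A minimal PyTorch sketch, assuming a generic feature map and action shape (the paper's exact layer placement may differ):

```python
import torch

def action_tile(features, action):
    """Tile an action vector across spatial dims and concatenate it with
    convolutional image features. features: (B, C, H, W); action: (B, A)."""
    b, _, h, w = features.shape
    tiled = action[:, :, None, None].expand(-1, -1, h, w)  # (B, A, H, W)
    return torch.cat([features, tiled], dim=1)             # (B, C + A, H, W)

feats = torch.randn(2, 32, 8, 8)
act = torch.randn(2, 4)
print(action_tile(feats, act).shape)  # torch.Size([2, 36, 8, 8])
```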

QUOTA: The Quantile Option Architecture for Reinforcement Learning

3 code implementations 5 Nov 2018 Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao

In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).

Decision Making Distributional Reinforcement Learning +2
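
A minimal sketch of the quantile-option idea: each option acts greedily with respect to one quantile of the learned return distribution rather than its mean. The shapes and the option-selection comment below are illustrative assumptions, not QUOTA's full algorithm:

```python
import numpy as np

def quota_act(quantiles, option):
    """Act greedily w.r.t. one quantile of the per-action return distribution,
    as selected by the current option.
    quantiles: (n_actions, n_quantiles) estimates from a quantile critic."""
    return int(np.argmax(quantiles[:, option]))

q = np.random.default_rng(0).random((4, 8))  # 4 actions, 8 quantiles
print(quota_act(q, option=1))  # pessimistic option
print(quota_act(q, option=6))  # optimistic option
# A high-level policy (not shown) would pick `option`, e.g. epsilon-greedily
# over learned option values, and commit to it for several steps.
```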

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search

1 code implementation 6 Nov 2018 Shangtong Zhang, Hao Chen, Hengshuai Yao

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning.

Continuous Control reinforcement-learning +2
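
A hedged PyTorch sketch of the actor-ensemble idea: several deterministic actors each propose an action and the critic selects the best proposal. The paper additionally performs look-ahead tree search over the ensemble, which is omitted here:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, n_actors = 8, 2, 5
actors = [nn.Linear(obs_dim, act_dim) for _ in range(n_actors)]  # toy actors
critic = nn.Linear(obs_dim + act_dim, 1)                         # toy Q critic

def act(state):  # state: (obs_dim,)
    proposals = [torch.tanh(a(state)) for a in actors]   # one action per actor
    q_values = [critic(torch.cat([state, p])) for p in proposals]
    return proposals[int(torch.stack(q_values).argmax())]  # best by critic

print(act(torch.randn(obs_dim)))
```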

Deep Reinforcement Learning with Decorrelation

no code implementations 18 Mar 2019 Borislav Mavrin, Hengshuai Yao, Linglong Kong

Further experiments on the losing games show that our decorrelation algorithms can outperform DQN and QR-DQN with a fine-tuned regularization factor.

Atari Games reinforcement-learning +2
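
One natural form of such a decorrelation regularizer penalizes the off-diagonal entries of the feature covariance matrix, scaled by the regularization factor mentioned above. This is an illustrative formulation, assuming the penalty is applied to the representation-layer activations:

```python
import torch

def decorrelation_penalty(features):
    """Penalize correlations between feature units.
    features: (batch, d) activations of the network's representation layer."""
    z = features - features.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / max(features.shape[0] - 1, 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

# total_loss = td_loss + reg_factor * decorrelation_penalty(phi)
# where reg_factor is the regularization factor the abstract refers to.
```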

Distributional Reinforcement Learning for Efficient Exploration

no code implementations 13 May 2019 Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yao-Liang Yu

In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainty.

Atari Games Distributional Reinforcement Learning +3
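
One way to exploit this estimated distribution for exploration is to add a decaying optimism bonus derived from the variability of the upper quantiles. The sketch below is a plausible reading of the abstract, assuming this bonus form and schedule rather than reproducing the paper's exact method:

```python
import numpy as np

def optimistic_action(quantiles, step, c=1.0):
    """Pick an action optimistically from per-action quantile estimates.
    quantiles: (n_actions, n_quantiles) from a distributional critic."""
    mean = quantiles.mean(axis=1)
    median = np.median(quantiles, axis=1)
    # variability of quantiles above the median (upper-tail uncertainty)
    upper = np.where(quantiles > median[:, None], quantiles - median[:, None], 0.0)
    bonus = np.sqrt((upper ** 2).mean(axis=1))
    schedule = c / np.sqrt(step + 1)   # decaying optimism over training
    return int(np.argmax(mean + schedule * bonus))
```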

Hill Climbing on Value Estimates for Search-control in Dyna

no code implementations 18 Jun 2019 Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White

In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function.

Model-based Reinforcement Learning Reinforcement Learning (RL)
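
A minimal sketch of the HC search-control step, assuming a differentiable value estimate: follow gradient ascent on V(s) and collect the visited states for Dyna-style planning (the paper adds noise and projection details omitted here):

```python
import torch

def hill_climb_states(value_fn, s0, steps=10, lr=0.1):
    """Gradient-ascend the learned value estimate from s0, returning the
    visited states as candidates for the search-control queue."""
    s = s0.clone().requires_grad_(True)
    trajectory = []
    for _ in range(steps):
        v = value_fn(s)                      # scalar value estimate
        (g,) = torch.autograd.grad(v, s)     # direction of increasing value
        s = (s + lr * g).detach().requires_grad_(True)
        trajectory.append(s.detach().clone())
    return trajectory

v = lambda s: -(s ** 2).sum()                # toy value estimate peaked at origin
states = hill_climb_states(v, torch.tensor([2.0, -1.5]))
```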

Three-Head Neural Network Architecture for AlphaZero Learning

no code implementations 25 Sep 2019 Chao Gao, Martin Mueller, Ryan Hayward, Hengshuai Yao, Shangling Jui

A three-head network architecture has recently been proposed that can learn a third action-value head from the same fixed dataset used for a two-head network.
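
A minimal PyTorch sketch of such a three-head architecture: a shared trunk feeding policy, state-value, and action-value heads. Layer sizes here are illustrative assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class ThreeHeadNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)         # move logits
        self.value = nn.Linear(hidden, 1)                  # state value
        self.action_value = nn.Linear(hidden, n_actions)   # the third, Q head

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h), self.value(h), self.action_value(h)

net = ThreeHeadNet(obs_dim=16, n_actions=9)
p, v, q = net(torch.randn(1, 16))
```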

Is Fast Adaptation All You Need?

no code implementations 3 Oct 2019 Khurram Javed, Hengshuai Yao, Martha White

Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.

Incremental Learning Meta-Learning

Discounted Reinforcement Learning Is Not an Optimization Problem

no code implementations 4 Oct 2019 Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.

Misconceptions reinforcement-learning +1

Mapless Navigation among Dynamics with Social-safety-awareness: a reinforcement learning approach from 2D laser scans

no code implementations 8 Nov 2019 Jun Jin, Nhat M. Nguyen, Nazmus Sakib, Daniel Graves, Hengshuai Yao, Martin Jagersand

We observe that our method demonstrates time-efficient path planning behavior with a high success rate in mapless navigation tasks.

Robotics

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.


One-Shot Weakly Supervised Video Object Segmentation

no code implementations 18 Dec 2019 Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks.

Object Segmentation +4

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings

no code implementations 26 Jan 2020 Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels.

Few-Shot Learning Object +5

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations 19 Jul 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and what its limitations are.

Autonomous Driving Continuous Control +1
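
For reference, the standard prioritized sampling scheme being analyzed draws transitions with probability proportional to |TD error|^alpha and corrects with importance-sampling weights. A minimal sketch (the hyperparameter names alpha and beta follow the common PER convention):

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample transition indices proportional to |TD error|^alpha and
    return normalized importance-sampling weights."""
    rng = rng or np.random.default_rng()
    p = np.abs(td_errors) ** alpha + 1e-6   # small constant avoids zero priority
    p /= p.sum()
    idx = rng.choice(len(p), size=batch_size, p=p)
    weights = (len(p) * p[idx]) ** (-beta)  # correct the sampling bias
    return idx, weights / weights.max()
```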

Variance-Reduced Off-Policy Memory-Efficient Policy Search

no code implementations 14 Sep 2020 Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

To achieve variance-reduced, off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.

Reinforcement Learning (RL) Stochastic Optimization

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation 28 Sep 2020 Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.

Breaking the Deadly Triad with a Target Network

1 code implementation 21 Jan 2021 Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

Q-Learning
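
The target-network device at the heart of this analysis bootstraps from a slowly updated copy of the online network. A minimal sketch of the common soft-update variant (the paper's analyzed update additionally involves a projection, omitted here):

```python
import copy
import torch

def soft_update(target, online, tau=0.005):
    """Move target parameters a small step toward the online parameters."""
    with torch.no_grad():
        for tp, op in zip(target.parameters(), online.parameters()):
            tp.mul_(1 - tau).add_(tau * op)

online_net = torch.nn.Linear(4, 2)
target_net = copy.deepcopy(online_net)  # bootstrap targets come from this copy
soft_update(target_net, online_net)     # called periodically during training
```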

Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations

no code implementations 29 Sep 2021 Ke Sun, Yi Liu, Yingnan Zhao, Hengshuai Yao, Shangling Jui, Linglong Kong

In real scenarios, the state observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.

Distributional Reinforcement Learning reinforcement-learning +1

Towards Safe, Explainable, and Regulated Autonomous Driving

no code implementations 20 Nov 2021 Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel

There has been recent and growing interest in the development and deployment of autonomous vehicles, encouraged by the empirical successes of powerful artificial intelligence (AI) techniques, especially in the applications of deep learning and reinforcement learning.

Autonomous Driving Explainable Artificial Intelligence (XAI) +1

Learning to Accelerate by the Methods of Step-size Planning

no code implementations 1 Apr 2022 Hengshuai Yao

In the second part of this paper, we propose a new class of methods for accelerating gradient descent that are distinct from existing techniques.

Class Interference of Deep Neural Networks

no code implementations 31 Oct 2022 Dongcui Diao, Hengshuai Yao, Bei Jiang

Recognizing similar objects and telling them apart is hard even for human beings.

The Vanishing Decision Boundary Complexity and the Strong First Component

1 code implementation 25 Nov 2022 Hengshuai Yao

Nonetheless, we found that the decision boundaries of predecessor models on the training data are reflective of the final model's generalization.

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$\lambda$ Smoothness

no code implementations 29 Jul 2023 Hengshuai Yao

Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate.

Baird Counterexample is Solved: with an example of How to Debug a Two-time-scale Algorithm

no code implementations 18 Aug 2023 Hengshuai Yao

This note aims to understand, in particular, why TDC is slow on this example, and provides a debugging analysis of this behavior.

Careful at Estimation and Bold at Exploration

no code implementations 22 Aug 2023 Xing Chen, Yijun Liu, Zhaogeng Liu, Hechang Chen, Hengshuai Yao, Yi Chang

In prior work, it has been shown that policy-based exploration is beneficial for continuous action spaces in deterministic policy reinforcement learning (DPRL).
