Search Results for author: Huizhen Yu

Found 9 papers, 1 paper with code

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays

no code implementations • 22 Dec 2023 • Huizhen Yu, Yi Wan, Richard S. Sutton

In this paper, we study asynchronous stochastic approximation algorithms without communication delays.

reinforcement-learning
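
As background, the classical asynchronous stochastic-approximation recursion (a minimal sketch in our own notation; the precise noise, stepsize, and stability conditions are the subject of the paper) updates a random subset of components at each step:

$$x_{n+1}(i) = x_n(i) + \alpha_n(i)\bigl(h_i(x_n) + \omega_{n+1}(i)\bigr), \quad i \in Y_n; \qquad x_{n+1}(i) = x_n(i), \quad i \notin Y_n,$$

where $Y_n$ is the set of components updated at step $n$ and $\omega_{n+1}$ is noise. "Without communication delays" means each update reads the current iterate $x_n$ rather than outdated values of the other components.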

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

no code implementations • 27 Dec 2017 • Huizhen Yu

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation.
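
The paper analyzes several gradient-based TD algorithms; as a concrete reference point, here is a minimal sketch of one well-known member of that family, TDC (gradient-corrected TD(0)), with linear features and importance weighting. Function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def tdc_update(theta, w, x, r, x_next, rho, gamma=0.99, alpha=0.01, beta=0.05):
    """One TDC (gradient-corrected TD) update with linear features v(s) = theta @ x(s).

    theta : main weight vector
    w     : auxiliary weights estimating the expected TD error given the features
    rho   : importance sampling ratio pi(a|s) / mu(a|s)
    """
    delta = r + gamma * theta @ x_next - theta @ x   # TD error
    # The correction term -gamma * (w @ x) * x_next is what distinguishes
    # TDC from plain off-policy TD(0), which can diverge.
    theta = theta + alpha * rho * (delta * x - gamma * (w @ x) * x_next)
    # LMS update for the auxiliary weights.
    w = w + beta * rho * (delta - w @ x) * x
    return theta, w
```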

On Generalized Bellman Equations and Temporal-Difference Learning

no code implementations • 14 Apr 2017 • Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton

As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process.
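
For orientation, with a constant $\lambda$ the equation in question reduces to the familiar fixed-point condition of the TD($\lambda$) operator (a sketch; the paper's generalization lets $\lambda$ evolve with the history of states and traces):

$$v = T^{(\lambda)} v = (I - \lambda\gamma P)^{-1}\bigl(r + (1-\lambda)\gamma P v\bigr),$$

where $P$ and $r$ are the transition matrix and expected one-stage rewards of the policy being evaluated.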

Multi-step Off-policy Learning Without Importance Sampling Ratios

1 code implementation • 9 Feb 2017 • Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton

We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner.
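
The central algebraic device (sketched here in our own notation) is that the importance sampling ratio $\rho(s,a) = \pi(a|s)/\mu(a|s)$ enters multi-step TD traces only through the product $\lambda(s,a)\,\rho(s,a)$, so choosing the bootstrapping parameter proportional to the behavior probability cancels the ratio:

$$\lambda(s,a) = \nu(s,a)\,\mu(a \mid s) \quad\Longrightarrow\quad \lambda(s,a)\,\rho(s,a) = \nu(s,a)\,\pi(a \mid s),$$

which involves only the known target policy $\pi$ and a bounded factor $\nu$, and never a division by the behavior probabilities.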

Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms

no code implementations • 6 May 2016 • Huizhen Yu

This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretical perspective.

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

no code implementations • 23 Nov 2015 • Huizhen Yu

In this paper we present convergence results for constrained versions of ETD($\lambda$) with constant stepsize and with diminishing stepsize from a broad range.
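
Here "constrained" refers to projecting the iterates onto a bounded convex set, which keeps them bounded under weak convergence analysis; schematically (our notation, omitting the paper's precise conditions):

$$\theta_{t+1} = \Pi_B\bigl(\theta_t + \alpha\,\delta_t\,e_t\bigr),$$

where $\Pi_B$ is the projection onto a set $B$ chosen large enough to contain the desired solution, $\delta_t$ is the TD error, and $e_t$ is the emphatic eligibility trace.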

Emphatic Temporal-Difference Learning

no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
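
As a concrete illustration, here is a minimal sketch of one step of ETD($\lambda$) with linear function approximation, following the followon-trace/emphasis scheme described in this line of work (interface and variable names are ours):

```python
import numpy as np

def etd_step(theta, e, F, rho_prev, x, r, x_next, rho,
             interest=1.0, gamma=0.99, lam=0.9, alpha=0.001):
    """One step of ETD(lambda) with linear features v(s) = theta @ x(s).

    F        : followon trace, accumulating discounted interest
    e        : eligibility trace vector
    rho_prev : importance sampling ratio of the previous transition
    rho      : ratio pi(a|s) / mu(a|s) of the current transition
    """
    F = gamma * rho_prev * F + interest      # followon trace
    M = lam * interest + (1.0 - lam) * F     # emphasis given to this step's update
    e = rho * (gamma * lam * e + M * x)      # emphatically weighted eligibility trace
    delta = r + gamma * theta @ x_next - theta @ x
    theta = theta + alpha * delta * e
    return theta, e, F, rho                  # rho is carried forward as rho_prev
```

The emphasis $M_t$ blends the user-specified interest in the current state with the followon trace, which is how updates get selectively emphasized or de-emphasized across time steps.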

On Convergence of Emphatic Temporal-Difference Learning

no code implementations • 8 Jun 2015 • Huizhen Yu

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces.
