Search Results for author: Huizhen Yu

Found 8 papers, 1 paper with code

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

no code implementations · 27 Dec 2017 · Huizhen Yu

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation.
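To make the setting concrete, here is a minimal sketch of one member of the gradient-based TD family (a TDC-style update) for off-policy policy evaluation with linear function approximation. The feature vectors, step sizes, and two-timescale structure below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, rho,
               gamma=0.9, alpha=0.01, beta=0.05):
    """One TDC-style gradient TD update for off-policy policy evaluation.

    theta: main weight vector; w: auxiliary weight vector;
    phi/phi_next: feature vectors of current/next state;
    rho: importance sampling ratio pi(a|s) / mu(a|s).
    """
    # TD error under linear value approximation v(s) = phi(s)^T theta
    delta = reward + gamma * phi_next @ theta - phi @ theta
    # Main iterate: gradient-correction update, weighted by rho
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary iterate: tracks the expected TD error given the state
    w = w + beta * rho * (delta - w @ phi) * phi
    return theta, w
```

The two step sizes reflect the two-timescale character of these methods: the auxiliary vector `w` is typically driven on a faster timescale than `theta`.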

On Generalized Bellman Equations and Temporal-Difference Learning

no code implementations · 14 Apr 2017 · Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton

As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process.

Multi-step Off-policy Learning Without Importance Sampling Ratios

1 code implementation · 9 Feb 2017 · Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton

We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner.
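The cancellation at the heart of this idea can be shown in a few lines. In the sketch below (function name and scaling parameter are mine, not the paper's), the bootstrapping parameter for a state-action pair is chosen proportional to the behavior probability, so the product of bootstrapping parameter and importance sampling ratio contains no ratio at all.

```python
def trace_factor(pi_a, mu_a, zeta=1.0):
    """Effective trace decay lambda(s,a) * rho(s,a) without an explicit ratio.

    Choosing lambda(s,a) = zeta * mu(a|s) (capped at 1) gives
        lambda * rho = zeta * mu * (pi / mu) = zeta * pi(a|s),
    so the importance sampling ratio never needs to be formed.
    """
    lam = min(1.0, zeta * mu_a)   # action-dependent bootstrapping parameter
    rho = pi_a / mu_a             # formed here only to verify the cancellation
    return lam * rho
```

Whenever `zeta * mu_a <= 1`, the returned value equals `zeta * pi_a` exactly, independent of the behavior policy's probability.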

Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms

no code implementations · 6 May 2016 · Huizhen Yu

This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretical perspective.

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

no code implementations · 23 Nov 2015 · Huizhen Yu

In this paper we present convergence results for constrained versions of ETD($\lambda$) with constant stepsize and with diminishing stepsize from a broad range.
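A common way to constrain such stochastic iterates, sketched here under the assumption that the constraint set is a Euclidean ball of radius `R` (the radius and stepsize values are illustrative), is to project the weight vector back onto the set after each update.

```python
import numpy as np

def project_ball(theta, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def constrained_step(theta, update, alpha=0.01, radius=10.0):
    """One projected iteration: theta <- Pi_R(theta + alpha * update)."""
    return project_ball(theta + alpha * update, radius)
```

The projection keeps the iterates bounded, which is what permits convergence analysis with a constant (rather than decreasing) stepsize.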

Emphatic Temporal-Difference Learning

no code implementations · 6 Jul 2015 · A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
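The emphasis mechanism can be sketched with the standard ETD(λ) recursions for the followon trace F and emphasis M, assuming linear features; the step size, interest weights, and discount below are illustrative assumptions.

```python
import numpy as np

def etd_step(theta, e, F, phi, phi_next, reward, rho, rho_prev,
             interest=1.0, lam=0.0, gamma=0.9, alpha=0.01):
    """One ETD(lambda) update for off-policy policy evaluation.

    F: followon trace; e: eligibility trace; rho: importance ratio at t;
    rho_prev: importance ratio at t-1; interest: i(S_t).
    """
    F = gamma * rho_prev * F + interest      # followon trace accumulates interest
    M = lam * interest + (1.0 - lam) * F     # emphasis on the current update
    e = rho * (gamma * lam * e + M * phi)    # emphasized eligibility trace
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    theta = theta + alpha * delta * e
    return theta, e, F
```

The emphasis M is what "selectively emphasizes and de-emphasizes" updates: states reached through heavily weighted trajectories receive larger effective step sizes.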

On Convergence of Emphatic Temporal-Difference Learning

no code implementations · 8 Jun 2015 · Huizhen Yu

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces.
