Search Results for author: Ashique Rupam Mahmood

Found 1 paper, 1 paper with code

Multi-step Off-policy Learning Without Importance Sampling Ratios

1 code implementation • 9 Feb 2017 • Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton

We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner.
