Search Results for author: Brahma Pavse

Found 1 paper, 0 papers with code

Reducing Sampling Error in Batch Temporal Difference Learning

ICML 2020 | Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone | no code implementations

In the batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted by the number of times that action occurs in the batch, not by the true probability of the action under the given policy.
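The effect described above can be reproduced on a toy problem. The sketch below is a hypothetical illustration, not the authors' code. It assumes a single state with two equally likely actions under the policy, one rewarding 1 and one rewarding 0, so the true value is 0.5. The assumed batch contains the rewarding action three times and the other once. Batch TD(0) sums the TD errors over the whole batch before each update, so it converges to the value implied by the empirical action frequencies (0.75). The correction shown is a simple count-based reweighting by pi(a|s) / pi_hat(a|s), where pi_hat is the empirical action frequency in the batch. It is offered in the spirit of the snippet and should not be read as a claim about the paper's exact algorithm. The function names `batch_td0` and `sampling_correction_weights` are made up for this example.

```python
from collections import Counter, defaultdict


def batch_td0(batch, gamma=1.0, alpha=0.1, iters=2000, weights=None):
    """Batch TD(0): accumulate TD errors over the whole batch, then apply them.

    batch: list of (s, a, r, s_next, done) transitions (hypothetical format).
    weights: optional per-transition multipliers on the TD error.
    """
    V = defaultdict(float)
    if weights is None:
        weights = [1.0] * len(batch)
    for _ in range(iters):
        delta = defaultdict(float)
        for (s, a, r, s_next, done), w in zip(batch, weights):
            # Terminal transitions bootstrap from 0.
            target = r + (0.0 if done else gamma * V[s_next])
            delta[s] += w * (target - V[s])
        for s, d in delta.items():
            V[s] += alpha * d
    return dict(V)


def sampling_correction_weights(batch, pi):
    """Weight each transition by pi(a|s) / pi_hat(a|s), where pi_hat is the
    empirical frequency of action a in state s within the batch (an
    illustrative count-based estimate)."""
    sa_counts = Counter((s, a) for s, a, *_ in batch)
    s_counts = Counter(s for s, *_ in batch)
    return [pi[(s, a)] / (sa_counts[(s, a)] / s_counts[s]) for s, a, *_ in batch]


# Assumed toy setup: the true value of state 0 under pi is 0.5 * 1 + 0.5 * 0 = 0.5.
pi = {(0, "left"): 0.5, (0, "right"): 0.5}
batch = [(0, "left", 1.0, None, True)] * 3 + [(0, "right", 0.0, None, True)]

print(round(batch_td0(batch)[0], 6))  # empirical-frequency value: 0.75
w = sampling_correction_weights(batch, pi)
print(round(batch_td0(batch, weights=w)[0], 6))  # reweighted value: 0.5
```

The reweighting cancels the batch's over-representation of "left": each "left" transition gets weight 0.5 / 0.75 = 2/3 and the single "right" transition gets 0.5 / 0.25 = 2. The weighted fixed point is therefore (3 * 2/3 * 1 + 2 * 0) / (3 * 2/3 + 2) = 0.5. The step size and iteration count are arbitrary choices that happen to be stable for this small batch.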
