1 code implementation • 13 Jan 2020 • C. Shi, S. Zhang, W. Lu, R. Song
We propose to model the state-action value function (Q-function) associated with a policy using the series/sieve method in order to derive its confidence interval.
Decision Making • reinforcement-learning
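To make the idea in the abstract concrete, here is a minimal sketch of a sieve-based confidence interval, not the authors' implementation. It assumes a one-dimensional state, two discrete actions, a polynomial basis, a plug-in sandwich variance, and a fixed 95% normal quantile. It also assumes the target is the policy's value at a reference state `s0`. All function names (`sieve_features`, `value_confidence_interval`, `simulate_toy_data`, `always_one`) and the toy data generator are hypothetical.

```python
import numpy as np


def sieve_features(s, a, degree=2, n_actions=2):
    """Polynomial sieve basis in the state, one block of columns per action."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=int))
    poly = np.vander(s, degree + 1, increasing=True)  # columns 1, s, ..., s^degree
    width = degree + 1
    phi = np.zeros((s.size, n_actions * width))
    for k in range(n_actions):
        # Only the block matching the taken action is non-zero.
        phi[:, k * width:(k + 1) * width] = poly * (a == k)[:, None]
    return phi


def value_confidence_interval(s, a, r, s_next, policy, s0, gamma=0.5, z=1.96):
    """Estimate V(s0) = Q(s0, policy(s0)) and a normal-approximation interval.

    Q(s, a) is approximated by phi(s, a) @ beta, with beta solving the
    empirical Bellman estimating equation (an LSTD-style estimator).
    """
    r = np.asarray(r, dtype=float)
    n = r.size
    phi = sieve_features(s, a)
    phi_next = sieve_features(s_next, policy(s_next))

    # Sigma = mean of phi_t (phi_t - gamma * phi'_t)^T
    sigma = phi.T @ (phi - gamma * phi_next) / n
    beta = np.linalg.solve(sigma, phi.T @ r / n)

    # Temporal-difference residuals and the "meat" of the sandwich variance.
    resid = r + gamma * (phi_next @ beta) - phi @ beta
    omega = (phi * resid[:, None] ** 2).T @ phi / n

    # Features of the reference state under the target policy.
    u = sieve_features(s0, policy(np.atleast_1d(s0)))[0]
    w = np.linalg.solve(sigma.T, u)  # w = Sigma^{-T} u
    estimate = float(u @ beta)
    std_err = float(np.sqrt(w @ omega @ w / n))
    return estimate, estimate - z * std_err, estimate + z * std_err


def always_one(s):
    """Hypothetical target policy: always choose action 1."""
    return np.ones(np.shape(s), dtype=int)


def simulate_toy_data(n, seed=0):
    """Hypothetical toy MDP: uniform next states, reward s + a + noise."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(size=n)
    a = rng.integers(0, 2, size=n)  # behaviour policy picks actions at random
    r = s + a + 0.1 * rng.normal(size=n)
    s_next = rng.uniform(size=n)  # independent of (s, a) in this toy example
    return s, a, r, s_next


if __name__ == "__main__":
    s, a, r, s_next = simulate_toy_data(5000)
    est, lower, upper = value_confidence_interval(
        s, a, r, s_next, always_one, s0=0.5, gamma=0.5
    )
    print(f"V(0.5) estimate: {est:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

In this toy MDP the true value under `always_one` with `gamma=0.5` works out analytically to V(0.5) = 3.0, so the estimate can be checked against it. In a genuine sieve estimator the number of basis functions grows with the sample size; the sketch fixes the polynomial degree for brevity.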