We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $\mathcal{S}$ and action space $\mathcal{A}$, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined...
