1 Dec 2022 • Jinghan Wang, Mengdi Wang, Lin F. Yang
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).
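To make the access model concrete, the following is a minimal sketch of what querying a generative model could look like. It is not the algorithm from this work: the `GenerativeModel` class, the `estimate_transitions` helper, and the toy 2-state, 2-action transition table are all hypothetical and introduced only for illustration. The sketch draws a fixed number of next-state samples for every state-action pair and counts the total number of simulator calls, which is the quantity a sample-complexity bound refers to.

```python
import random


class GenerativeModel:
    """Hypothetical simulator: returns a next state drawn from P(. | s, a)."""

    def __init__(self, P, seed=0):
        self.P = P  # P[s][a] is a list of next-state probabilities
        self.rng = random.Random(seed)
        self.calls = 0  # total number of samples requested so far

    def sample(self, s, a):
        self.calls += 1
        probs = self.P[s][a]
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


def estimate_transitions(model, n_states, n_actions, n_samples):
    """Empirical transition estimate from n_samples draws per (s, a) pair."""
    P_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                s_next = model.sample(s, a)
                P_hat[s][a][s_next] += 1.0 / n_samples
    return P_hat


# Toy transition table (assumed for illustration): P[s][a][s_next].
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]

model = GenerativeModel(P, seed=0)
P_hat = estimate_transitions(model, n_states=2, n_actions=2, n_samples=2000)
print("simulator calls:", model.calls)
print("estimated P(. | s=0, a=0):", P_hat[0][0])
```

In this sketch the total number of calls is the number of state-action pairs times the per-pair sample count. A sample-complexity result asks how small that total can be while still guaranteeing that a policy computed from the samples is $\varepsilon$-optimal.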