1 code implementation • 31 Dec 2020 • Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood
Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes.
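
As a rough illustration of the idea, the sketch below clones an expert that acts on the full state into an agent that sees only a coarse observation. Everything in it is a hypothetical toy and not the authors' method or code: the four-state problem and the names `observe`, `expert_policy` and `clone_expert` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def observe(state):
    """Hypothetical observation function: states 0,1 -> obs 0; states 2,3 -> obs 1."""
    return state // 2


def expert_policy(state):
    """Hypothetical expert for the fully observed MDP; it acts on the true state."""
    return "right" if state < 3 else "stay"


def clone_expert(states):
    """Fit a tabular policy over observations by counting expert actions.

    For each visited state, record the expert's action under the observation
    the agent would have received, then normalise the counts per observation.
    """
    counts = defaultdict(Counter)
    for s in states:
        counts[observe(s)][expert_policy(s)] += 1

    policy = {}
    for obs, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[obs] = {a: n / total for a, n in action_counts.items()}
    return policy


# Assumed visitation data: each state is visited equally often.
visited_states = [0, 1, 2, 3, 0, 1, 2, 3]
agent_policy = clone_expert(visited_states)
print(agent_policy)
```

In this toy, observation 0 covers states where the expert always agrees, so the cloned policy is deterministic there. Observation 1 merges states 2 and 3, where the expert acts differently, so the cloned policy splits its probability between the two actions. The agent can only match the expert's actions averaged over the states it cannot tell apart.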