29 May 2019 • Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier
(i) We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates (sets of several items recommended together).
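The idea behind the decomposition can be illustrated with a small sketch. It assumes the user consumes exactly one item from the slate (possibly a null "no click" item) and that reward and transition depend only on the consumed item. Under those assumptions the value of a slate A is a choice-weighted sum of item-wise values, Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i). The conditional-logit choice model, the helper names (`choice_probs`, `slate_q`, `best_slate`, `td_update`), and the tabular per-state dictionaries are all hypothetical choices made for illustration. This is not the authors' implementation.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def choice_probs(scores: Dict[str, float], slate: Sequence[str],
                 null_score: float) -> Tuple[Dict[str, float], float]:
    """Assumed conditional-logit choice model:
    P(i | s, A) = exp(v_i) / (exp(v_null) + sum_{j in A} exp(v_j))."""
    weights = {i: math.exp(scores[i]) for i in slate}
    denom = math.exp(null_score) + sum(weights.values())
    probs = {i: w / denom for i, w in weights.items()}
    p_null = math.exp(null_score) / denom  # probability of choosing nothing
    return probs, p_null


def slate_q(item_q: Dict[str, float], scores: Dict[str, float],
            slate: Sequence[str], null_score: float, null_q: float) -> float:
    """Slate value from item-wise values:
    Q(s, A) = sum_i P(i | s, A) * Qbar(s, i) + P(null | s, A) * Qbar(s, null)."""
    probs, p_null = choice_probs(scores, slate, null_score)
    return sum(probs[i] * item_q[i] for i in slate) + p_null * null_q


def best_slate(item_q: Dict[str, float], scores: Dict[str, float],
               null_score: float, null_q: float, k: int) -> Tuple[str, ...]:
    """Brute-force search over all size-k slates (exponential; toy sizes only)."""
    return max(itertools.combinations(sorted(item_q), k),
               key=lambda a: slate_q(item_q, scores, a, null_score, null_q))


def td_update(item_q: Dict[str, float], clicked: str, reward: float,
              next_slate_value: float, alpha: float,
              gamma: float) -> Dict[str, float]:
    """Item-wise TD step for the consumed item only:
    Qbar(s, i) <- (1 - alpha) * Qbar(s, i) + alpha * (r + gamma * Q(s', A')).
    next_slate_value is Q(s', A') computed with slate_q, using either the
    slate actually served next (SARSA-style) or the best slate (Q-learning-style)."""
    target = reward + gamma * next_slate_value
    updated = (1 - alpha) * item_q[clicked] + alpha * target
    return {**item_q, clicked: updated}  # return a copy, leave input untouched
```

With equal scores for three items and the null option, each item on a two-item slate is chosen with probability 1/3. So the slate ("a", "b") with item values 3 and 6 has value 1 + 2 = 3, which is the best of the three possible pairs. Only the item-wise values Q̄(s, i) are learned, which avoids keeping a separate value for every possible slate. The exhaustive `best_slate` search is only for demonstration. A realistic system would use a cheaper slate-selection rule.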