no code implementations • 6 Dec 2021 • Michael James McDonald, Dylan Hadfield-Menell
While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals.