1 code implementation • 6 Apr 2023 • Siow Meng Low, Akshat Kumar, Scott Sanner
In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects.
no code implementations • 23 Mar 2022 • Siow Meng Low, Akshat Kumar, Scott Sanner
This novel formulation of deep reactive policy (DRP) learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations, thus lowering sample complexity.