7 Jan 2024 • Evan Ryan Gunter, Yevgeny Liokumovich, Victoria Krakovna
In our first case of interest, near-optimal policies, we use a bisimulation metric on MDPs to prove that small perturbations won't make the agent take longer to shut down.
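
To make the notion of a bisimulation metric concrete, here is a minimal Python sketch. It is not the authors' construction: it assumes one common form of the metric, d(s, t) = max_a [ |R(s, a) - R(t, a)| + gamma * d(s', t') ], and it assumes deterministic transitions so that the transport term reduces to the distance between successor states. The function name `bisimulation_metric`, the toy MDP, and the choice of state 3 as an absorbing "shutdown" state are all hypothetical and chosen only for illustration.

```python
def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Fixed-point iteration for a bisimulation-style metric on a deterministic MDP.

    rewards[s][a]    -- reward for taking action a in state s
    next_state[s][a] -- index of the (single) successor state
    Assumed update:  d(s, t) = max_a |R(s,a) - R(t,a)| + gamma * d(s', t').
    """
    n = len(rewards)
    n_actions = len(rewards[0])
    d = [[0.0] * n for _ in range(n)]  # start from the all-zero pseudometric
    for _ in range(max_iters):
        new_d = [
            [
                max(
                    abs(rewards[s][a] - rewards[t][a])
                    + gamma * d[next_state[s][a]][next_state[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n)
            ]
            for s in range(n)
        ]
        # Largest change in any entry; stop once the iteration has converged.
        delta = max(abs(new_d[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new_d
        if delta < tol:
            break
    return d


# Hypothetical toy MDP: states 0, 1, 2 each move to state 3 under either action;
# state 3 is an absorbing zero-reward state standing in for "shutdown".
rewards = [
    [0.0, 1.0],  # state 0
    [0.0, 1.0],  # state 1: same rewards and successors as state 0
    [0.0, 0.5],  # state 2: a small reward perturbation of state 0
    [0.0, 0.0],  # state 3: absorbing
]
next_state = [
    [3, 3],
    [3, 3],
    [3, 3],
    [3, 3],
]

d = bisimulation_metric(rewards, next_state)
print(d[0][1])  # states 0 and 1 are bisimilar, so their distance is 0.0
print(d[0][2])  # state 2 differs only by the 0.5 reward gap, so the distance is 0.5
```

In this sketch, states at distance zero are behaviorally indistinguishable, and a small reward perturbation yields a correspondingly small distance. With stochastic transitions, the successor term would instead be an optimal-transport (Wasserstein) distance between next-state distributions, which this sketch does not implement.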