29 May 2018 • Henrik Aslund, El Mahdi El Mhamdi, Rachid Guerraoui, Alexandre Maurer
We show that when a third party, the adversary, steps into the two-party setting (agent and operator) of safely interruptible reinforcement learning, a trade-off must be made between the probability of following the optimal policy in the limit and the probability of escaping a dangerous situation created by the adversary.