1 code implementation • 2 Mar 2023 • Jonas Rothfuss, Bhavya Sukhija, Tobias Birchler, Parnian Kassraie, Andreas Krause
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, collected by other agents, we seek to obtain a (tight) lower bound on a policy's performance.