1 code implementation • 6 May 2023 • Matej Cief, Jacek Golebiowski, Philipp Schmidt, Ziawasch Abedjan, Artur Bekasov
Off-policy evaluation (OPE) methods allow us to estimate the expected reward of a policy from logged data collected by a different policy.
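As a rough illustration of the idea, the sketch below uses inverse propensity scoring (IPS), one common OPE estimator. It is not taken from the paper listed here; the function names `ips_estimate` and `simulate_logs`, the two-action setup, and all probabilities are hypothetical values chosen for the example.

```python
import random


def ips_estimate(logged, target_policy):
    """IPS estimate of the target policy's expected reward.

    logged: list of (action, reward, logging_prob) tuples, where
        logging_prob is the probability the logging policy gave to
        the action it actually took.
    target_policy: dict mapping action -> probability under the
        policy being evaluated.
    """
    total = 0.0
    for action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy[action] / logging_prob
        total += weight * reward
    return total / len(logged)


def simulate_logs(logging_policy, reward_prob, n, seed=0):
    """Generate synthetic logs with Bernoulli rewards (illustrative only)."""
    rng = random.Random(seed)
    actions = list(logging_policy)
    weights = [logging_policy[a] for a in actions]
    logs = []
    for _ in range(n):
        a = rng.choices(actions, weights=weights)[0]
        r = 1.0 if rng.random() < reward_prob[a] else 0.0
        logs.append((a, r, logging_policy[a]))
    return logs


# Hypothetical setup: the logging policy is uniform, the target policy
# strongly prefers action "a".
logging_policy = {"a": 0.5, "b": 0.5}
target_policy = {"a": 0.9, "b": 0.1}
reward_prob = {"a": 0.8, "b": 0.2}

logs = simulate_logs(logging_policy, reward_prob, n=20000)
estimate = ips_estimate(logs, target_policy)

# Ground truth is available only because the data is simulated:
# 0.9 * 0.8 + 0.1 * 0.2 = 0.74
true_value = sum(target_policy[a] * reward_prob[a] for a in target_policy)
print(f"IPS estimate: {estimate:.3f}, true value: {true_value:.3f}")
```

The estimate is unbiased only when the logging policy gives nonzero probability to every action the target policy can take, and its variance grows as the two policies diverge.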
no code implementations • 6 Jun 2022 • Matej Cief, Branislav Kveton, Michal Kompan
Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.