1 code implementation • 30 Nov 2023 • Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff in the subsequent online policy deployment.
1 code implementation • 30 Nov 2023 • Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
This paper introduces SCOPE-RL, a comprehensive open-source Python software designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection (OPS).
no code implementations • 17 Sep 2021 • Haruka Kiyohara, Kosuke Kawakami, Yuta Saito
In this position paper, we explore the potential of using simulation to accelerate practical research of offline RL and OPE, particularly in RecSys and RTB.