no code implementations • 23 Oct 2020 • Masahiro Kato, Yusuke Kaneko
The goal of off-policy evaluation (OPE) is to evaluate a new policy using historical data obtained via a behavior policy.
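As a rough illustration of the OPE setting described above, the sketch below uses inverse propensity scoring (IPS), a standard textbook estimator; it is not claimed to be the estimator proposed in this paper. The function name `ips_value` and the logged data are hypothetical, and the sketch assumes the behavior policy's action probabilities were recorded and are nonzero.

```python
def ips_value(rewards, behavior_probs, target_probs):
    """Estimate the value of a new (target) policy from logged data.

    Each logged round i supplies the observed reward, the probability the
    behavior policy gave to the logged action, and the probability the
    target policy would give to that same action.
    """
    if not (len(rewards) == len(behavior_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if not rewards:
        raise ValueError("at least one logged round is required")

    total = 0.0
    for reward, p_behavior, p_target in zip(rewards, behavior_probs, target_probs):
        # Reweight each reward by how much more (or less) likely the
        # target policy is to take the logged action.
        total += (p_target / p_behavior) * reward
    return total / len(rewards)


# Hypothetical logged data from four rounds (illustrative values only).
rewards = [1.0, 0.0, 1.0, 1.0]
behavior_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.25, 0.75, 0.5, 0.5]

estimate = ips_value(rewards, behavior_probs, target_probs)
print(estimate)
```

The importance weights correct for the mismatch between the policy that collected the data and the policy being evaluated, at the cost of high variance when a behavior probability is small.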
no code implementations • 4 Jul 2020 • Kenshi Abe, Yusuke Kaneko
The proposed estimators estimate exploitability, which is often used as a metric for determining how close a policy profile (i.e., a tuple of policies) is to a Nash equilibrium in two-player zero-sum games.
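To make the exploitability metric concrete, here is a minimal sketch for a two-player zero-sum matrix game with a known payoff matrix. This is a simplification: it computes exploitability exactly from the payoffs rather than estimating it from historical data, and it uses one common convention (the sum of both players' best-response gains, with no 1/2 factor). The function name and the rock-paper-scissors payoffs are illustrative assumptions, not taken from the paper.

```python
def exploitability(payoff, row_policy, col_policy):
    """Exploitability of a policy profile in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff (the column player receives
    the negative). The result is the total amount both players could gain
    by switching to a best response; it is zero at a Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])

    # Expected row-player payoff of each pure row action against col_policy.
    row_action_values = [
        sum(payoff[i][j] * col_policy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Expected row-player payoff of each pure column action against row_policy;
    # the column player wants this to be as small as possible.
    col_action_values = [
        sum(row_policy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    return max(row_action_values) - min(col_action_values)


# Rock-paper-scissors payoffs for the row player (illustrative).
rps = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

at_equilibrium = exploitability(rps, uniform, uniform)
off_equilibrium = exploitability(rps, always_rock, uniform)
print(at_equilibrium, off_equilibrium)
```

The uniform profile is the Nash equilibrium of rock-paper-scissors, so its exploitability is zero, while a row player who always plays rock can be exploited by a column player who always plays paper.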