Search Results for author: Matej Cief

Learning Action Embeddings for Off-Policy Evaluation

Off-policy evaluation (OPE) methods allow us to compute the expected reward of a policy by using the logged data collected by a different policy.
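This listing does not include the paper's method, which learns action embeddings. So the example here does not reproduce it. It shows only the textbook inverse propensity scoring (IPS) estimator, a standard way to estimate a target policy's expected reward from logs collected by a different policy. Each logged reward is reweighted by the ratio of target to logging action probabilities. The function name `ips_estimate` and the toy simulation (3 actions, uniform logging policy, made-up mean rewards) are hypothetical and exist only for illustration.

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    rewards[i]       : reward observed for the logged action in round i
    logging_probs[i] : probability the logging policy gave that action
    target_probs[i]  : probability the target policy gives that same action
    """
    rewards = np.asarray(rewards, dtype=float)
    # Importance weights correct for the mismatch between the two policies.
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * rewards))

# Toy simulation (hypothetical numbers): 3 actions, uniform logging policy.
rng = np.random.default_rng(0)
n, n_actions = 100_000, 3
true_means = np.array([0.2, 0.5, 0.8])          # expected reward of each action
actions = rng.integers(0, n_actions, size=n)    # logging policy: uniform over actions
rewards = rng.binomial(1, true_means[actions])  # observed binary rewards
logging_probs = np.full(n, 1.0 / n_actions)

# Target policy: always play action 2, so its true value is 0.8.
target_probs = (actions == 2).astype(float)

estimate = ips_estimate(rewards, logging_probs, target_probs)
print(f"IPS estimate: {estimate:.3f}  (true value 0.800)")
```

The target policy is never run here. Its value is recovered purely from the logged rounds in which the logging policy happened to choose action 2. IPS is unbiased when the logging policy gives nonzero probability to every action the target policy can take, but its variance grows with the size of the weights.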

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.
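To make "optimizing policies without deploying them" concrete, here is a minimal sketch under simplifying assumptions, not the paper's method. It fits a context-free softmax policy by gradient ascent on the IPS estimate of its value, using only logged data from a uniform logging policy. The helper names (`softmax`, `learn_policy_ips`), the learning rate, the step count and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract max for numerical stability
    return z / z.sum()

def learn_policy_ips(actions, rewards, logging_probs, n_actions, lr=0.5, steps=500):
    """Gradient ascent on the IPS estimate of a context-free softmax policy's value."""
    theta = np.zeros(n_actions)                # start from the uniform policy
    one_hot = np.eye(n_actions)[actions]       # shape (n, n_actions)
    scaled = rewards / logging_probs           # r_i / mu(a_i)
    for _ in range(steps):
        pi = softmax(theta)
        # Gradient of pi(a_i) w.r.t. theta is pi(a_i) * (e_{a_i} - pi).
        grad = (scaled * pi[actions])[:, None] * (one_hot - pi)
        theta += lr * grad.mean(axis=0)        # ascend the IPS objective
    return softmax(theta)

# Logged data (hypothetical numbers), collected by a uniform logging policy.
rng = np.random.default_rng(1)
n, n_actions = 10_000, 3
true_means = np.array([0.2, 0.5, 0.8])
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, true_means[actions])
logging_probs = np.full(n, 1.0 / n_actions)

# The learned policy is never deployed during training; it is fit only from the log.
learned = learn_policy_ips(actions, rewards, logging_probs, n_actions)
print(np.round(learned, 3))
```

The learned policy should place most of its probability on action 2, the action with the highest mean reward, even though no new data was collected. Real off-policy learning methods add contexts, richer policy classes and variance control on top of this basic objective.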
