Off-policy evaluation

74 papers with code • 0 benchmarks • 0 datasets

Off-policy Evaluation (OPE), or offline evaluation in general, estimates the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems.
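The workhorse OPE estimator is inverse propensity scoring (IPS), which reweights logged rewards by the ratio of the evaluation policy's action probabilities to the logging policy's propensities. A minimal sketch on synthetic data (all variable names and the toy logging setup are illustrative, not from any listed repo):

```python
import numpy as np

# Synthetic log: each record has the logged action, the behavior policy's
# propensity for that action, and the observed reward (contexts omitted).
rng = np.random.default_rng(0)
n, n_actions = 10_000, 5
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)   # uniform logging policy
rewards = (actions == 3).astype(float)       # only action 3 yields reward 1

def ips_estimate(target_probs, actions, propensities, rewards):
    """Inverse propensity scoring: reweight rewards by pi_e(a|x) / pi_b(a|x)."""
    weights = target_probs[actions] / propensities
    return np.mean(weights * rewards)

# Evaluation policy that deterministically plays action 3; its true value is 1.
target_probs = np.zeros(n_actions)
target_probs[3] = 1.0
print(ips_estimate(target_probs, actions, propensities, rewards))
```

The estimate concentrates around the true policy value of 1.0; IPS is unbiased when the propensities are correct, but its variance grows with the importance weights, which motivates the estimators in the papers below.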


Most implemented papers

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

st-tech/zr-obp 17 Aug 2020

Our dataset is unique in that it contains a set of multiple logged bandit datasets collected by running different policies on the same platform.

Benchmarks for Deep Off-Policy Evaluation

google-research/deep_ope ICLR 2021

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.

Off-Policy Evaluation for Large Action Spaces via Embeddings

st-tech/zr-obp 13 Feb 2022

Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

facebookresearch/ReAgent ICML 2017

We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model.
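This line of work builds on the doubly robust (DR) estimator, which combines a learned reward model with an IPS correction on the model's residuals, so the estimate stays accurate if either the reward model or the propensities are good. A minimal context-free sketch (the toy setup and names are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_actions = 10_000, 5
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)   # uniform logging policy
rewards = (actions == 3).astype(float)       # true value of playing action 3 is 1

# A (possibly imperfect) regression model of expected reward per action.
q_hat = np.full(n_actions, 0.1)
q_hat[3] = 0.9

def dr_estimate(target_probs, actions, propensities, rewards, q_hat):
    """Doubly robust: model-based baseline plus IPS-weighted residual correction."""
    baseline = target_probs @ q_hat                      # E_{a ~ pi_e}[q_hat(a)]
    weights = target_probs[actions] / propensities
    correction = weights * (rewards - q_hat[actions])
    return baseline + np.mean(correction)

target_probs = np.zeros(n_actions)
target_probs[3] = 1.0
print(dr_estimate(target_probs, actions, propensities, rewards, q_hat))
```

Because the correction term multiplies the importance weights by residuals rather than raw rewards, DR typically has lower variance than plain IPS when the reward model is reasonable.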

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

tobsutter/pmdi_dro NeurIPS 2021

Training models that perform well under distribution shifts is a central challenge in machine learning.

Evaluating the Robustness of Off-Policy Evaluation

st-tech/zr-obp 31 Aug 2021

Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies.

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

st-tech/zr-obp 3 Feb 2022

We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions.

Balanced Off-Policy Evaluation for Personalized Pricing

yzhao3685/pricing-evaluation 24 Feb 2023

We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand.

Off-policy evaluation for slate recommendation

adith387/slates_semisynth_expts NeurIPS 2017

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation.

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

LARG/regression-importance-sampling 4 Jun 2018

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
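The effect can be seen in a stripped-down, context-free example: plugging empirical action frequencies in as the estimated behavior policy makes the importance weights exactly cancel the realized sampling noise (here, empirical frequencies stand in for the paper's regression-based estimate; the setup is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_actions = 10_000, 5
# Non-uniform behavior policy, treated as unknown to the evaluator.
true_probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
actions = rng.choice(n_actions, size=n, p=true_probs)
rewards = (actions == 0).astype(float)   # only action 0 yields reward 1

# Estimate behavior propensities from the same log data.
pi_b_hat = np.bincount(actions, minlength=n_actions) / n

# IPS for an evaluation policy that always plays action 0 (true value 1).
target_probs = np.zeros(n_actions)
target_probs[0] = 1.0
weights = target_probs[actions] / pi_b_hat[actions]
print(np.mean(weights * rewards))
```

With the estimated propensities the estimate is exactly 1.0 regardless of the draw, whereas weighting by the true probabilities would leave the estimate fluctuating with the empirical frequency of action 0; this is the variance reduction the paper reports.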