74 papers with code • 0 benchmarks • 0 datasets
Off-policy Evaluation (OPE), or offline evaluation in general, estimates the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems.
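As a concrete reference point, here is a minimal sketch of the basic inverse propensity score (IPS) estimator that most OPE methods build on; the rewards, propensities, and target-policy probabilities below are made up for illustration.

```python
import numpy as np

def ips_estimate(rewards, propensities, target_probs):
    """IPS estimate of a target policy's value from logged bandit data.

    rewards      : rewards observed for the logged actions
    propensities : behavior policy's probability of each logged action
    target_probs : target policy's probability of the same logged action
    """
    weights = target_probs / propensities  # importance weights
    return float(np.mean(weights * rewards))

# Toy log: four rounds, behavior policy chose each logged action w.p. 0.5.
rewards      = np.array([1.0, 0.0, 1.0, 1.0])
propensities = np.array([0.5, 0.5, 0.5, 0.5])
target_probs = np.array([0.8, 0.2, 0.8, 0.2])

print(ips_estimate(rewards, propensities, target_probs))  # → 0.9
```

The estimator reweights each logged reward by how much more (or less) likely the target policy was to take the logged action than the behavior policy was.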
Our dataset is unique in that it contains a set of multiple logged bandit datasets collected by running different policies on the same platform.
Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.
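The variance blow-up with large action spaces can be seen in a small simulation (an illustrative setup, not taken from any specific paper): under a uniform logging policy and a deterministic target policy, the importance weight on a matching sample equals the number of actions, so the weight variance grows roughly linearly with the action-space size.

```python
import numpy as np

def ips_weights(n_actions, n_samples, rng):
    """Importance weights for a deterministic target policy (always
    plays action 0) against a uniform logging policy over n_actions."""
    actions = rng.integers(n_actions, size=n_samples)
    propensity = 1.0 / n_actions                 # uniform logging policy
    target_prob = (actions == 0).astype(float)   # deterministic target
    return target_prob / propensity

rng = np.random.default_rng(0)
for k in (2, 100, 10_000):
    w = ips_weights(k, 100_000, rng)
    print(f"{k:>6} actions: mean weight {w.mean():.2f}, variance {w.var():.1f}")
```

The mean weight stays near 1 (the weights are unbiased), while the empirical variance explodes as the number of actions grows.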
We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model.
Training models that perform well under distribution shifts is a central challenge in machine learning.
Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies.
We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions.
We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand.
We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
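A context-free toy (all quantities invented for illustration) shows what importance sampling with a behavior policy estimated from the same log looks like, using empirical action frequencies as the estimated propensities. In this degenerate setting, where the reward is a deterministic function of the action, the estimated-propensity version recovers the true value exactly while the known-propensity version retains sampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10_000, 5

actions = rng.integers(k, size=n)        # uniform behavior policy over k arms
rewards = (actions == 0).astype(float)   # reward 1 only for action 0

# Target policy: uniform over actions {0, 1}, so its true value is 0.5.
target_probs = np.where(actions < 2, 0.5, 0.0)

true_propensity = 1.0 / k
# Behavior policy estimated from the log via empirical action frequencies.
est_propensity = np.bincount(actions, minlength=k)[actions] / n

ips_true = np.mean(target_probs / true_propensity * rewards)
ips_est = np.mean(target_probs / est_propensity * rewards)
print(ips_true, ips_est)
```

This is only a degenerate special case of the variance-reduction phenomenon the snippet describes, but it makes the mechanism visible: estimated propensities cancel the sampling fluctuations in how often each action was logged.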