TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Causal Inference	IDHP		Average Treatment Effect Error	-0.225	# 3
Visual Object Tracking	VOT2014		Expected Average Overlap (EAO)	1.047	# 1

Efficient Counterfactual Learning from Bandit Feedback

10 Sep 2018 · Yusuke Narita, Shota Yasui, Kohei Yata

What is the most statistically efficient way to do off-policy evaluation and optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
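
To make the setting concrete, here is a minimal sketch of off-policy evaluation from logged bandit data using inverse propensity weighting, one commonly used standard estimator. It is not the authors' estimator. The function names, the two-action simulated environment, and the reward probabilities are all hypothetical and chosen only for illustration.

```python
import random


def ips_estimate(logs, target_policy):
    """Inverse propensity weighted estimate of a target policy's expected reward.

    logs: list of (context, action, reward, logging_propensity) tuples, where
        logging_propensity is the probability with which the logging policy
        chose the logged action in that context.
    target_policy: function (context, action) -> probability that the
        counterfactual policy picks `action` in `context`.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        total += target_policy(context, action) / propensity * reward
    return total / len(logs)


def simulate_logs(n, seed=0):
    """Hypothetical environment: binary context, two actions, uniform logging."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        action = rng.choice([0, 1])  # logging policy: each action with prob. 0.5
        # Assumed reward model: success prob. 0.8 if the action matches the
        # context, 0.2 otherwise.
        success_prob = 0.8 if action == context else 0.2
        reward = 1.0 if rng.random() < success_prob else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def match_context_policy(context, action):
    """Counterfactual policy that always plays the action equal to the context."""
    return 1.0 if action == context else 0.0


logs = simulate_logs(20000)
estimate = ips_estimate(logs, match_context_policy)
print(f"IPW estimate of expected reward: {estimate:.3f} (true value 0.8)")
```

In this simulated setup the target policy's true expected reward is 0.8 by construction, so the estimate can be checked against it. The variance of such an estimator around the true value is the quantity the abstract is concerned with reducing.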
