Search Results for author: Haruka Kiyohara

Found 10 papers, 9 papers with code

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

1 code implementation 3 Feb 2024 Haruka Kiyohara, Masahiro Nomura, Yuta Saito

The PseudoInverse (PI) estimator has been introduced to mitigate the variance issue by assuming linearity in the reward function, but this can result in significant bias, as the assumption is hard to verify from observed data and is often substantially violated.

Marketing, Multi-Armed Bandits +2
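
For context, here is a minimal sketch of a PseudoInverse-style estimate under the linearity assumption the abstract refers to; the factorized per-slot policies and all variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Minimal sketch of a PseudoInverse-style slate estimate, assuming both policies
# factorize over the L slots and the slate reward is linear in slot-level effects
# (the hard-to-verify assumption the abstract points out). Illustrative only.
def pseudo_inverse_estimate(rewards, logging_probs, eval_probs):
    """
    rewards:       (n,)   observed slate-level rewards
    logging_probs: (n, L) pi_0 probability of the logged action at each slot
    eval_probs:    (n, L) pi_e probability of the same logged action at each slot
    """
    n, L = logging_probs.shape
    slot_weights = eval_probs / logging_probs        # per-slot importance weights
    correction = slot_weights.sum(axis=1) - (L - 1)  # PI correction under linearity
    return float(np.mean(rewards * correction))

# toy usage with synthetic data
rng = np.random.default_rng(0)
n, L = 1000, 3
logging_probs = rng.uniform(0.1, 0.9, size=(n, L))
eval_probs = rng.uniform(0.1, 0.9, size=(n, L))
rewards = rng.binomial(1, 0.3, size=n).astype(float)
print(pseudo_inverse_estimate(rewards, logging_probs, eval_probs))
```

Summing per-slot weights rather than multiplying them is what keeps the variance low relative to full-slate importance weighting, at the cost of the bias the abstract discusses.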

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

1 code implementation 30 Nov 2023 Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting the risk-return tradeoff in the subsequent online policy deployment.

Benchmarking, Counterfactual +1
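
As a rough illustration of the risk-return perspective (not the paper's actual metric), one can rank candidate policies by their OPE estimates and inspect both the best and the worst true value among the top-k candidates that would reach online deployment:

```python
import numpy as np

# Hedged sketch of a risk-return style assessment of OPE-driven policy selection:
# rank candidate policies by their OPE estimates, then report both the best and
# the worst true value among the top-k candidates that would be deployed online.
# The metric names and top-k framing are illustrative, not the paper's definitions.
def risk_return_at_k(ope_estimates, true_values, k):
    order = np.argsort(ope_estimates)[::-1]      # policies ranked by estimated value
    topk = np.asarray(true_values)[order[:k]]
    return {
        "best@k": float(topk.max()),     # return: the best policy we might deploy
        "worst@k": float(topk.min()),    # risk: the worst policy we might deploy
        "mean@k": float(topk.mean()),
    }

# toy usage
ope_estimates = [0.82, 0.75, 0.60, 0.55]   # estimated policy values from OPE
true_values   = [0.40, 0.78, 0.65, 0.30]   # (unknown in practice) true policy values
print(risk_return_at_k(ope_estimates, true_values, k=2))
```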

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

1 code implementation 30 Nov 2023 Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

This paper introduces SCOPE-RL, a comprehensive open-source Python software designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection (OPS).

Offline RL, Off-policy evaluation

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

1 code implementation 26 Jun 2023 Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior.

Off-policy evaluation
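
A minimal sketch of the adaptive-importance-weight idea, assuming a small set of stylized user behavior models; the behavior labels and weighting rules below are illustrative simplifications, not the paper's formulation.

```python
import numpy as np

# Sketch of an Adaptive IPS-style estimate for ranking OPE: the importance weight
# applied to the reward at each position covers only the slots that the assumed
# user behavior model says influence that position. Behavior models are stylized.
def adaptive_ips(rewards, logging_probs, eval_probs, behavior):
    """
    rewards:       (n, L) position-level rewards
    logging_probs: (n, L) pi_0 probability of the logged action at each slot
    eval_probs:    (n, L) pi_e probability of the logged action at each slot
    behavior:      length-n list with entries in {"independent", "cascade", "all"}
    """
    n, L = rewards.shape
    ratios = eval_probs / logging_probs
    total = 0.0
    for i in range(n):
        for l in range(L):
            if behavior[i] == "independent":
                w = ratios[i, l]                 # only the same slot matters
            elif behavior[i] == "cascade":
                w = np.prod(ratios[i, : l + 1])  # slots above position l matter too
            else:
                w = np.prod(ratios[i, :])        # the whole slate matters
            total += w * rewards[i, l]
    return total / n
```

Weighting only the slots consistent with the user's behavior is what keeps the estimate unbiased while avoiding the variance of always weighting the entire ranking.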

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

1 code implementation 25 Nov 2022 Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

Although many estimators have been developed, no single estimator dominates the others, because an estimator's accuracy can vary greatly with the OPE task at hand, such as the evaluation policy, the number of actions, and the noise level.

Counterfactual, Off-policy evaluation
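
The general idea can be sketched as scoring each candidate estimator on surrogate tasks where an approximate ground truth is available and keeping the one with the smallest estimated error; the surrogate construction below is a placeholder, not the paper's procedure.

```python
# Hedged sketch of data-driven OPE estimator selection: score each candidate on
# surrogate tasks built from logged data (where an approximate ground-truth value
# is available) and keep the estimator with the lowest estimated MSE.
def select_estimator(estimators, surrogate_tasks):
    """
    estimators:      dict mapping name -> callable(task) returning a value estimate
    surrogate_tasks: list of (task, approx_true_value) pairs
    """
    mse = {name: 0.0 for name in estimators}
    for task, truth in surrogate_tasks:
        for name, estimate in estimators.items():
            mse[name] += (estimate(task) - truth) ** 2
    mse = {name: total / len(surrogate_tasks) for name, total in mse.items()}
    best = min(mse, key=mse.get)
    return best, mse

# toy usage: two dummy "estimators" evaluated on two surrogate tasks
estimators = {"ips": lambda task: task["ips"], "dm": lambda task: task["dm"]}
tasks = [({"ips": 0.52, "dm": 0.40}, 0.50), ({"ips": 0.31, "dm": 0.20}, 0.30)]
print(select_estimator(estimators, tasks))
```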

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

1 code implementation NeurIPS 2023 Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to the learning of dynamics and establish the connection between our approach and well-known spectral learning methods in POMDPs.

Off-policy evaluation

Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation

no code implementations 17 Sep 2021 Haruka Kiyohara, Kosuke Kawakami, Yuta Saito

In this position paper, we explore the potential of using simulation to accelerate practical research of offline RL and OPE, particularly in RecSys and RTB.

Decision Making, Offline RL +4

Evaluating the Robustness of Off-Policy Evaluation

2 code implementations 31 Aug 2021 Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno

Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies.

Off-policy evaluation, Recommendation Systems
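
A minimal sketch of the broader evaluation protocol the abstract argues for: sweep an estimator over many configurations (evaluation policies, hyperparameters) and report the whole error distribution rather than a single hand-picked point. The summary statistics and config format here are illustrative assumptions.

```python
import numpy as np

# Sketch of evaluating an OPE estimator across many configurations and summarizing
# the distribution of its errors, instead of reporting one hand-picked setting.
def error_profile(estimator, configs, true_value_fn, threshold=0.05):
    errors = np.array([abs(estimator(c) - true_value_fn(c)) for c in configs])
    return {
        "mean_error": float(errors.mean()),
        "worst_error": float(errors.max()),
        "share_within_threshold": float((errors <= threshold).mean()),
    }

# toy usage: configs are dicts, the "estimator" and true value are dummies
configs = [{"noise": s} for s in (0.0, 0.1, 0.2, 0.4)]
estimator = lambda c: 0.5 + c["noise"] * 0.3        # bias grows with noise
true_value = lambda c: 0.5
print(error_profile(estimator, configs, true_value))
```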

Constrained Generalized Additive 2 Model with Consideration of High-Order Interactions

1 code implementation 5 Jun 2021 Akihisa Watanabe, Michiya Kuramata, Kaito Majima, Haruka Kiyohara, Kensho Kondo, Kazuhide Nakata

The second is the introduction of a higher-order term: since GA2M considers only up to second-order interactions, we introduce a higher-order term that captures higher-order interactions, aiming to balance interpretability and prediction accuracy.

Autonomous Driving, Vocal Bursts Intensity Prediction
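
As an illustration of the additive structure described above (a sketch only; the shape functions and the single higher-order term are stand-ins, not the paper's fitted model):

```python
import numpy as np

# Sketch of a GA2M-style prediction with one added higher-order term: univariate
# shape functions, pairwise interaction functions, plus a term over a small
# feature subset to capture interactions beyond second order. Stand-in functions.
def ga2m_plus_predict(x, unary_fns, pairwise_fns, higher_order_fn, higher_order_idx):
    """
    x:                (d,) feature vector
    unary_fns:        dict i -> f_i(x_i)
    pairwise_fns:     dict (i, j) -> f_ij(x_i, x_j)
    higher_order_fn:  callable on the selected feature subset
    higher_order_idx: tuple of feature indices entering the higher-order term
    """
    x = np.asarray(x, dtype=float)
    pred = sum(f(x[i]) for i, f in unary_fns.items())
    pred += sum(f(x[i], x[j]) for (i, j), f in pairwise_fns.items())
    pred += higher_order_fn(x[list(higher_order_idx)])  # the added higher-order term
    return pred

# toy usage with hand-written shape functions
unary = {0: lambda v: 0.5 * v, 1: lambda v: np.sin(v)}
pairwise = {(0, 1): lambda a, b: 0.1 * a * b}
higher = lambda sub: 0.01 * np.prod(sub)              # third-order interaction term
print(ga2m_plus_predict([1.0, 2.0, 3.0], unary, pairwise, higher, (0, 1, 2)))
```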
