Search Results for author: Runzhe Wan

Found 17 papers, 3 papers with code

Batch Policy Learning in Average Reward Markov Decision Processes

no code implementations · 23 Jul 2020 · Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.

Deeply-Debiased Off-Policy Interval Estimation

1 code implementation · 10 May 2021 · Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy.

Off-policy evaluation
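
As a toy illustration of the off-policy evaluation problem described above, the sketch below estimates a target policy's value from data logged under a uniform behavior policy using plain importance sampling. All quantities (policies, reward model, data) are made up for illustration; this is not the deeply-debiased interval estimator proposed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_actions = 5000, 3

    # behavior policy: logs actions uniformly at random
    behavior_probs = np.full(n_actions, 1.0 / n_actions)
    actions = rng.integers(0, n_actions, size=n)

    # hypothetical mean rewards per action; action 2 is best
    rewards = rng.normal(loc=np.array([0.2, 0.5, 0.8])[actions], scale=1.0)

    # target policy to be evaluated: mostly plays action 2
    target_probs = np.array([0.1, 0.1, 0.8])

    # importance-sampling estimate of the target policy's value
    weights = target_probs[actions] / behavior_probs[actions]
    value_is = np.mean(weights * rewards)
    print(f"IS estimate: {value_is:.3f}  (true value = 0.71)")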

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

1 code implementation · 21 Feb 2022 · Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

In this paper, we consider large-scale fleet management in ride-sharing companies, which involves multiple units in different areas receiving sequences of products (or treatments) over time.

Management · Multi-agent Reinforcement Learning +1

Safe Exploration for Efficient Policy Evaluation and Comparison

no code implementations · 26 Feb 2022 · Runzhe Wan, Branislav Kveton, Rui Song

High-quality data plays a central role in ensuring the accuracy of policy evaluation.

Safe Exploration

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

no code implementations · 26 Feb 2022 · Runzhe Wan, Lin Ge, Rui Song

In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the parameter space can be factorized to the item level.

Meta-Learning · Thompson Sampling
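
As a minimal sketch of sharing information across bandit tasks, the snippet below estimates a Gaussian prior over item-level means from earlier (simulated) tasks and plugs it into Thompson sampling on a new task. The hierarchical structure, noise level, and update rule are illustrative assumptions, not the framework proposed in the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    n_items, noise_sd = 5, 1.0

    # hypothetical meta structure: item-level means drawn from a shared distribution
    meta_mu, meta_sd = 0.5, 0.2
    past_tasks = rng.normal(meta_mu, meta_sd, size=(20, n_items))  # item means seen in 20 earlier tasks

    # "meta" step: estimate a shared item-level prior from the earlier tasks
    prior_mean, prior_var = past_tasks.mean(), past_tasks.var()

    # Thompson sampling on a new task with the learned prior
    true_means = rng.normal(meta_mu, meta_sd, size=n_items)
    counts, sums = np.zeros(n_items), np.zeros(n_items)
    post_mean = np.full(n_items, prior_mean)
    post_var = np.full(n_items, prior_var)
    for t in range(500):
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
        # conjugate Gaussian posterior update with known noise variance
        post_var[arm] = 1.0 / (1.0 / prior_var + counts[arm] / noise_sd**2)
        post_mean[arm] = post_var[arm] * (prior_mean / prior_var + sums[arm] / noise_sd**2)
    print("true item means:     ", np.round(true_means, 2))
    print("posterior item means:", np.round(post_mean, 2))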

Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies

no code implementations · 25 Dec 2022 · Runzhe Wan, YingYing Li, Wenbin Lu, Rui Song

Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis.

regression
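
The "purely multivariate analysis" route mentioned above can be sketched in a few lines: on simulated returns generated by a two-factor linear model, the leading eigenvectors of the sample covariance matrix recover the loadings and factors up to rotation. The data-generating process is made up for illustration, and the sketch does not use the sufficient-proxy estimator developed in the paper.

    import numpy as np

    rng = np.random.default_rng(2)
    n_obs, n_assets, n_factors = 200, 30, 2

    # simulated returns generated by a 2-factor linear model plus noise
    factors = rng.normal(size=(n_obs, n_factors))
    loadings = rng.normal(size=(n_factors, n_assets))
    returns = factors @ loadings + 0.1 * rng.normal(size=(n_obs, n_assets))

    # PCA route: leading eigenvectors of the sample covariance as factor estimates
    centered = returns - returns.mean(axis=0)
    cov = centered.T @ centered / n_obs
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    loadings_hat = eigvecs[:, -n_factors:]      # estimated loadings (up to rotation)
    factors_hat = centered @ loadings_hat       # estimated factors (up to rotation)
    print("explained variance share:", eigvals[-n_factors:].sum() / eigvals.sum())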

Heterogeneous Synthetic Learner for Panel Data

no code implementations · 30 Dec 2022 · Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song

In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend, with numerous applications.
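
As a generic illustration of HTE estimation (not the Heterogeneous Synthetic Learner itself), the sketch below fits separate least-squares outcome models on simulated treated and control units and contrasts their predictions, i.e. a simple T-learner under an assumed linear outcome model.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 2000, 3

    # simulated cross-section: covariates, binary treatment, outcome
    X = rng.normal(size=(n, p))
    T = rng.integers(0, 2, size=n)
    tau = 0.5 + X[:, 0]                                  # hypothetical heterogeneous effect
    Y = X @ np.array([1.0, -0.5, 0.2]) + T * tau + rng.normal(size=n)

    # T-learner: fit outcome models for treated and control, contrast their predictions
    def fit_ols(X_fit, y_fit):
        Xd = np.column_stack([np.ones(len(X_fit)), X_fit])
        beta, *_ = np.linalg.lstsq(Xd, y_fit, rcond=None)
        return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

    mu1 = fit_ols(X[T == 1], Y[T == 1])
    mu0 = fit_ols(X[T == 0], Y[T == 0])
    hte_hat = mu1(X) - mu0(X)
    print("corr(estimated HTE, true HTE):", np.round(np.corrcoef(hte_hat, tau)[0, 1], 3))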

STEEL: Singularity-aware Reinforcement Learning

no code implementations · 30 Jan 2023 · Xiaohong Chen, Zhengling Qi, Runzhe Wan

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment.

Off-policy evaluation · reinforcement-learning
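
For context on the batch RL setup, here is a minimal tabular fitted-Q-iteration sketch: a policy is learned purely from a pre-collected set of transitions on a made-up MDP. The MDP, behavior policy, and tabular regression are illustrative assumptions and are not the singularity-aware STEEL method.

    import numpy as np

    rng = np.random.default_rng(4)
    n_states, n_actions, gamma = 4, 2, 0.9

    # hypothetical MDP and a batch of transitions logged by a random behavior policy
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition probabilities
    R = rng.uniform(0, 1, size=(n_states, n_actions))                  # mean rewards
    s = rng.integers(0, n_states, size=5000)
    a = rng.integers(0, n_actions, size=5000)
    r = R[s, a] + 0.1 * rng.normal(size=5000)
    s_next = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])

    # tabular fitted-Q iteration: regress Bellman targets onto (state, action) cells
    Q = np.zeros((n_states, n_actions))
    for _ in range(200):
        target = r + gamma * Q[s_next].max(axis=1)
        for si in range(n_states):
            for ai in range(n_actions):
                mask = (s == si) & (a == ai)
                if mask.any():
                    Q[si, ai] = target[mask].mean()
    print("greedy policy learned from the batch:", Q.argmax(axis=1))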

Multiplier Bootstrap-based Exploration

no code implementations · 3 Feb 2023 · Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.

Multi-Armed Bandits
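
One way to read the multiplier-bootstrap idea: rather than relying on a closed-form posterior or confidence bound, each round perturbs every arm's sample mean with random multiplier weights and acts greedily on the perturbed values. The Gaussian weights, toy three-armed problem, and greedy rule below are simplifying assumptions, not the algorithm or guarantees of the paper.

    import numpy as np

    rng = np.random.default_rng(5)
    true_means = np.array([0.3, 0.5, 0.7])
    rewards = [[] for _ in true_means]

    for t in range(1000):
        scores = []
        for hist in rewards:
            if len(hist) < 2:
                scores.append(np.inf)                   # force a couple of initial pulls per arm
                continue
            h = np.asarray(hist)
            w = rng.normal(0.0, 1.0, size=len(h))       # mean-zero multiplier weights
            scores.append(h.mean() + np.mean(w * (h - h.mean())))   # one bootstrap draw of the mean
        arm = int(np.argmax(scores))
        rewards[arm].append(rng.normal(true_means[arm], 1.0))

    print("pull counts per arm:", [len(h) for h in rewards])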

Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring

no code implementations · 2 Apr 2023 · Runzhe Wan, Yu Liu, James McQueen, Doug Hains, Rui Song

With the growing need for online A/B testing to support innovation in industry, the opportunity cost of running an experiment becomes non-negligible.

Decision Making · reinforcement-learning

Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards

no code implementations · 28 Oct 2023 · Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications.

Offline RL · Off-policy evaluation
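
To see why heavy-tailed rewards are problematic, the toy comparison below contrasts the plain sample mean with a median-of-means estimate on Student-t(2) rewards (finite mean, infinite variance). The estimator and data are generic illustrations of robust value estimation, not the methods developed in the paper.

    import numpy as np

    rng = np.random.default_rng(6)

    # heavy-tailed rewards: Student-t with 2 degrees of freedom, shifted so the true mean is 1.0
    rewards = rng.standard_t(df=2, size=10000) + 1.0

    def median_of_means(x, n_blocks=20):
        """Split the sample into blocks and take the median of the block means."""
        blocks = np.array_split(rng.permutation(x), n_blocks)
        return np.median([b.mean() for b in blocks])

    print("sample mean:    ", round(rewards.mean(), 3))
    print("median of means:", round(median_of_means(rewards), 3))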

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

no code implementations · 20 Dec 2023 · Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song

The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency.

Decision Making
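
A back-of-the-envelope view of how the assumed effect size drives duration: in a standard two-sample power calculation the required sample size scales as 1/AES², so halving the AES roughly quadruples the run time. The traffic volume, outcome variance, and power targets below are hypothetical, and the calculation is the textbook formula rather than the hierarchical or utility-based approaches studied in the paper.

    import numpy as np

    # textbook formula: n per arm = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / AES^2
    z_alpha, z_power = 1.96, 0.84      # 5% two-sided level, 80% power
    sigma = 1.0                        # assumed outcome standard deviation
    daily_traffic = 2000               # hypothetical users entering each arm per day

    for aes in [0.02, 0.05, 0.10]:     # candidate assumed effect sizes
        n_per_arm = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / aes ** 2
        days = np.ceil(n_per_arm / daily_traffic)
        print(f"AES={aes:.2f}: ~{int(n_per_arm):,} users per arm, about {int(days)} days")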

Zero-Inflated Bandits

no code implementations · 25 Dec 2023 · Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song

Many real applications of bandits have sparse non-zero rewards, leading to slow learning rates.

Thompson Sampling
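
A rough sketch of exploiting the zero-inflation structure: model each arm's reward as (probability of a non-zero reward) × (mean of the non-zero part), track a Beta posterior for the first piece and a simple Gaussian-style posterior for the second, and run Thompson sampling on their product. The reward model and posterior updates below are simplified stand-ins, not the estimators analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(7)
    p_nonzero = np.array([0.05, 0.10])     # rewards are exactly zero most of the time
    mean_nonzero = np.array([2.0, 1.5])    # expected values 0.10 vs 0.15, so arm 1 is better

    alpha, beta = np.ones(2), np.ones(2)   # Beta posterior for P(reward != 0)
    n_nz, sum_nz = np.zeros(2), np.zeros(2)
    pulls = np.zeros(2, dtype=int)

    for t in range(3000):
        p_draw = rng.beta(alpha, beta)
        mu_draw = rng.normal(sum_nz / (n_nz + 1.0), 1.0 / np.sqrt(n_nz + 1.0))
        arm = int(np.argmax(p_draw * mu_draw))            # zero-inflated Thompson sampling
        nonzero = rng.random() < p_nonzero[arm]
        reward = rng.normal(mean_nonzero[arm], 1.0) if nonzero else 0.0
        pulls[arm] += 1
        alpha[arm] += nonzero
        beta[arm] += 1 - nonzero
        if nonzero:
            n_nz[arm] += 1
            sum_nz[arm] += reward

    print("pull counts per arm:", pulls)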
