no code implementations • 3 Feb 2023 • Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song
Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.
no code implementations • 30 Jan 2023 • Xiaohong Chen, Zhengling Qi, Runzhe Wan
In this paper, we propose a new batch RL algorithm that does not require absolute continuity, in the setting of an infinite-horizon Markov decision process with continuous states and actions.
no code implementations • 30 Dec 2022 • Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications.
no code implementations • 25 Dec 2022 • Runzhe Wan, YingYing Li, Wenbin Lu, Rui Song
Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis.
no code implementations • 26 Feb 2022 • Runzhe Wan, Branislav Kveton, Rui Song
High-quality data plays a central role in ensuring the accuracy of policy evaluation.
no code implementations • 26 Feb 2022 • Runzhe Wan, Lin Ge, Rui Song
In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the parameter space can be factorized to the item level.
1 code implementation • 21 Feb 2022 • Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu
In this paper, we consider large-scale fleet management in ride-sharing companies that involve multiple units in different areas receiving sequences of products (or treatments) over time.
no code implementations • NeurIPS 2021 • Runzhe Wan, Lin Ge, Rui Song
How to explore efficiently is a central problem in multi-armed bandits.
no code implementations • 27 May 2021 • Runzhe Wan, Sheng Zhang, Chengchun Shi, Shikai Luo, Rui Song
Order dispatch is one of the central problems for ride-sharing platforms.
1 code implementation • 10 May 2021 • Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song
Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy.
no code implementations • 9 Sep 2020 • Runzhe Wan, Xin-Yu Zhang, Rui Song
Severe infectious diseases such as the novel coronavirus (COVID-19) pose a huge threat to public health.
no code implementations • 23 Jul 2020 • Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy
The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
1 code implementation • ICML 2020 • Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, Ling Leng
The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning.