no code implementations • 10 Jan 2025 • Jitao Wang, Chengchun Shi, John D. Piette, Joshua R. Loftus, Donglin Zeng, Zhenke Wu
When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit.
1 code implementation • 8 Dec 2024 • Shuguang Yu, Shuxing Fang, Ruixin Peng, Zhengling Qi, Fan Zhou, Chengchun Shi
This paper studies off-policy evaluation (OPE) in the presence of unmeasured confounders.
no code implementations • 3 Oct 2024 • Pangpang Liu, Chengchun Shi, Will Wei Sun
Through simulations and experiments on LLMs, we demonstrate the effectiveness of our algorithm and its superiority over state-of-the-art methods.
1 code implementation • 9 Aug 2024 • Ke Sun, Linglong Kong, Hongtu Zhu, Chengchun Shi
This paper studies the optimal design for A/B testing in partially observable online experiments.
1 code implementation • 25 Jul 2024 • Runpeng Dai, Jianing Wang, Fan Zhou, Shikai Luo, Zhiwei Qin, Chengchun Shi, Hongtu Zhu
Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets.
1 code implementation • 27 Jun 2024 • Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment.
1 code implementation • 1 Jun 2024 • Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu
This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm.
no code implementations • 26 Mar 2024 • Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu
Our aim is to thoroughly evaluate the effects of these designs on the accuracy of their resulting average treatment effect (ATE) estimators.
no code implementations • 18 Mar 2024 • Danyang Wang, Chengchun Shi, Shikai Luo, Will Wei Sun
As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning.
1 code implementation • 28 Oct 2023 • Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications.
1 code implementation • 14 Jun 2023 • Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang
This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated.
1 code implementation • 30 May 2023 • Yunzhe Zhou, Chengchun Shi, Lexin Li, Qiwei Yao
In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning.
no code implementations • 17 May 2023 • Ting Li, Chengchun Shi, Zhaohua Lu, Yi Li, Hongtu Zhu
However, assessing dynamic quantile treatment effects (QTE) remains a challenge, particularly when dealing with data from ride-sourcing platforms that involve sequential decision-making across time and space.
no code implementations • 24 Mar 2023 • Tao Ma, Jin Zhu, Hengrui Cai, Zhengling Qi, Yunxiao Chen, Chengchun Shi, Eric B. Laber
In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge.
1 code implementation • 31 Jan 2023 • Lin Ge, Jitao Wang, Chengchun Shi, Zhenke Wu, Rui Song
However, there are a number of applications (e.g., mobile health) where the treatments are sequentially assigned over time and the dynamic mediation effects are of primary interest.
no code implementations • 5 Jan 2023 • Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou
When the initial policy is consistent, under some mild conditions, our method will yield a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired "value enhancement" property.
no code implementations • 3 Jan 2023 • Yuhe Gao, Chengchun Shi, Rui Song
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates.
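As a concrete illustration (a minimal sketch with hypothetical decision rules, not code from the paper), a dynamic treatment regime can be represented as one decision rule per stage, each mapping the accumulated patient history to a treatment:

```python
# Minimal illustrative sketch of a two-stage dynamic treatment regime (DTR).
# The decision rules below are hypothetical examples, not the paper's method.

def stage1_rule(baseline):
    """Assign the initial treatment from baseline covariates."""
    return "drug_A" if baseline["severity"] > 0.5 else "drug_B"

def stage2_rule(baseline, stage1_treatment, intermediate):
    """Adapt the second-stage treatment to the evolving patient history."""
    if intermediate["response"] < 0.2:  # poor responder: augment therapy
        return "augment"
    return stage1_treatment             # responder: continue current therapy

patient = {"severity": 0.7}
a1 = stage1_rule(patient)
a2 = stage2_rule(patient, a1, {"response": 0.1})
print(a1, a2)  # -> drug_A augment
```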
no code implementations • 29 Dec 2022 • Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song
Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy.
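For intuition, the textbook importance-sampling estimator reweights each observed trajectory's return by the ratio of target to behavior policy probabilities. The sketch below assumes tabular policies stored as dictionaries; it is illustrative background, not the estimator developed in this paper.

```python
import numpy as np

def importance_sampling_ope(trajectories, target_policy, behavior_policy, gamma=0.99):
    """Per-trajectory importance-sampling OPE (illustrative sketch).

    trajectories    : list of trajectories, each a list of (state, action, reward)
    target_policy   : dict mapping (state, action) -> probability under the target
    behavior_policy : dict mapping (state, action) -> probability under the behavior
    """
    estimates = []
    for traj in trajectories:
        ratio, discounted_return = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= target_policy[(s, a)] / behavior_policy[(s, a)]
            discounted_return += (gamma ** t) * r
        estimates.append(ratio * discounted_return)
    return float(np.mean(estimates))
```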
no code implementations • 29 Dec 2022 • Yang Xu, Chengchun Shi, Shikai Luo, Lan Wang, Rui Song
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
no code implementations • 13 Dec 2022 • Masatoshi Uehara, Chengchun Shi, Nathan Kallus
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems.
1 code implementation • 8 Nov 2022 • Liyuan Hu, Mengbing Li, Chengchun Shi, Zhenke Wu, Piotr Fryzlewicz
Moreover, by borrowing information across time and across subjects, it allows us to detect weaker signals and enjoys better convergence properties than applying the clustering algorithm at each time point or the change point detection algorithm to each subject separately.
1 code implementation • 26 Oct 2022 • Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li
In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting.
no code implementations • 29 Sep 2022 • Jiayi Wang, Zhengling Qi, Chengchun Shi
This approach utilizes the observed action, whether from AI or humans, as input to achieve a stronger oracle in policy learning for the decision maker (humans or AI).
no code implementations • 15 Sep 2022 • Gholamali Aminian, Armin Behnamnia, Roberto Vega, Laura Toni, Chengchun Shi, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues
We propose learning methods for problems where feedback is missing for some samples, so that the logged data contain both samples with observed feedback and samples with missing feedback.
1 code implementation • NeurIPS 2023 • Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun
Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
1 code implementation • 14 Jun 2022 • Yingying Zhang, Chengchun Shi, Shikai Luo
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment.
1 code implementation • 3 Mar 2022 • Mengbing Li, Chengchun Shi, Zhenke Wu, Piotr Fryzlewicz
Based on the proposed test, we further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimization in nonstationary environments.
1 code implementation • 26 Feb 2022 • Chengchun Shi, Shikai Luo, Yuan Le, Hongtu Zhu, Rui Song
We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications.
1 code implementation • 22 Feb 2022 • Chengchun Shi, Jin Zhu, Ye Shen, Shikai Luo, Hongtu Zhu, Rui Song
In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process.
no code implementations • 22 Feb 2022 • Shikai Luo, Ying Yang, Chengchun Shi, Fang Yao, Jieping Ye, Hongtu Zhu
The aim of this paper is to establish a causal link between the policies implemented by technology companies and the outcomes they yield within intricate temporally and/or spatially dependent experiments.
1 code implementation • 21 Feb 2022 • Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu
In this paper, we consider large-scale fleet management in ride-sharing companies, which involves multiple units in different areas receiving sequences of products (or treatments) over time.
no code implementations • 17 Nov 2021 • Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu
To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function.
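In stylized notation (our hedged paraphrase, not the paper's exact display), the jump penalized regression step trades within-interval fit against the number of intervals in a partition of the treatment range:

```latex
(\widehat{\mathcal{P}}, \{\widehat{q}_{\mathcal{I}}\})
  = \operatorname*{arg\,min}_{\mathcal{P},\,\{q_{\mathcal{I}}\}}
    \sum_{\mathcal{I} \in \mathcal{P}}
      \sum_{i:\, A_i \in \mathcal{I}}
        \bigl\{ Y_i - q_{\mathcal{I}}(X_i) \bigr\}^2
  + \gamma_n \lvert \mathcal{P} \rvert
```

Here \mathcal{P} is a partition of the treatment range into intervals, q_{\mathcal{I}} is a per-interval outcome model, and \gamma_n penalizes each additional jump; the estimated I2DR then recommends, for covariates x, an interval \mathcal{I} maximizing \widehat{q}_{\mathcal{I}}(x).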
1 code implementation • 12 Nov 2021 • Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang
In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution.
1 code implementation • 2 Jun 2021 • Chengchun Shi, Yunzhe Zhou, Lexin Li
In this article, we propose a new hypothesis testing method for directed acyclic graphs (DAGs).
no code implementations • 27 May 2021 • Runzhe Wan, Sheng Zhang, Chengchun Shi, Shikai Luo, Rui Song
Order dispatch is one of the central problems to ride-sharing platforms.
1 code implementation • 10 May 2021 • Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song
Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy.
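As background, and up to normalization conventions (a standard doubly robust form, not this paper's deeply debiased construction), such estimators typically combine a fitted Q-function \widehat{Q} with estimated density-ratio weights \widehat{\omega}:

```latex
\widehat{\eta}
  = \frac{1}{n} \sum_{i=1}^{n} \widehat{Q}(S_{i,0}, \pi)
  + \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=0}^{T-1}
      \widehat{\omega}(S_{i,t}, A_{i,t})
      \bigl\{ R_{i,t} + \gamma \widehat{Q}(S_{i,t+1}, \pi)
              - \widehat{Q}(S_{i,t}, A_{i,t}) \bigr\}
```

where \widehat{Q}(s, \pi) = \sum_a \pi(a \mid s) \widehat{Q}(s, a); the estimate remains consistent if either nuisance component is well estimated.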
no code implementations • 1 Jan 2021 • Chengchun Shi, Xiaoyu Wang, Shikai Luo, Rui Song, Hongtu Zhu, Jieping Ye
A/B testing, or online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries.
1 code implementation • NeurIPS 2021 • Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu
To handle continuous treatments, we develop a novel estimation method for OPE using deep jump learning.
no code implementations • 28 Sep 2020 • Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu
To handle continuous action space, we develop a brand-new deep jump Q-evaluation method for OPE.
1 code implementation • 3 Jun 2020 • Chengchun Shi, Tianlin Xu, Wicher Bergsma, Lexin Li
In this article, we study the problem of high-dimensional conditional independence testing, a key building block in statistics and machine learning.
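For intuition only, a far simpler residual-based check (in the spirit of generalized covariance measure tests, not the generative method proposed in the article) regresses X and Y on Z and tests whether the residuals remain correlated:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def residual_ci_test(X, Y, Z):
    """Toy conditional-independence check: regress X and Y on Z, then test
    whether the residuals are correlated. Linear models will miss nonlinear
    dependence, which is precisely the regime the article targets."""
    rx = X - LinearRegression().fit(Z, X).predict(Z)
    ry = Y - LinearRegression().fit(Z, Y).predict(Z)
    return stats.pearsonr(rx, ry)  # (correlation, p-value)

# Synthetic example where X and Y are independent given Z:
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
Y = Z @ np.array([0.3, 0.8, -1.0]) + rng.normal(size=500)
print(residual_ci_test(X, Y, Z))  # expect a large p-value
```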
1 code implementation • 5 Feb 2020 • Chengchun Shi, Xiaoyu Wang, Shikai Luo, Hongtu Zhu, Jieping Ye, Rui Song
A/B testing, or online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries.
1 code implementation • ICML 2020 • Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, Ling Leng
The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning.
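Concretely, in generic notation (our paraphrase of the usual formulation), the null hypothesis being tested is that the future is conditionally independent of the past given the present state-action pair:

```latex
H_0:\; (S_{t+1}, R_t) \,\perp\!\!\!\perp\, \{(S_j, A_j, R_j)\}_{j < t}
       \;\big|\; (S_t, A_t) \quad \text{for all } t
```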
no code implementations • 15 Oct 2015 • Chengchun Shi, Rui Song, Wenbin Lu
In this paper, we propose a two-step estimation procedure for deriving the optimal treatment regime under NP dimensionality.