no code implementations • 2 Feb 2024 • Sungee Hong, Zhengling Qi, Raymond K. W. Wong
We consider the problem of distributional off-policy evaluation, which serves as the foundation of many distributional reinforcement learning (DRL) algorithms.
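As a minimal, generic illustration of the distributional OPE problem (not the estimator developed in the paper), the sketch below estimates return quantiles of a target policy from trajectories logged under a different behavior policy, using self-normalized importance weights; the toy MDP and all of its parameters are invented.

```python
# Distributional OPE sketch: importance-weighted return quantiles.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon, n_traj = 2, 2, 3, 5000

# Invented dynamics and rewards for a toy finite MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state probs
R = rng.normal(size=(n_states, n_actions))                        # mean reward per (s, a)

behavior = np.full((n_states, n_actions), 0.5)   # uniform logging policy
target = np.array([[0.9, 0.1], [0.2, 0.8]])      # policy we want to evaluate

returns, weights = np.zeros(n_traj), np.ones(n_traj)
for i in range(n_traj):
    s = rng.integers(n_states)
    for _ in range(horizon):
        a = rng.choice(n_actions, p=behavior[s])
        weights[i] *= target[s, a] / behavior[s, a]      # cumulative importance ratio
        returns[i] += R[s, a] + 0.1 * rng.normal()       # noisy reward
        s = rng.choice(n_states, p=P[s, a])

# Self-normalized weighted quantiles approximate the return *distribution*
# under the target policy, not just its mean.
order = np.argsort(returns)
cum_w = np.cumsum(weights[order]) / weights.sum()
for q in (0.1, 0.5, 0.9):
    print(f"q={q}: {returns[order][np.searchsorted(cum_w, q)]:.3f}")
```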
1 code implementation • 28 Oct 2023 • Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications.
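As a hedged illustration of the underlying statistical issue (not the paper's algorithm), the sketch below compares the empirical mean with a median-of-means estimator, one standard robustification device, on heavy-tailed reward draws; the distribution and block count are invented.

```python
# Heavy-tailed rewards break the empirical mean; median-of-means is robust.
import numpy as np

rng = np.random.default_rng(1)
true_mean = 3.0
rewards = true_mean + rng.standard_t(df=1.5, size=2000)  # infinite-variance noise

def median_of_means(x, n_blocks=20):
    """Split samples into blocks, average each block, return the median of averages."""
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return np.median([b.mean() for b in blocks])

print("empirical mean: ", rewards.mean())
print("median of means:", median_of_means(rewards))
```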
no code implementations • 14 Jun 2023 • Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang
This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated.
no code implementations • 26 May 2023 • Mao Hong, Zhengling Qi, Yanxun Xu
To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
no code implementations • 24 Mar 2023 • Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber
In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge.
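As a heuristic, invented illustration of why the Markov property matters for state representations (this is not the paper's methodology), the sketch below fits next-state regressions with and without an extra lag: if adding history improves prediction, the current state alone is not Markov.

```python
# Heuristic Markov check: does an extra lag improve next-state prediction?
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
T = 5000
s = np.zeros(T)
for t in range(2, T):  # an AR(2) chain, so s_t alone is NOT Markov
    s[t] = 0.5 * s[t - 1] + 0.3 * s[t - 2] + rng.normal()

X1 = s[1:-1].reshape(-1, 1)                 # predictor: s_t only
X2 = np.column_stack([s[1:-1], s[:-2]])     # predictors: (s_t, s_{t-1})
y = s[2:]
for name, X in [("s_t only", X1), ("(s_t, s_{t-1})", X2)]:
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(f"{name}: R^2 = {r2:.3f}")        # the second R^2 is clearly higher
```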
no code implementations • 24 Feb 2023 • Rui Miao, Zhengling Qi, Cong Shi, Lin Lin
Specifically, relying on the structural models of revenue and price, we establish the identifiability condition of an optimal pricing strategy under endogeneity with the help of invalid instrumental variables.
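For background on the endogeneity problem itself, the sketch below shows classical two-stage least squares with a valid instrument on invented data; note the paper's contribution is precisely to handle invalid instruments, which this textbook baseline does not.

```python
# Price endogeneity: OLS is biased, 2SLS with a valid instrument is not.
import numpy as np

rng = np.random.default_rng(3)
n = 20000
z = rng.normal(size=n)                 # instrument (e.g., a cost shifter)
u = rng.normal(size=n)                 # unobserved demand shock
price = 1.0 + 0.8 * z + 0.7 * u + rng.normal(size=n)         # endogenous price
revenue = 2.0 - 1.5 * price + 2.0 * u + rng.normal(size=n)   # true slope: -1.5

X = np.column_stack([np.ones(n), price])
print("OLS slope: ", np.linalg.lstsq(X, revenue, rcond=None)[0][1])   # biased

# Stage 1: project price on the instrument; Stage 2: regress on fitted price.
Z = np.column_stack([np.ones(n), z])
price_hat = Z @ np.linalg.lstsq(Z, price, rcond=None)[0]
X2 = np.column_stack([np.ones(n), price_hat])
print("2SLS slope:", np.linalg.lstsq(X2, revenue, rcond=None)[0][1])  # near -1.5
```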
no code implementations • 8 Feb 2023 • Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid Tarokh
The objective is to use the offline dataset to find an optimal assortment.
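As a minimal sketch of what "an optimal assortment" means once a choice model is in hand, the code below enumerates assortments under a multinomial logit model with invented attraction values and prices; the paper instead learns these quantities from offline data.

```python
# Assortment optimization under a multinomial logit (MNL) choice model.
from itertools import combinations
import numpy as np

utilities = np.array([1.0, 0.8, 0.5, 0.3])   # attraction values v_i = exp(u_i)
prices = np.array([4.0, 5.0, 6.0, 7.0])

def expected_revenue(assortment):
    v = utilities[list(assortment)]
    p = prices[list(assortment)]
    return (v * p).sum() / (1.0 + v.sum())   # the 1.0 is the no-purchase option

best = max(
    (frozenset(c) for r in range(1, 5) for c in combinations(range(4), r)),
    key=expected_revenue,
)
print("optimal assortment:", sorted(best), "revenue:", expected_revenue(best))
```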
no code implementations • 30 Jan 2023 • Xiaohong Chen, Zhengling Qi, Runzhe Wan
Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment.
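One standard way to make this concrete is fitted Q-iteration on a logged dataset of (state, action, reward, next state) tuples. The sketch below, on an invented one-dimensional toy problem, is a generic batch-RL baseline rather than the method of the paper.

```python
# Fitted Q-iteration: extract a policy from pre-collected transitions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, n_actions, gamma = 5000, 2, 0.9

# Invented logged transitions: 1-d state, reward favors matching a = 1{s > 0}.
s = rng.uniform(-1, 1, size=n)
a = rng.integers(n_actions, size=n)
r = np.where(a == (s > 0), 1.0, 0.0) + 0.1 * rng.normal(size=n)
s_next = np.clip(s + rng.normal(scale=0.3, size=n), -1, 1)

X = np.column_stack([s, a])
q = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, r)
for _ in range(20):  # Bellman backups against the current Q-function
    q_next = np.max(
        [q.predict(np.column_stack([s_next, np.full(n, b)])) for b in range(n_actions)],
        axis=0,
    )
    q = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, r + gamma * q_next)

greedy = lambda state: int(np.argmax([q.predict([[state, b]]) for b in range(n_actions)]))
print("action at s=0.5:", greedy(0.5), "| action at s=-0.5:", greedy(-0.5))
```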
no code implementations • 5 Jan 2023 • Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou
When the initial policy is consistent, under some mild conditions, our method will yield a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired "value enhancement" property.
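A minimal tabular sketch of the generic one-step improvement idea behind value enhancement (not the paper's procedure or its guarantees): evaluate an initial policy's Q-function, then act greedily with respect to it; the MDP here is invented.

```python
# One-step policy improvement: the greedy policy dominates the initial one.
import numpy as np

rng = np.random.default_rng(5)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities P[s, a]
R = rng.normal(size=(S, A))                  # mean rewards

def evaluate(policy):
    """Iterative policy evaluation: Q = R + gamma * P @ V."""
    Q = np.zeros((S, A))
    for _ in range(500):
        Q = R + gamma * P @ (policy * Q).sum(axis=1)
    return Q

pi0 = np.full((S, A), 1 / A)                 # initial uniform policy
Q0 = evaluate(pi0)
pi1 = np.eye(A)[np.argmax(Q0, axis=1)]       # greedy one-step improvement
print("V_pi0:", (pi0 * Q0).sum(axis=1))
print("V_pi1:", (pi1 * evaluate(pi1)).sum(axis=1))  # state-wise at least as large
```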
no code implementations • 23 Dec 2022 • Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang
To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob.
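As a hedged illustration of the pessimism principle in the simpler one-player, bandit-style case (the paper applies it to a two-player setting), the sketch below scores candidate policies by an importance-sampling value estimate minus an uncertainty penalty and selects the best lower bound; all data are invented.

```python
# Pessimistic offline policy selection: maximize a lower confidence bound.
import numpy as np

rng = np.random.default_rng(6)
n = 2000
context = rng.normal(size=n)
action = rng.integers(2, size=n)                      # uniform logging policy
reward = (action == (context > 0)) + rng.normal(scale=0.5, size=n)

def lower_bound(policy, alpha=1.0):
    """Self-normalized importance-sampling estimate minus a width penalty."""
    w = 2.0 * (policy(context) == action)             # ratio vs. uniform(0.5) logging
    est = (w * reward).sum() / w.sum()
    width = alpha * (w * reward).std() / np.sqrt(n)
    return est - width

candidates = {
    "always 0": lambda c: np.zeros_like(c, dtype=int),
    "threshold": lambda c: (c > 0).astype(int),
    "anti-threshold": lambda c: (c <= 0).astype(int),
}
best = max(candidates, key=lambda k: lower_bound(candidates[k]))
print({k: round(lower_bound(v), 3) for k, v in candidates.items()}, "->", best)
```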
1 code implementation • 12 Nov 2022 • Xiaoqing Tan, Zhengling Qi, Christopher W. Seymour, Lu Tang
This paper introduces RISE, a robust individualized decision learning framework with sensitive variables: variables that are collectible and important to the intervention decision, but whose inclusion in decision making is prohibited for reasons such as delayed availability or fairness concerns.
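An invented, distillation-style sketch of the general idea (not the RISE algorithm itself): sensitive variables improve the outcome model at training time, but the deployed decision rule never takes them as input.

```python
# Train with a sensitive variable z; deploy a rule that uses only x.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=(n, 2))        # permitted covariates
z = rng.normal(size=n)             # sensitive variable: usable in training only
a = rng.integers(2, size=n)        # randomized treatment
y = (2 * a - 1) * (x[:, 0] + 0.5 * z) + rng.normal(size=n)  # outcome

# Step 1: outcome model with the sensitive variable included.
feats = np.column_stack([x, z, a, a * x[:, 0], a * z])
outcome = LinearRegression().fit(feats, y)

def pred(a_val):
    f = np.column_stack([x, z, np.full(n, a_val), a_val * x[:, 0], a_val * z])
    return outcome.predict(f)

# Step 2: pseudo-label the treatment the full model prefers, then fit a
# decision rule on the permitted covariates only.
best_action = (pred(1) > pred(0)).astype(int)
rule = LogisticRegression().fit(x, best_action)   # deployable without z
print("agreement with sign(x0) oracle:", (rule.predict(x) == (x[:, 0] > 0)).mean())
```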
1 code implementation • 26 Oct 2022 • Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li
In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting.
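As an invented illustration of pessimism in a Bayesian form (not the paper's method for dynamic treatment regimes), the sketch below selects the action with the best posterior lower quantile rather than the best posterior mean, which guards against poorly covered actions in offline data.

```python
# Pessimistic Bayesian action selection via posterior lower quantiles.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
# Invented offline data: action 1 looks better on average but is rarely logged.
y0 = rng.normal(1.0, 1.0, size=500)    # many observations of action 0
y1 = rng.normal(1.2, 1.0, size=5)      # few observations of action 1

def posterior_lower(y, q=0.05, prior_var=10.0, noise_var=1.0):
    """Conjugate normal-prior/normal-likelihood posterior for the mean outcome."""
    n = len(y)
    var = 1.0 / (1.0 / prior_var + n / noise_var)
    mean = var * (y.sum() / noise_var)
    return norm.ppf(q, loc=mean, scale=np.sqrt(var))

for name, y in [("action 0", y0), ("action 1", y1)]:
    print(name, "posterior 5% quantile:", round(posterior_lower(y), 3))
# Pessimism favors the well-covered action 0 despite action 1's higher mean.
```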
no code implementations • 29 Sep 2022 • Jiayi Wang, Zhengling Qi, Chengchun Shi
This approach utilizes the observed action, whether from AI or humans, as an input for achieving a stronger oracle in policy learning for the decision maker (humans or AI).
no code implementations • 21 Sep 2022 • Rui Miao, Zhengling Qi, Xiaoke Zhang
We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states.
no code implementations • 18 Sep 2022 • Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok
Due to the lack of online interaction with the environment, offline RL faces the following two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage for the environment.
no code implementations • 17 Jan 2022 • Xiaohong Chen, Zhengling Qi
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions.
no code implementations • 29 Nov 2021 • Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen
Deep Reinforcement Learning (DRL) has demonstrated great potential for solving sequential decision-making problems in many applications.
no code implementations • 17 Oct 2021 • Weibin Mo, Zhengling Qi, Yufeng Liu
However, when the testing sample size available for training grows at a slower rate, efficient value function estimates may no longer perform well.
no code implementations • 10 Sep 2021 • Jiayi Wang, Zhengling Qi, Raymond K. W. Wong
Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL).
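As one standard OPE baseline (distinct from the estimator studied in the paper), the sketch below runs fitted Q-evaluation with deliberately simple linear features on invented logged transitions: regress Bellman targets under the fixed target policy, then average the learned Q-function over initial states.

```python
# Fitted Q-evaluation (FQE) of a fixed target policy from logged data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(9)
n, gamma = 5000, 0.9
s = rng.uniform(-1, 1, size=n)
a = rng.integers(2, size=n)                        # uniform behavior policy
r = np.where(a == (s > 0), 1.0, 0.0)
s_next = np.clip(s + rng.normal(scale=0.3, size=n), -1, 1)

target = lambda state: (state > 0).astype(int)     # policy to evaluate

def feats(state, action):
    # Simple linear features; a real application would use a richer class.
    return np.column_stack([state, action, state * action, np.ones_like(state)])

q = Ridge().fit(feats(s, a), r)
for _ in range(30):                                # iterate Bellman regressions
    y = r + gamma * q.predict(feats(s_next, target(s_next)))
    q = Ridge().fit(feats(s, a), y)

s0 = rng.uniform(-1, 1, size=1000)                 # initial-state distribution
print("estimated value of target policy:", q.predict(feats(s0, target(s0))).mean())
```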
no code implementations • 3 May 2021 • Zhengling Qi, Rui Miao, Xiaoke Zhang
Data-driven individualized decision making has recently received increasing research interest.
no code implementations • 9 Nov 2020 • Zhengling Qi, Peng Liao
We study the offline data-driven sequential decision making problem in the framework of a Markov decision process (MDP).
no code implementations • 23 Jul 2020 • Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy
The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
no code implementations • 26 Jun 2020 • Weibin Mo, Zhengling Qi, Yufeng Liu
We propose a novel distributionally robust ITR (DR-ITR) framework that maximizes the worst-case value function across the values under a set of underlying distributions that are "close" to the training distribution.
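A minimal sketch of the evaluation step behind such a worst-case criterion, assuming a KL-ball ambiguity set around the empirical distribution (one common choice; the radius rho and all data are invented): the worst-case mean has a one-dimensional dual that is easy to solve numerically.

```python
# Worst-case mean outcome over a KL ball, via its scalar dual problem.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(10)
outcomes = rng.normal(1.0, 1.0, size=2000)   # outcomes under a candidate ITR

def worst_case_mean(x, rho=0.1):
    """inf over {Q : KL(Q||P) <= rho} of E_Q[X], using the dual
    -min_{lam > 0} [ lam * log E_P[exp(-X/lam)] + lam * rho ]."""
    dual = lambda lam: lam * np.log(np.mean(np.exp(-x / lam))) + lam * rho
    # Lower bound on lam kept away from 0 to avoid overflow in exp().
    res = minimize_scalar(dual, bounds=(0.05, 100.0), method="bounded")
    return -res.fun

print("empirical mean: ", outcomes.mean())
print("worst-case mean:", worst_case_mean(outcomes))   # strictly smaller
```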
no code implementations • 6 Oct 2019 • Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang
This paper has two main goals: (a) establish several statistical properties -- consistency, asymptotic distributions, and convergence rates -- of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk minimization problems, and (b) validate these properties on a noisy amplitude-based phase retrieval problem, the latter being of much topical interest. Derived from available data via sampling, these empirical risk minimization problems are the computational workhorse of a population risk model which involves the minimization of an expected value of a random functional.
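A minimal sketch of the noisy amplitude-based phase retrieval ERM referenced above, minimize (1/n) * sum_i (|a_i^T x| - b_i)^2, solved here by plain subgradient descent from a random start; in harder regimes a spectral initialization is typically used, and all problem sizes and step sizes here are invented.

```python
# Subgradient descent on the nonconvex, nonsmooth amplitude-based ERM.
import numpy as np

rng = np.random.default_rng(11)
n, d = 2000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = np.abs(A @ x_true) + 0.01 * rng.normal(size=n)   # noisy amplitude data

x = rng.normal(size=d)                               # random initialization
for t in range(500):
    z = A @ x
    # Subgradient of (|z_i| - b_i)^2 w.r.t. x, taking sign(0) = 0.
    g = A.T @ (2 * (np.abs(z) - b) * np.sign(z)) / n
    x -= (0.1 / np.sqrt(t + 1)) * g                  # diminishing step size

# Recovery is only defined up to a global sign flip.
err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print("relative error:", err / np.linalg.norm(x_true))
```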
no code implementations • 27 Aug 2019 • Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang
Recent exploration of optimal individualized decision rules (IDRs) for patients in precision medicine has attracted a lot of attention due to the heterogeneous responses of patients to different treatments.