Search Results for author: Zhengling Qi

Found 25 papers, 3 papers with code

Distributional Off-policy Evaluation with Bellman Residual Minimization

no code implementations · 2 Feb 2024 · Sungee Hong, Zhengling Qi, Raymond K. W. Wong

We consider the problem of distributional off-policy evaluation, which serves as the foundation of many distributional reinforcement learning (DRL) algorithms.

Distributional Reinforcement Learning · Off-policy evaluation
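Since the snippet gives only the problem statement, here is a minimal, self-contained sketch of distributional policy evaluation in a tabular Markov reward process, using quantile atoms updated toward a sampled distributional Bellman target. This illustrates the general idea only, not the authors' Bellman-residual-minimization estimator; the dynamics, learning rate, and quantile parameterization are all assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, m, gamma = 3, 32, 0.9

# Transitions and rewards of the policy-induced Markov reward process
# (made up for illustration).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
R = np.array([1.0, 0.0, 2.0])

taus = (np.arange(m) + 0.5) / m          # quantile levels
theta = np.zeros((n_states, m))          # quantile atoms of Z(s)

for _ in range(5000):
    for s in range(n_states):
        s_next = rng.choice(n_states, p=P[s])
        targets = R[s] + gamma * theta[s_next]   # m distributional targets
        # quantile-regression update: push atom i toward the tau_i-quantile
        # of the Bellman target distribution
        grad = taus - (targets[None, :] < theta[s][:, None]).mean(axis=1)
        theta[s] += 0.05 * grad

# Sanity check: the atoms' average should roughly recover the mean return.
print("estimated mean returns:", theta.mean(axis=1))
print("true mean returns     :", np.linalg.solve(np.eye(n_states) - gamma * P, R))
```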

Robust Offline Reinforcement Learning with Heavy-Tailed Rewards

1 code implementation · 28 Oct 2023 · Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications.

Offline RL · Off-policy evaluation +1
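A standard building block in this heavy-tailed literature is to replace a naive empirical mean of rewards with a median-of-means estimate. A minimal standalone sketch follows; the t-distributed rewards and the block count are assumptions, and this is not the paper's full algorithm.

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=0):
    """Average within random blocks, then take the median of block means."""
    x = np.random.default_rng(seed).permutation(x)
    return np.median([b.mean() for b in np.array_split(x, n_blocks)])

rng = np.random.default_rng(1)
# Heavy-tailed rewards: t-distribution with 2 degrees of freedom
# (finite mean 1.0, infinite variance).
rewards = 1.0 + rng.standard_t(df=2, size=5000)
print("sample mean    :", rewards.mean())            # unstable across runs
print("median of means:", median_of_means(rewards))  # much more stable
```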

Off-policy Evaluation in Doubly Inhomogeneous Environments

no code implementations · 14 Jun 2023 · Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang

This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated.

Offline RL · Off-policy evaluation

A Policy Gradient Method for Confounded POMDPs

no code implementations · 26 May 2023 · Mao Hong, Zhengling Qi, Yanxun Xu

To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs in the offline setting.

Sequential Knockoffs for Variable Selection in Reinforcement Learning

no code implementations · 24 Mar 2023 · Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge.

reinforcement-learning · Variable Selection
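For readers unfamiliar with knockoffs, the single-stage idea can be sketched in a few lines: build "knockoff" copies of the features, compute an importance statistic contrasting each feature with its copy, and keep features whose statistic clears a data-driven threshold. The sketch below assumes independent Gaussian features (for which an independent copy is a valid model-X knockoff) and uses marginal correlations as importances; the paper's sequential procedure for Markov state variables is substantially more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.5, 1.0, 1.5, -1.0] + [0.0] * (p - 5))  # 5 true signals
y = X @ beta + rng.standard_normal(n)

X_ko = rng.standard_normal((n, p))  # valid knockoffs for independent N(0,1) X

# Importance of each column of [X, X_ko] via marginal correlation with y
# (a lasso fit is more common; this keeps the sketch dependency-free).
z = np.abs(np.hstack([X, X_ko]).T @ y) / n
W = z[:p] - z[p:]                   # knockoff statistics: true signal => W large

# Knockoff+ threshold controlling FDR at nominal level q.
q = 0.25
candidates = np.sort(np.abs(W))
thresh = next((t for t in candidates if t > 0 and
               (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= q), np.inf)
print("selected variables:", np.where(W >= thresh)[0])  # expect {0, 1, 2, 3, 4}
```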

Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

no code implementations · 24 Feb 2023 · Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

Specifically, relying on the structural models of revenue and price, we establish the identifiability condition of an optimal pricing strategy under endogeneity with the help of invalid instrumental variables.

Causal Inference · Econometrics

PASTA: Pessimistic Assortment Optimization

no code implementations · 8 Feb 2023 · Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid Tarokh

The objective is to use the offline dataset to find an optimal assortment.
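As a toy illustration of what "pessimistic" means here: score each candidate assortment by its estimated expected revenue minus an uncertainty penalty that shrinks with offline coverage, then maximize that lower confidence bound. Every number below (the estimates, the counts, the penalty constant) is invented for illustration; the paper's estimator and guarantees are far more general.

```python
import numpy as np

assortments = ["{1}", "{2}", "{1,2}", "{1,3}", "{2,3}"]
r_hat = np.array([1.0, 1.4, 1.8, 2.1, 1.6])   # plug-in revenue estimates
n_obs = np.array([500, 400, 300,   5, 250])   # offline coverage per assortment
c = 2.0                                       # penalty constant (assumed)

lcb = r_hat - c / np.sqrt(n_obs)              # pessimistic scores

# {1,3} has the highest point estimate but almost no coverage, so
# pessimism steers the choice toward a well-covered assortment.
print("greedy choice     :", assortments[int(np.argmax(r_hat))])  # {1,3}
print("pessimistic choice:", assortments[int(np.argmax(lcb))])    # {1,2}
```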

STEEL: Singularity-aware Reinforcement Learning

no code implementations · 30 Jan 2023 · Xiaohong Chen, Zhengling Qi, Runzhe Wan

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment.

Off-policy evaluation · reinforcement-learning

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

no code implementations · 5 Jan 2023 · Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

When the initial policy is consistent, under some mild conditions, our method will yield a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired "value enhancement" property.

Decision Making · reinforcement-learning +1

Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

no code implementations · 23 Dec 2022 · Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang

To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob.

Decision Making · Off-policy evaluation +1

RISE: Robust Individualized Decision Learning with Sensitive Variables

1 code implementation · 12 Nov 2022 · Xiaoqing Tan, Zhengling Qi, Christopher W. Seymour, Lu Tang

This paper introduces RISE, a robust individualized decision learning framework with sensitive variables, where sensitive variables are collectible data that are important to the intervention decision, but whose inclusion in decision making is prohibited for reasons such as delayed availability or fairness concerns.

Decision Making · Fairness

Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

1 code implementation · 26 Oct 2022 · Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li

In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting.

Thompson Sampling · Variational Inference
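A minimal sketch of the pessimistic flavor of such a Bayesian approach: given posterior draws of each treatment's value, act on a lower posterior quantile instead of the posterior mean, so poorly informed treatments are discounted. The Gaussian posteriors below are placeholders, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Posterior over Q(s, a) for 3 candidate treatments (assumed):
# treatment 1 looks best on average but is poorly informed.
post_mean = np.array([1.0, 1.3, 1.1])
post_sd = np.array([0.1, 0.8, 0.1])
draws = rng.normal(post_mean, post_sd, size=(4000, 3))

q_lo = np.quantile(draws, 0.10, axis=0)    # pessimistic value estimates
print("posterior means:", draws.mean(axis=0), "-> argmax", draws.mean(axis=0).argmax())
print("10% quantiles  :", q_lo, "-> argmax", q_lo.argmax())
```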

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

no code implementations · 29 Sep 2022 · Jiayi Wang, Zhengling Qi, Chengchun Shi

This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI).

Decision Making · reinforcement-learning +1

Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models

no code implementations · 21 Sep 2022 · Rui Miao, Zhengling Qi, Xiaoke Zhang

We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states.

Causal Inference · Off-policy evaluation

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

no code implementations · 18 Sep 2022 · Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok

Due to the lack of online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage of the environment.

Offline RL · reinforcement-learning +1

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

no code implementations · 17 Jan 2022 · Xiaohong Chen, Zhengling Qi

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions.

Off-policy evaluation

Pessimistic Model Selection for Offline Deep Reinforcement Learning

no code implementations · 29 Nov 2021 · Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.

Decision Making · Model Selection +2

Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules

no code implementations · 17 Oct 2021 · Weibin Mo, Zhengling Qi, Yufeng Liu

However, when the growth of the testing sample size available for training is of a slower order, efficient value function estimates may no longer perform well.

Projected State-action Balancing Weights for Offline Reinforcement Learning

no code implementations · 10 Sep 2021 · Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL).

Causal Inference · reinforcement-learning +1
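As background on the balancing idea, the sketch below learns exponential-tilting weights on logged samples so that weighted feature means match prescribed target moments (the convex dual of entropy balancing). The features and target moments are fabricated; the paper's projected state-action balancing weights additionally handle Bellman-type conditions specific to RL.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
phi = rng.standard_normal((n, d))    # features of logged state-action pairs
target = np.array([0.5, -0.2, 0.0])  # target feature means (assumed)

# Dual of entropy balancing: minimize log-sum-exp(phi @ theta) - theta @ target;
# the gradient is (weighted feature mean) - target.
theta = np.zeros(d)
for _ in range(2000):
    w = np.exp(phi @ theta)
    w /= w.sum()                     # balancing weights on the n samples
    theta -= 0.1 * (phi.T @ w - target)

print("weighted feature means:", phi.T @ w)  # ~ [0.5, -0.2, 0.0]
```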

Robust Batch Policy Learning in Markov Decision Processes

no code implementations · 9 Nov 2020 · Zhengling Qi, Peng Liao

We study the offline, data-driven sequential decision making problem in the framework of Markov decision processes (MDPs).

Decision Making

Batch Policy Learning in Average Reward Markov Decision Processes

no code implementations · 23 Jul 2020 · Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.

Learning Optimal Distributionally Robust Individualized Treatment Rules

no code implementations · 26 Jun 2020 · Weibin Mo, Zhengling Qi, Yufeng Liu

We propose a novel distributionally robust ITR (DR-ITR) framework that maximizes the worst-case value function across the values under a set of underlying distributions that are "close" to the training distribution.

Decision Making
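The worst-case value criterion described in the abstract can be made concrete with a standard duality fact: for a KL divergence ball of radius rho around the data distribution P, inf over Q with KL(Q||P) <= rho of E_Q[V] equals sup over eta > 0 of -eta*log(E_P[exp(-V/eta)]) - eta*rho. The sketch below evaluates this dual on simulated outcomes of one candidate rule; the outcome model and radius are assumptions, and the paper's DR-ITR framework has its own choice of distributional distance and calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(1.0, 1.0, size=5000)  # outcomes under a candidate treatment rule
rho = 0.1                            # radius of the KL ball (assumed)

def worst_case_value(V, rho):
    """KL-DRO dual, maximized over eta on a log-spaced grid."""
    etas = np.geomspace(1e-2, 1e2, 400)
    duals = [-e * np.log(np.mean(np.exp(-V / e))) - e * rho for e in etas]
    return max(duals)

print("empirical mean value:", V.mean())
print("worst-case value    :", worst_case_value(V, rho))  # strictly smaller
```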

Statistical Analysis of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

no code implementations · 6 Oct 2019 · Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang

This paper has two main goals: (a) establish several statistical properties---consistency, asymptotic distributions, and convergence rates---of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk minimization problems, and (b) validate these properties by a noisy amplitude-based phase retrieval problem, the latter being of much topical interest. Derived from available data via sampling, these empirical risk minimization problems are the computational workhorse of a population risk model which involves the minimization of an expected value of a random functional.

Retrieval
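The validating example named in the abstract, amplitude-based phase retrieval, makes the "nonconvex and nonsmooth" structure concrete. A standard formulation of that empirical risk (our notation, not necessarily the paper's) is

```latex
% Amplitude-based phase retrieval as a coupled nonconvex, nonsmooth ERM:
% the absolute value makes each summand nonsmooth, and composing it with
% the quadratic loss makes the objective nonconvex.
\min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n}
  \bigl( b_i - \lvert a_i^{\top} x \rvert \bigr)^2
```

Stationary points, rather than global minimizers, are what consistency and rate results for such problems target.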

Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes

no code implementations · 27 Aug 2019 · Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang

Recent exploration of optimal individualized decision rules (IDRs) for patients in precision medicine has attracted a lot of attention due to the heterogeneous responses of patients to different treatments.

Decision Making
