Search Results for author: Yaqi Duan

Found 14 papers, 2 papers with code

Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

no code implementations • 10 Jan 2024 • Yaqi Duan, Martin J. Wainwright

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings.

reinforcement-learning, Reinforcement Learning (RL), +1

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

no code implementations • 7 Nov 2022 • Yaqi Duan, Martin J. Wainwright

For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d.

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

no code implementations • 11 Mar 2022 • Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear.

Decision Making, reinforcement-learning, +1

Adaptive and Robust Multi-Task Learning

1 code implementation • 10 Feb 2022 • Yaqi Duan, Kaizheng Wang

We study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them.

Multi-Task Learning

Optimal policy evaluation using kernel-based temporal difference methods

no code implementations • 24 Sep 2021 • Yaqi Duan, Mengdi Wang, Martin J. Wainwright

Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.
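The kernel-based evaluation setup above can be made concrete with a standard kernel LSTD(0) estimator — a generic sketch under a Gaussian kernel, not the paper's exact estimator; all names, hyperparameters, and the toy data below are illustrative assumptions:

```python
import numpy as np

def kernel_lstd(S, S_next, r, gamma=0.9, bandwidth=1.0, reg=1e-3):
    """Generic kernel LSTD(0) sketch: solve (K - gamma*K_next + reg*I) a = r,
    then estimate V(s) = sum_j a_j k(s, s_j) with a Gaussian kernel."""
    def gram(X, Y):
        # pairwise squared distances, then Gaussian kernel
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    K = gram(S, S)            # kernel matrix over current states
    K_next = gram(S_next, S)  # cross-kernel with next states
    a = np.linalg.solve(K - gamma * K_next + reg * np.eye(len(S)), r)
    return lambda x: gram(np.atleast_2d(x), S) @ a

# toy trajectory data: 50 transitions in a 2-d state space
rng = np.random.default_rng(3)
S = rng.normal(size=(50, 2))
S_next = S + 0.1 * rng.normal(size=(50, 2))
r = np.ones(50)               # constant reward
V = kernel_lstd(S, S_next, r)
```

The choice of kernel and regularization here is what the paper's horizon-dependent scalings would interact with; this sketch fixes both arbitrarily.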

PU-Flow: a Point Cloud Upsampling Network with Normalizing Flows

1 code implementation • 13 Jul 2021 • Aihua Mao, Zihui Du, Junhui Hou, Yaqi Duan, Yong-Jin Liu, Ying He

Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets.

point cloud upsampling

Learning Good State and Action Representations via Tensor Decomposition

no code implementations • 3 May 2021 • Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.

Tensor Decomposition
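The low-rank tensor structure alluded to above can be probed with a minimal higher-order SVD: unfold the empirical transition tensor along each mode and keep the top singular vectors. This is an illustrative sketch of the general technique, not the authors' algorithm; the names and toy tensor are hypothetical:

```python
import numpy as np

def hosvd_factors(T, ranks):
    """Minimal HOSVD sketch: for each mode, flatten the tensor with that
    mode first and take the leading left singular vectors as the factor."""
    factors = []
    for mode, rank in enumerate(ranks):
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :rank])
    return factors

# toy transition tensor P(s' | s, a) of shape (states, actions, next states)
rng = np.random.default_rng(2)
T = rng.random((5, 3, 5))
T /= T.sum(axis=2, keepdims=True)   # normalize over next states
F = hosvd_factors(T, (2, 2, 2))     # low-dimensional state/action factors
```

The state and action factors returned here play the role of the learned representations; a real method would also estimate the core tensor and choose the ranks from data.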

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

no code implementations • 25 Mar 2021 • Yaqi Duan, Chi Jin, Zhiyuan Li

Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the following: (1) in the double-sampling regime, the excess risk of the empirical risk minimizer (ERM) is bounded by the Rademacher complexity of the function class.

Learning Theory, reinforcement-learning, +1

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less understood.

Off-policy evaluation

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection, Model Selection, +2
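One plausible reading of a Lasso fitted Q-evaluation step — L1-penalized regression onto the Bellman target at each iteration — can be sketched as follows. This is illustrative only: the hyperparameters, feature setup, and toy data are assumptions, not the authors' exact procedure:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqe(phi, rewards, phi_next, gamma=0.9, n_iters=50, alpha=0.05):
    """Sketch of Lasso fitted Q-evaluation: repeatedly regress the Bellman
    target r + gamma * Q(s') onto the features with an L1 penalty, so the
    fit concentrates on the few relevant coordinates."""
    w = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        target = rewards + gamma * phi_next @ w   # Bellman backup
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(phi, target)
        w = model.coef_
    return w

# toy data: 200 transitions, 10 ambient features, only 2 carrying signal
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:2] = [1.0, -0.5]
rewards = 0.1 * phi @ w_true + rng.normal(scale=0.01, size=200)
phi_next = rng.normal(size=(200, 10))
w_hat = lasso_fqe(phi, rewards, phi_next)
```

The L1 penalty is what removes the polynomial dependence on the ambient dimension in the paper's bound; ordinary least squares in the same loop would not have that property.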

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

no code implementations • ICML 2020 • Yaqi Duan, Mengdi Wang

We prove that this method is information-theoretically optimal and has nearly minimal estimation error.

Off-policy evaluation

Learning low-dimensional state embeddings and metastable clusters from time series data

no code implementations • NeurIPS 2019 • Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.

Clustering, Time Series, +1

State Aggregation Learning from Markov Transition Data

no code implementations • NeurIPS 2019 • Yaqi Duan, Zheng Tracy Ke, Mengdi Wang

Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step applies a linear transformation to the singular vectors to find their approximate convex hull.
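The spectral first step of the two-step procedure described above can be sketched as a truncated SVD of the empirical transition matrix; the convex-hull step is omitted. Names and the toy chain are illustrative, not the authors' exact algorithm:

```python
import numpy as np

def spectral_step(P_hat, rank):
    """Step 1 sketch: rank-truncated SVD of the empirical transition
    matrix, returning left and right low-rank factors."""
    U, s, Vt = np.linalg.svd(P_hat)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# toy 6-state chain whose transition matrix has rank 2 by construction:
# soft memberships A (states -> 2 aggregates) and aggregate dynamics B
rng = np.random.default_rng(1)
A = rng.random((6, 2))
A /= A.sum(axis=1, keepdims=True)
B = rng.random((2, 6))
B /= B.sum(axis=1, keepdims=True)
P = A @ B                       # row-stochastic, rank 2
left, right = spectral_step(P, rank=2)
```

Because the toy matrix is exactly rank 2, the rank-2 factors reconstruct it exactly; with an empirical matrix from finite data, the truncation instead denoises before the convex-hull step.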

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

no code implementations • 14 Oct 2018 • Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan

The efficiency and statistical properties of our approach are illustrated on synthetic data.
