no code implementations • 10 Jan 2024 • Yaqi Duan, Martin J. Wainwright
We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings.
no code implementations • 7 Nov 2022 • Yaqi Duan, Martin J. Wainwright
For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d.
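To make the setting concrete, here is a minimal sketch of the kind of TD estimator this result concerns: tabular TD(0) value estimation run on a single trajectory rather than on i.i.d. transitions. The function name, tabular setting, and step-size choice are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def td0_value_estimate(trajectory, rewards, n_states, gamma=0.9, alpha=0.1):
    """Tabular TD(0) value estimation from a single trajectory.

    trajectory: visited states s_0, s_1, ..., s_T
    rewards: reward r_t received on the transition s_t -> s_{t+1}
    """
    V = np.zeros(n_states)
    for t in range(len(trajectory) - 1):
        s, s_next = trajectory[t], trajectory[t + 1]
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s').
        V[s] += alpha * (rewards[t] + gamma * V[s_next] - V[s])
    return V
```

The key feature is that consecutive updates use overlapping, dependent data from one trajectory, which is exactly the dependence structure the trajectory-data analysis must handle.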
no code implementations • 11 Mar 2022 • Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang
However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear.
1 code implementation • 10 Feb 2022 • Yaqi Duan, Kaizheng Wang
We study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them.
no code implementations • 24 Sep 2021 • Yaqi Duan, Mengdi Wang, Martin J. Wainwright
Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.
1 code implementation • 13 Jul 2021 • Aihua Mao, Zihui Du, Junhui Hou, Yaqi Duan, Yong-Jin Liu, Ying He
Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets.
no code implementations • 3 May 2021 • Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.
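A minimal sketch of what this tensor structure looks like in a discretized setting: the empirical transition kernel is a three-way array indexed by (state, action, next state), and a truncated SVD of its unfolding is one simple way to exploit low-rank structure. The function names and the discretized, unfolding-based denoising step are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def empirical_transition_tensor(transitions, n_states, n_actions):
    """Empirical transition tensor P[s, a, s'] from observed
    (state, action, next_state) triples."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Normalize each (s, a) slice to a probability distribution;
    # leave unvisited pairs at zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def low_rank_denoise(P, rank):
    """Truncated SVD of the (s, a)-by-s' unfolding: a simple way to
    exploit the tensor's low-rank structure to denoise the estimate."""
    S, A, S2 = P.shape
    M = P.reshape(S * A, S2)
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    M_hat = (U[:, :rank] * sigma[:rank]) @ Vt[:rank]
    return M_hat.reshape(S, A, S2)
```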
no code implementations • 25 Mar 2021 • Yaqi Duan, Chi Jin, Zhiyuan Li
Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the following: (1) in the double-sampling regime, the excess risk of the empirical risk minimizer (ERM) is bounded by the Rademacher complexity of the function class.
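The double-sampling regime refers to observing two independent next-state draws for each (state, action) pair, which makes the squared Bellman error estimable without bias; a minimal sketch of that trick, with an illustrative function name and a tabular Q-function assumed:

```python
import numpy as np

def bellman_error_double_sampling(Q, data, gamma=0.9):
    """Unbiased estimate of the mean squared Bellman error via double
    sampling: each (s, a) comes with two independent next-state draws,
    so the product of the two TD errors is an unbiased estimate of the
    squared Bellman residual at (s, a).

    Q: array of shape (n_states, n_actions)
    data: iterable of (s, a, r1, s1_next, r2, s2_next) tuples
    """
    errs = []
    for s, a, r1, s1, r2, s2 in data:
        d1 = r1 + gamma * Q[s1].max() - Q[s, a]
        d2 = r2 + gamma * Q[s2].max() - Q[s, a]
        errs.append(d1 * d2)  # E[d1 * d2] = (Bellman residual at (s, a))^2
    return float(np.mean(errs))
```

With a single sample per pair, squaring one TD error would pick up the conditional variance of the next state; the two independent draws cancel that bias.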
no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood.
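As a point of reference, the bootstrap procedure being analyzed follows the usual pattern: resample the batch with replacement, re-run the estimator, and read off empirical quantiles. A minimal generic sketch, with an illustrative function name and a scalar statistic assumed:

```python
import numpy as np

def bootstrap_ci(estimator, batch, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a scalar statistic
    of a batch: resample with replacement, recompute the statistic,
    and take empirical quantiles of the resampled values."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    stats = [estimator(batch[rng.integers(0, len(batch), len(batch))])
             for _ in range(n_boot)]
    return (float(np.quantile(stats, alpha / 2)),
            float(np.quantile(stats, 1 - alpha / 2)))
```

In the batch-RL setting the "statistic" would be a policy-value estimate computed from resampled transitions; the theoretical question is when such intervals have valid coverage despite the dependence in the data.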
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
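A minimal sketch of sparse fitted Q-evaluation in the spirit of this method: repeatedly regress the bootstrapped target onto the features with an l1 penalty, so the fit concentrates on the few relevant coordinates. The function names, the ISTA solver, and the simplified evaluation setup (fixed target-policy features, no importance correction) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso regression via ISTA (proximal gradient with soft-thresholding)."""
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 1e-12)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        # Soft-threshold: the proximal step for the l1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def lasso_fqe(features, rewards, next_features, gamma=0.9, lam=0.01, n_rounds=20):
    """Sparse fitted Q-evaluation: in each round, form the bootstrapped
    target r + gamma * phi(s')^T w and refit w by Lasso regression."""
    w = np.zeros(features.shape[1])
    for _ in range(n_rounds):
        targets = rewards + gamma * next_features @ w
        w = lasso_ista(features, targets, lam)
    return w
```

The l1 penalty is what removes the polynomial dependence on the ambient dimension: only the support of the relevant features enters the error bound.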
no code implementations • ICML 2020 • Yaqi Duan, Mengdi Wang
We prove that this method is information-theoretically optimal and has nearly minimal estimation error.
no code implementations • NeurIPS 2019 • Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang
This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.
no code implementations • NeurIPS 2019 • Yaqi Duan, Zheng Tracy Ke, Mengdi Wang
Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step applies a linear transformation to the singular vectors to find their approximate convex hull.
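The first step can be sketched directly: form the row-normalized empirical transition matrix from the observed state sequence and take its top singular vectors as a low-dimensional embedding of the states. The function names are illustrative, and the second step (the linear transformation and convex-hull search over singular vectors) is omitted here, as it is the nontrivial part of the paper's algorithm.

```python
import numpy as np

def empirical_transition_matrix(states, n_states):
    """Row-normalized count matrix of observed transitions s_t -> s_{t+1}."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Leave rows of unvisited states at zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def spectral_embedding(P_hat, r):
    """Step 1: the top-r left singular vectors of the empirical
    transition matrix embed each state in r dimensions."""
    U, _, _ = np.linalg.svd(P_hat)
    return U[:, :r]
```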
no code implementations • 14 Oct 2018 • Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan
The efficiency and statistical properties of our approach are illustrated on synthetic data.