no code implementations • 23 Dec 2022 • Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang
To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob.
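Pessimism here means ranking candidates by a lower confidence bound on their estimated value rather than by the point estimate, so a poorly covered candidate is not chosen just because it looks good. The sketch below is a generic illustration of that principle, not the paper's algorithm. The candidate values, standard errors, the pessimism level `c`, and the function name `pessimistic_select` are all hypothetical.

```python
import numpy as np

def pessimistic_select(values, std_errs, c=2.0):
    """Return the index maximizing the lower confidence bound v_hat - c * se.

    `values` stand in for off-policy value estimates of candidate policy pairs,
    `std_errs` for their uncertainty, and `c` for a hypothetical pessimism level.
    """
    lcb = np.asarray(values, dtype=float) - c * np.asarray(std_errs, dtype=float)
    return int(np.argmax(lcb)), lcb

# Hypothetical estimates for three candidate (Alice, Bob) policy pairs.
values = [1.0, 1.4, 0.9]
std_errs = [0.05, 0.4, 0.02]
greedy_choice = int(np.argmax(values))                          # 1: highest point estimate
pessimistic_choice, lcb = pessimistic_select(values, std_errs)  # 0: best lower bound
```

The second candidate has the largest estimate but also the largest uncertainty. Its lower bound (0.6) falls below the first candidate's (0.9), so the pessimistic rule prefers the better-supported pair.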
no code implementations • 18 Sep 2022 • Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok
Due to the lack of online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage of the environment.
1 code implementation • 24 Oct 2021 • Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems.
no code implementations • 19 Aug 2021 • Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set.
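"Cannot be discriminated on a reward set" can be read as a worst-case value gap. For a policy $\pi$, the gap is $\sup_{r \in \mathcal{R}} \{J_E(r) - J_\pi(r)\}$. The sketch below assumes a hypothetical linear reward set $\mathcal{R} = \{r_w(s,a) = w^\top \phi(s,a) : \|w\|_2 \le 1\}$, which is not necessarily the set used in the paper. Under that assumption $J(r_w) = w^\top \mu$, where $\mu$ is the discounted feature expectation. By Cauchy-Schwarz the worst-case gap is then $\|\mu_E - \mu_\pi\|_2$. The feature expectations below are made up for illustration.

```python
import numpy as np

def gap(mu_expert, mu_agent, w):
    """J_E(r_w) - J_pi(r_w) for a linear reward r_w(s, a) = w^T phi(s, a)."""
    return float(np.dot(w, mu_expert - mu_agent))

def worst_case_gap(mu_expert, mu_agent):
    """Largest gap over the reward set {w : ||w||_2 <= 1} and a reward attaining it."""
    d = np.asarray(mu_expert, dtype=float) - np.asarray(mu_agent, dtype=float)
    norm = float(np.linalg.norm(d))
    if norm == 0.0:                  # indistinguishable on every reward in the set
        return 0.0, np.zeros_like(d)
    return norm, d / norm            # Cauchy-Schwarz: sup is ||d||_2, attained at d / ||d||

mu_expert = np.array([0.6, 0.3, 0.1])   # hypothetical discounted feature expectations
mu_agent = np.array([0.2, 0.5, 0.3])
best, w_star = worst_case_gap(mu_expert, mu_agent)

# Sanity check: no random unit-norm reward exposes a larger gap.
rng = np.random.default_rng(0)
random_ws = rng.normal(size=(1000, 3))
random_ws /= np.linalg.norm(random_ws, axis=1, keepdims=True)
random_gaps = [gap(mu_expert, mu_agent, w) for w in random_ws]
```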
no code implementations • 19 Feb 2021 • Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Mladen Kolar, Zhaoran Wang
Instrumental variables (IVs), in the context of RL, are variables whose influence on the state variables is entirely mediated through the action.
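To make the definition concrete, here is a minimal single-step sketch built on a hypothetical linear model, not the paper's RL setting. An unobserved confounder `u` drives both the action `a` and the next state `s_next`, while the instrument `z` reaches `s_next` only through `a`. Regressing `s_next` on `a` (OLS) is biased by the confounder. The classical IV (Wald) ratio $\mathrm{cov}(z, s') / \mathrm{cov}(z, a)$ recovers the true coefficient `beta`. All coefficients are illustrative assumptions.

```python
import numpy as np

def simulate(n=100_000, beta=1.5, seed=0):
    """Hypothetical linear model: z -> a -> s_next, with u confounding a and s_next."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # instrument: affects s_next only via a
    u = rng.normal(size=n)                        # unobserved confounder
    a = z + u + 0.1 * rng.normal(size=n)          # action depends on z and u
    s_next = beta * a + 2.0 * u + 0.1 * rng.normal(size=n)
    return z, a, s_next

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(z, x, y):
    # Wald / IV estimator: the confounder is uncorrelated with z, so it drops out.
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

z, a, s_next = simulate()
ols = ols_slope(a, s_next)    # biased upward by cov(a, u) / var(a)
iv = iv_slope(z, a, s_next)   # close to the true beta = 1.5
```

In this model the OLS estimate lands near 2.5 because the confounder adds roughly $2 \cdot 1 / 2.01$ to the slope. The IV estimate stays near 1.5 because $z$ is independent of $u$.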
no code implementations • ICLR 2021 • Zuyue Fu, Zhuoran Yang, Zhaoran Wang
To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
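"Single-timescale" means the actor and critic are updated together at every step, with step sizes of the same order. In a two-timescale scheme, by contrast, the critic runs much faster than the actor. The toy loop below is a sketch of that general idea under stated assumptions, not the paper's exact algorithm or analysis. It uses a hypothetical 2-state, 2-action MDP and one-hot (state, action) features, so the critic $Q_w(s,a) = \phi(s,a)^\top w$ is linear. The actor is an energy-based softmax policy with a natural-policy-gradient-style update, and the critic uses a SARSA-style TD(0) step. The step sizes `alpha` and `eta` are illustrative choices.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

def single_timescale_ac(n_steps=5000, alpha=0.1, eta=0.05, gamma=0.9, seed=0):
    """Toy single-timescale actor-critic on a hypothetical 2-state, 2-action MDP."""
    rng = np.random.default_rng(seed)
    n_s, n_a = 2, 2
    w = np.zeros((n_s, n_a))       # critic weights: one per one-hot (s, a) feature
    theta = np.zeros((n_s, n_a))   # actor weights: pi(a|s) proportional to exp(theta[s, a])
    s = 0
    a = rng.choice(n_a, p=softmax(theta[s]))
    for _ in range(n_steps):
        r = 1.0 if a == 1 else 0.0              # action 1 always earns more reward
        s_next = rng.integers(n_s)              # transitions ignore the action
        a_next = rng.choice(n_a, p=softmax(theta[s_next]))
        # Critic: one TD(0) step toward r + gamma * Q_w(s', a').
        td_error = r + gamma * w[s_next, a_next] - w[s, a]
        w[s, a] += alpha * td_error
        # Actor: one NPG-style step using the current critic, in the same iteration
        # and with a step size of the same order (single timescale).
        theta[s] += eta * w[s]
        s, a = s_next, a_next
    return np.array([softmax(theta[i]) for i in range(n_s)])

probs = single_timescale_ac()   # rows: states; columns: pi(action | state)
```

Because transitions here do not depend on the action, $Q(s,1) - Q(s,0) = 1$ under any policy, and the policy should concentrate on action 1 in both states.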
no code implementations • ICLR 2020 • Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
We study discrete-time mean-field Markov games with an infinite number of agents, where each agent aims to minimize its ergodic cost.
1 code implementation • 8 Oct 2019 • Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang
We also establish a connection between this sample elicitation problem and $f$-GAN, and show how this connection helps reconstruct an estimator of the distribution from the collected samples.
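$f$-GAN builds on the standard variational representation of an $f$-divergence:

$$D_f(P\|Q) = \sup_T \{\mathbb{E}_P[T(x)] - \mathbb{E}_Q[f^*(T(x))]\}.$$

Here $f^*$ is the convex conjugate of $f$, and for differentiable $f$ the supremum is attained at $T = f'(p/q)$. The sketch below checks this numerically for KL divergence on two made-up discrete distributions. For KL, $f(u) = u\log u$, so $f^*(t) = e^{t-1}$ and $T^* = 1 + \log(p/q)$. It only illustrates the representation that $f$-GAN optimizes. The paper's elicitation construction is not reproduced here.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def variational_kl_bound(p, q, T):
    """E_P[T] - E_Q[f*(T)] with f(u) = u log u, whose conjugate is f*(t) = exp(t - 1)."""
    return float(np.sum(p * T) - np.sum(q * np.exp(T - 1.0)))

p = np.array([0.5, 0.3, 0.2])   # hypothetical distributions on three outcomes
q = np.array([0.2, 0.5, 0.3])
kl_true = kl(p, q)
T_opt = 1.0 + np.log(p / q)     # optimal critic f'(p/q): the bound is tight
T_bad = np.zeros_like(p)        # any other critic gives a strictly smaller value here
bound_opt = variational_kl_bound(p, q, T_opt)
bound_bad = variational_kl_bound(p, q, T_bad)
```

In $f$-GAN, the critic $T$ is a trained network that maximizes this bound from samples of $P$ and $Q$. The closed-form $T^*$ above is available only because the toy distributions are known.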
no code implementations • 25 Sep 2019 • Yang Liu, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
While classical elicitation results apply to eliciting a complex and generative (and continuous) distribution $p(x)$ for this image data, we are interested in eliciting samples $x_i \sim p(x)$ from agents.
no code implementations • 27 Sep 2018 • Zhuoran Yang, Zuyue Fu, Kaiqing Zhang, Zhaoran Wang
We study reinforcement learning algorithms with nonlinear function approximation in the online setting.