no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
To remove the need for an oracle, we first transform the original robust MDPs into an alternative form that allows us to solve them with stochastic gradient methods.
no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in the confounded MDPs with a linear structure.
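The paper's two-stage estimator is specialized to confounded MDPs with linear structure, which the excerpt does not spell out. Purely as a hedged illustration of the underlying two-stage instrumental-variable idea, here is generic two-stage least squares (2SLS); the variable names (`Z`, `X`, `Y`), the i.i.d. regression setting, and the toy data are assumptions, not the paper's construction.

```python
import numpy as np

def two_stage_least_squares(Z, X, Y):
    """Generic 2SLS: Z are instruments, X endogenous regressors, Y outcomes.

    Stage 1: project X onto the instruments to strip out the confounded part.
    Stage 2: regress Y on the projected (exogenous) regressors.
    """
    # Stage 1: fitted values X_hat = Z (Z^T Z)^{-1} Z^T X
    first_stage, _, _, _ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Stage 2: regress Y on X_hat to estimate the structural coefficient
    beta_hat, _, _, _ = np.linalg.lstsq(X_hat, Y, rcond=None)
    return beta_hat

# Toy usage: one endogenous regressor, one valid instrument, hidden confounder u.
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument, independent of u
x = 0.8 * z + u + 0.1 * rng.normal(size=n)
y = 2.0 * x + u + 0.1 * rng.normal(size=n)    # true coefficient is 2.0
print(two_stage_least_squares(z[:, None], x[:, None], y[:, None]))
# close to 2.0, whereas plain OLS would be biased upward by u
```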
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 18 May 2022 • Xiaobo Xia, Wenhao Yang, Jie Ren, Yewen Li, Yibing Zhan, Bo Han, Tongliang Liu
Second, the diversity constraints are designed to be task-agnostic, which prevents them from working well.
1 code implementation • 6 Apr 2022 • Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
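The excerpt does not specify the aggregation scheme. As a hedged sketch of the general FedRL pattern it describes, here is tabular Q-learning in which each agent updates a local Q-table from its own transitions and a central server only averages the tables, so raw trajectories are never shared; the `env.collect` interface and the averaging period are hypothetical, introduced only for illustration.

```python
import numpy as np

def local_q_update(Q, transitions, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-learning over an agent's private transitions."""
    for s, a, r, s_next in transitions:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
    return Q

def federated_q_learning(envs, n_states, n_actions, rounds=50, local_steps=100):
    """Each round: agents learn locally, then the server averages their Q-tables."""
    Q_global = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        local_tables = []
        for env in envs:
            Q = Q_global.copy()
            transitions = env.collect(local_steps)   # trajectories stay on the agent
            local_tables.append(local_q_update(Q, transitions))
        Q_global = np.mean(local_tables, axis=0)     # only Q-tables are shared
    return Q_global
```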
no code implementations • 8 Jan 2022 • Ling-Hao Chen, He Li, Wenhao Yang
In fact, it remains challenging to treat the different kinds of interaction actions uniformly and to detect anomalous instances in multi-view attributed networks.
no code implementations • 29 Dec 2021 • Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan
We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings.
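As a minimal sketch of the algorithm being analyzed, here is synchronous tabular Q-learning with Polyak-Ruppert (iterate) averaging; the generative-model sampler `sample_next_state`, the step-size schedule, and averaging from the first iterate are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def averaged_q_learning(sample_next_state, rewards, n_states, n_actions,
                        gamma=0.9, n_iters=10_000):
    """Synchronous Q-learning: every (s, a) pair is updated each iteration from one
    sampled next state, and the Polyak-Ruppert estimate averages the iterates."""
    Q = np.zeros((n_states, n_actions))
    Q_avg = np.zeros_like(Q)
    for t in range(1, n_iters + 1):
        alpha = 1.0 / (1.0 + (1 - gamma) * t)          # illustrative step size
        for s in range(n_states):
            for a in range(n_actions):
                s_next = sample_next_state(s, a)       # draw from the generative model
                target = rewards[s, a] + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])
        Q_avg += (Q - Q_avg) / t                        # running average of the iterates
    return Q_avg
```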
no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang
In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are learned solely from a generative model.
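For context, in the standard $(s,a)$-rectangular formulation the optimal robust value function satisfies the robust Bellman equation below; the uncertainty-set notation $\mathcal{P}_{s,a}$ is generic and not the paper's specific construction.

$$V^{\star}(s) \;=\; \max_{a \in \mathcal{A}} \; \min_{P \in \mathcal{P}_{s,a}} \Big[ r(s,a) + \gamma \, \mathbb{E}_{s' \sim P}\, V^{\star}(s') \Big], \qquad \forall s \in \mathcal{S}.$$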
no code implementations • 31 Oct 2020 • Wenhao Yang, Xiang Li, Guangzeng Xie, Zhihua Zhang
Regularized MDPs serve as a smooth version of the original MDPs.
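One standard way to see the smoothing, taking entropy regularization at temperature $\tau > 0$ as an illustrative choice rather than the paper's specific regularizer, is that the hard Bellman maximum is replaced by a log-sum-exp, which is smooth and uniformly close to the original maximum:

$$\max_{a} Q(s,a) \;\le\; \tau \log \sum_{a} \exp\!\big(Q(s,a)/\tau\big) \;\le\; \max_{a} Q(s,a) + \tau \log |\mathcal{A}|.$$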
no code implementations • 21 Oct 2019 • Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang
The technique of local updates has recently become a powerful tool in centralized settings for improving communication efficiency via periodic communication.
2 code implementations • ICLR 2020 • Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the total number of SGD iterations.
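For reference, here is a minimal \texttt{FedAvg} sketch of the form the analysis covers: each device runs several local SGD steps on its own (possibly non-iid) data, and the server averages the resulting models once per round. The quadratic local objectives, full device participation, and fixed learning rate are stand-ins for illustration.

```python
import numpy as np

def fedavg(local_grads, w0, rounds=100, local_steps=5, lr=0.05):
    """FedAvg: per round, each device takes `local_steps` SGD steps starting from
    the shared model, then the server averages the local models."""
    w = w0.copy()
    for _ in range(rounds):
        local_models = []
        for grad in local_grads:                # one stochastic-gradient oracle per device
            w_k = w.copy()
            for _ in range(local_steps):
                w_k -= lr * grad(w_k)
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)        # periodic averaging step
    return w

# Toy non-iid setup: each device minimizes ||w - c_k||^2 around a different center.
rng = np.random.default_rng(0)
centers = [rng.normal(size=3) for _ in range(4)]
grads = [lambda w, c=c: 2 * (w - c) + 0.01 * rng.normal(size=3) for c in centers]
print(fedavg(grads, w0=np.zeros(3)))  # approaches the average of the device centers
```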
no code implementations • NeurIPS 2019 • Xiang Li, Wenhao Yang, Zhihua Zhang
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.
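As one concrete instance of such a framework, using negative entropy as the regularizer (only one member of the general class the paper studies), soft value iteration replaces the hard max in the Bellman backup with its regularized counterpart, as in the sketch below.

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, gamma=0.95, tau=0.1, n_iters=500):
    """Entropy-regularized value iteration.

    P: (S, A, S) transition tensor, r: (S, A) rewards, tau: regularization weight.
    The backup uses tau * logsumexp(Q / tau), the smooth counterpart of max_a Q.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = r + gamma * P @ V                     # (S, A) state-action values
        V = tau * logsumexp(Q / tau, axis=1)      # regularized (soft) maximum
    # The optimal regularized policy is the softmax of Q at temperature tau.
    pi = np.exp((Q - V[:, None]) / tau)
    return V, pi / pi.sum(axis=1, keepdims=True)
```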
no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang
A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation.