Search Results for author: Wenhao Yang

Found 16 papers, 6 papers with code

Accelerated Value Iteration via Anderson Mixing

no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang

A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation.

Atari Games Q-Learning +2
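For intuition, here is a minimal sketch of Anderson mixing applied to the Bellman optimality operator, not the authors' exact A2VI algorithm: the transition tensor P (shape S x A x S), reward matrix r (shape S x A), and memory size m are illustrative placeholders.

```python
import numpy as np

def bellman(V, P, r, gamma):
    # Bellman optimality operator: (TV)(s) = max_a [r(s,a) + gamma * E_{s'}[V(s')]]
    return np.max(r + gamma * P @ V, axis=1)

def anderson_vi(P, r, gamma=0.99, m=5, iters=500, reg=1e-8, tol=1e-8):
    S, A = r.shape
    V = np.zeros(S)
    Vs, Gs = [], []
    for _ in range(iters):
        G = bellman(V, P, r, gamma)
        Vs.append(V); Gs.append(G)
        Vs, Gs = Vs[-(m + 1):], Gs[-(m + 1):]            # keep a window of m+1 iterates
        R = np.stack(Gs, axis=1) - np.stack(Vs, axis=1)  # residual history, S x k
        k = R.shape[1]
        # Mixing weights: min_a ||R a||^2 subject to sum(a) = 1,
        # solved via regularized normal equations.
        w = np.linalg.solve(R.T @ R + reg * np.eye(k), np.ones(k))
        a = w / w.sum()
        V = np.stack(Gs, axis=1) @ a                     # Anderson-mixed update
        if np.max(np.abs(bellman(V, P, r, gamma) - V)) < tol:
            break
    return V
```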

A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

no code implementations • NeurIPS 2019 • Xiang Li, Wenhao Yang, Zhihua Zhang

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.

reinforcement-learning Reinforcement Learning (RL)
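As one concrete instance of such a framework, Shannon-entropy regularization at temperature tau turns the Bellman max into a tau-scaled log-sum-exp and makes the optimal policy a softmax; the sketch below assumes that special case (the paper's framework covers general regularizers, including ones that induce sparse policies).

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, gamma=0.99, tau=0.1, iters=1000, tol=1e-8):
    # With Shannon-entropy regularization, max_a Q(s,a) is replaced by
    # tau * log sum_a exp(Q(s,a) / tau).
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * P @ V                    # S x A regularized Q-values
        V_new = tau * logsumexp(Q / tau, axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    pi = np.exp((Q - V[:, None]) / tau)          # softmax policy; rows sum to one
    return V, pi
```

As tau goes to zero this recovers standard value iteration and a greedy policy; larger tau yields smoother, more stochastic policies.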

On the Convergence of FedAvg on Non-IID Data

2 code implementations • ICLR 2020 • Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.

Edge-computing Federated Learning
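For readers unfamiliar with the algorithm being analyzed, here is a minimal FedAvg sketch: each round, sampled clients run a few local SGD steps from the shared model, and the server averages the results. The QuadClient class and all parameter names are hypothetical.

```python
import numpy as np

class QuadClient:
    """Hypothetical client holding non-iid local data for f_i(w) = 0.5 ||A w - b||^2."""
    def __init__(self, A, b):
        self.A, self.b = A, b
    def grad(self, w):
        return self.A.T @ (self.A @ w - self.b)

def fedavg(clients, w0, rounds=100, local_steps=5, lr=0.1, frac=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(rounds):
        chosen = rng.choice(len(clients), size=max(1, int(frac * len(clients))),
                            replace=False)
        updates = []
        for i in chosen:
            wi = w.copy()
            for _ in range(local_steps):
                wi -= lr * clients[i].grad(wi)   # E local SGD steps, no communication
            updates.append(wi)
        w = np.mean(updates, axis=0)             # server averages the local models
    return w
```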

Communication-Efficient Local Decentralized SGD Methods

no code implementations • 21 Oct 2019 • Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang

The technique of local updates has recently become a powerful tool in centralized settings for improving communication efficiency via periodic communication.

Distributed Computing
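A rough sketch of the decentralized variant of this idea, under assumed names: each node runs local SGD and, every comm_every steps, averages with its neighbors through a doubly stochastic mixing matrix W (reusing the hypothetical QuadClient from the FedAvg sketch above).

```python
import numpy as np

def local_decentralized_sgd(clients, W, w0, steps=1000, comm_every=10, lr=0.05):
    # One model per node; W is a doubly stochastic mixing matrix whose
    # sparsity pattern matches the communication graph.
    models = np.stack([w0.copy() for _ in clients])
    for t in range(steps):
        for i, c in enumerate(clients):
            models[i] -= lr * c.grad(models[i])   # local update, no communication
        if (t + 1) % comm_every == 0:
            models = W @ models                   # one gossip (neighbor-averaging) round
    return models.mean(axis=0)
```

Increasing comm_every reduces communication rounds at the cost of more drift between the node-local models.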

Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang

In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model.
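For concreteness, here is a sketch of robust value iteration under one common choice of (s,a)-rectangular uncertainty set, a total-variation ball of radius rho around an empirical model P_hat; the paper studies the statistical properties of such plug-in estimators, and the uncertainty set and parameter names here are assumptions.

```python
import numpy as np

def tv_worst_case(p0, v, rho):
    # min_p p @ v over {p in simplex : ||p - p0||_1 <= 2 rho}: shift up to
    # rho probability mass from the highest-value states onto the lowest one.
    p = p0.astype(float).copy()
    i_min = int(np.argmin(v))
    budget = min(rho, 1.0 - p[i_min])
    p[i_min] += budget
    for i in np.argsort(v)[::-1]:                 # drain high-value states first
        if i == i_min:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
        if budget <= 0:
            break
    return p @ v

def robust_value_iteration(P_hat, r, gamma=0.99, rho=0.1, iters=300):
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        V = np.array([max(r[s, a] + gamma * tv_worst_case(P_hat[s, a], V, rho)
                          for a in range(A)) for s in range(S)])
    return V
```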

A Statistical Analysis of Polyak-Ruppert Averaged Q-learning

1 code implementation • 29 Dec 2021 • Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan

We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings.

Q-Learning
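The procedure being analyzed is simple to state; a minimal sketch of synchronous tabular Q-learning with Polyak-Ruppert iterate averaging follows, where the generative-model callback sample(s, a) -> (s', r) and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def averaged_q_learning(sample, S, A, gamma=0.99, T=10000, decay=0.7):
    # Synchronous setting: each iteration draws one next state and reward
    # for every (s, a) pair from the generative model.
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    for t in range(T):
        eta = 1.0 / (t + 1) ** decay             # polynomial step size
        for s in range(S):
            for a in range(A):
                s2, rwd = sample(s, a)
                Q[s, a] += eta * (rwd + gamma * Q[s2].max() - Q[s, a])
        Q_bar += (Q - Q_bar) / (t + 1)           # running Polyak-Ruppert average
    return Q_bar
```

The returned estimate is the average of all iterates rather than the last one, which is what smooths out the step-size noise in the analysis.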

Federated Reinforcement Learning with Environment Heterogeneity

1 code implementation • 6 Apr 2022 • Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.

reinforcement-learning Reinforcement Learning (RL)
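To illustrate the setting, here is a sketch of periodic Q-table averaging across agents, each interacting with its own environment; the env interface (reset/step), exploration scheme, and all parameter names are hypothetical, not the paper's exact algorithm.

```python
import numpy as np

def federated_q_learning(envs, S, A, gamma=0.99, rounds=200, local_steps=50,
                         lr=0.1, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((S, A))
    for _ in range(rounds):
        local_qs = []
        for env in envs:                          # each agent has its own dynamics
            Qi, s = Q.copy(), env.reset()
            for _ in range(local_steps):
                a = rng.integers(A) if rng.random() < eps else int(np.argmax(Qi[s]))
                s2, rwd, done = env.step(a)
                target = rwd + (0.0 if done else gamma * Qi[s2].max())
                Qi[s, a] += lr * (target - Qi[s, a])
                s = env.reset() if done else s2
            local_qs.append(Qi)
        Q = np.mean(local_qs, axis=0)             # server averages local Q-tables
    return Q
```

Note that only Q-tables cross the network: the agents' trajectories stay local, matching the problem statement above.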

Pluralistic Image Completion with Probabilistic Mixture-of-Experts

no code implementations • 18 May 2022 • Xiaobo Xia, Wenhao Yang, Jie Ren, Yewen Li, Yibing Zhan, Bo Han, Tongliang Liu

Second, the constraints for diversity are designed to be task-agnostic, which prevents them from working well.

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang

Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in the confounded MDPs with a linear structure.

Off-policy evaluation
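The core of any two-stage instrumental-variable estimator is standard two-stage least squares; a generic sketch follows (not the paper's exact MDP estimator), where Z are the instruments, X the endogenous regressors, and Y the outcomes.

```python
import numpy as np

def two_stage_least_squares(Z, X, Y, reg=1e-8):
    # Stage 1: project the endogenous regressors X onto the instruments Z.
    Pi = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]), Z.T @ X)
    X_hat = Z @ Pi
    # Stage 2: regress the outcome Y on the projected regressors X_hat,
    # which are purged of the confounding in X.
    theta = np.linalg.solve(X_hat.T @ X_hat + reg * np.eye(X.shape[1]), X_hat.T @ Y)
    return theta
```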

Robust Markov Decision Processes without Model Estimation

no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

Moreover, we prove that the alternative form plays a role similar to that of the original form.

Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning

1 code implementation • 29 Apr 2023 • Liangyu Zhang, Yang Peng, Wenhao Yang, Zhihua Zhang

To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.

Decision Making Model-based Reinforcement Learning +1

Non-stationary Projection-free Online Learning with Dynamic and Adaptive Regret Guarantees

no code implementations • 19 May 2023 • Yibo Wang, Wenhao Yang, Wei Jiang, Shiyin Lu, Bing Wang, Haihong Tang, Yuanyu Wan, Lijun Zhang

Specifically, we first provide a novel dynamic regret analysis for an existing projection-free method named $\text{BOGD}_\text{IP}$, and establish an $\mathcal{O}(T^{3/4}(1+P_T))$ dynamic regret bound, where $P_T$ denotes the path-length of the comparator sequence.
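To make "projection-free" concrete (this is the basic online Frank-Wolfe idea, not the $\text{BOGD}_\text{IP}$ algorithm analyzed in the paper): each round replaces an expensive projection with a single linear-optimization oracle call over the feasible set, and the grad_fn/lo_oracle names below are illustrative.

```python
import numpy as np

def l1_ball_oracle(g, tau=1.0):
    # argmin over the L1 ball of radius tau of <g, v>: put all mass
    # on the coordinate with the largest |g_i|.
    v = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    v[i] = -tau * np.sign(g[i])
    return v

def online_frank_wolfe(grad_fn, lo_oracle, x0, T):
    # The convex-combination step keeps the iterate feasible without
    # ever computing a projection.
    x, xs = x0.copy(), []
    for t in range(T):
        g = grad_fn(t, x)                # gradient of the round-t loss at x
        v = lo_oracle(g)
        gamma = 2.0 / (t + 2)            # classical Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * v
        xs.append(x.copy())
    return xs
```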
