no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang
A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation.
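As background for that comparison, modified policy iteration interleaves a greedy improvement step with m partial evaluation backups under the greedy policy. A minimal tabular sketch, where the toy MDP and hyperparameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200):
    """Tabular modified policy iteration: each sweep does a greedy
    improvement followed by m Bellman backups under the greedy policy.
    P: (S, A, S) transition tensor, R: (S, A) reward matrix."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V          # (S, A) action values
        pi = Q.argmax(axis=1)          # greedy improvement
        for _ in range(m):             # m partial evaluation backups
            V = R[np.arange(S), pi] + gamma * (P[np.arange(S), pi] @ V)
    return V, pi

# toy 2-state, 2-action MDP for illustration
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = modified_policy_iteration(P, R)
```

Setting m=1 recovers value iteration and m=∞ recovers exact policy iteration, which is why the method is a natural baseline for approximate policy evaluation.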
no code implementations • NeurIPS 2019 • Xiang Li, Wenhao Yang, Zhihua Zhang
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.
2 code implementations • ICLR 2020 • Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
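A minimal sketch of the FedAvg pattern on non-iid strongly convex quadratics, where each round runs a few local gradient steps per client and then averages the local models; the clients, objectives, and step-size schedule below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each client k holds a strongly convex quadratic f_k(w) = 0.5 * ||w - c_k||^2
# with its own optimum c_k (non-iid); the global optimum is the mean of the c_k.
centers = rng.normal(size=(4, 3))          # 4 clients, 3-dim parameter

def local_sgd(w, c, steps, lr):
    """Run `steps` local gradient steps on one client's objective."""
    for _ in range(steps):
        w = w - lr * (w - c)               # grad of f_k at w is w - c_k
    return w

w = np.zeros(3)
for t in range(200):                       # communication rounds
    lr = 1.0 / (t + 10)                    # decaying step size
    local_models = [local_sgd(w.copy(), c, steps=5, lr=lr) for c in centers]
    w = np.mean(local_models, axis=0)      # server averages the local models
```

With a decaying step size the averaged model approaches the mean of the client optima; a fixed step size combined with multiple local steps would instead stall in a neighborhood determined by client drift, which is the heterogeneity effect the analysis has to control.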
no code implementations • 21 Oct 2019 • Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang
The technique of local updates has recently become a powerful tool in centralized settings for improving communication efficiency via periodic communication.
no code implementations • 31 Oct 2020 • Wenhao Yang, Xiang Li, Guangzeng Xie, Zhihua Zhang
Regularized MDPs serve as smoothed versions of the original MDPs.
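One concrete sense in which regularization smooths an MDP: entropy regularization replaces the hard max in the Bellman optimality operator with a log-sum-exp soft maximum, which is differentiable in the Q-values. A minimal sketch, with an illustrative toy MDP and temperature tau:

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.9, tau=0.1, iters=500):
    """Value iteration for an entropy-regularized MDP: the max over
    actions becomes tau * logsumexp(Q / tau), a smooth soft maximum."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V                   # (S, A) action values
        m = Q.max(axis=1, keepdims=True)        # shift for numerical stability
        # soft max over actions; tau -> 0 recovers the ordinary Bellman operator
        V = (m + tau * np.log(np.exp((Q - m) / tau).sum(axis=1, keepdims=True))).ravel()
    return V

# toy 2-state, 2-action MDP for illustration
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V_soft = soft_value_iteration(P, R, tau=0.1)
V_hard = soft_value_iteration(P, R, tau=1e-8)   # effectively unregularized
```

The entropy bonus makes the smoothed value at most tau * log|A| / (1 - gamma) above the unregularized one, so the regularized MDP stays uniformly close to the original as tau shrinks.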
no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang
In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are estimated solely from a generative model.
1 code implementation • 29 Dec 2021 • Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan
We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings.
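A minimal sketch of that setting: in synchronous tabular Q-learning, every state-action pair is updated at each step from a fresh sample of the next state, and the Polyak-Ruppert estimate is the running average of the iterates. The toy MDP and step-size schedule here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy 2-state, 2-action MDP for illustration
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
S, A, gamma = 2, 2, 0.9

Q = np.zeros((S, A))
Q_bar = np.zeros((S, A))                       # Polyak-Ruppert average
T = 20000
for t in range(1, T + 1):
    eta = 1.0 / (1 + (1 - gamma) * t) ** 0.75  # polynomial step size
    for s in range(S):                         # synchronous: update every (s, a)
        for a in range(A):
            s_next = rng.choice(S, p=P[s, a])  # fresh sample from the model
            target = R[s, a] + gamma * Q[s_next].max()
            Q[s, a] += eta * (target - Q[s, a])
    Q_bar += (Q - Q_bar) / t                   # running average of the iterates
```

The averaged estimate Q_bar, rather than the last iterate Q, is the object of interest: averaging suppresses the step-size-driven fluctuations of the iterates around the optimal action-value function.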
1 code implementation • 6 Apr 2022 • Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
no code implementations • 18 May 2022 • Xiaobo Xia, Wenhao Yang, Jie Ren, Yewen Li, Yibing Zhan, Bo Han, Tongliang Liu
Second, the constraints for diversity are designed to be task-agnostic, which prevents them from working well.
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in the confounded MDPs with a linear structure.
no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
Moreover, we prove that the alternative form still plays a role similar to that of the original form.
1 code implementation • 29 Apr 2023 • Liangyu Zhang, Yang Peng, Wenhao Yang, Zhihua Zhang
To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.
no code implementations • 19 May 2023 • Yibo Wang, Wenhao Yang, Wei Jiang, Shiyin Lu, Bing Wang, Haihong Tang, Yuanyu Wan, Lijun Zhang
Specifically, we first provide a novel dynamic regret analysis for an existing projection-free method named $\text{BOGD}_\text{IP}$, and establish an $\mathcal{O}(T^{3/4}(1+P_T))$ dynamic regret bound, where $P_T$ denotes the path-length of the comparator sequence.
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
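A minimal sketch of the KL-regularized core of MDVI on a toy tabular MDP, using the identity that when the KL penalty is taken against the previous policy, the policy becomes a softmax over the running sum of past Q estimates and the regularized greedy value is a difference of log-sum-exps over that sum; the MDP and the inverse temperature beta are illustrative assumptions:

```python
import numpy as np

def logsumexp(x):
    """Row-wise log-sum-exp with the usual max shift for stability."""
    m = x.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True)))[:, 0]

def mdvi(P, R, gamma=0.9, beta=10.0, iters=300):
    """KL-regularized value iteration in the spirit of MDVI: the policy
    update pi_{k+1} proportional to pi_k * exp(beta * Q_k) telescopes into a
    softmax over the running sum of past Q estimates."""
    S, A = R.shape
    Q = np.zeros((S, A))
    q_sum = np.zeros((S, A))          # running sum of past Q estimates
    for _ in range(iters):
        new_sum = q_sum + Q
        # KL-regularized greedy value: soft improvement over the running sum
        V = (logsumexp(beta * new_sum) - logsumexp(beta * q_sum)) / beta
        q_sum = new_sum
        Q = R + gamma * P @ V
    return Q

# toy 2-state, 2-action MDP for illustration
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
Q = mdvi(P, R)
```

Because the KL penalty is measured against the previous policy rather than a fixed reference, it vanishes as the iterates stabilize, so Q approaches the unregularized optimal action-value function.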
1 code implementation • 29 Sep 2023 • Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
This implies that the distributional policy evaluation problem can be solved in a sample-efficient manner.