Search Results for author: Zhuoran Yang

Found 113 papers, 13 papers with code

Breaking the Curse of Many Agents: Provable Mean Embedding $Q$-Iteration for Mean-Field Reinforcement Learning

no code implementations ICML 2020 Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

We highlight that MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of MF-FQI algorithm.

Multi-agent Reinforcement Learning reinforcement-learning

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

no code implementations 5 May 2022 Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang

In the setting where the function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms.

Offline RL reinforcement-learning

Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations

no code implementations 20 Apr 2022 Qi Cai, Zhuoran Yang, Zhaoran Wang

In specific, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite.

reinforcement-learning

Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

no code implementations 7 Mar 2022 Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.

reinforcement-learning

The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches

no code implementations 3 Mar 2022 Grigoris Velegkas, Zhuoran Yang, Amin Karbasi

In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting.

reinforcement-learning

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

no code implementations 25 Feb 2022 Boxiang Lyu, Qinglin Meng, Shuang Qiu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment.

reinforcement-learning

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

1 code implementation ICLR 2022 Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

We show that such OOD sampling and pessimistic bootstrapping yields provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL.

Offline RL reinforcement-learning

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

no code implementations 22 Feb 2022 Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu

This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), where a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximizes the sender's cumulative utilities in a finite horizon Markovian environment with varying prior and utility functions.

reinforcement-learning

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

no code implementations 15 Feb 2022 Han Zhong, Wei Xiong, Jiyuan Tan, LiWei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang

When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving.

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

no code implementations 28 Jan 2022 YiXuan Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller.

Bilevel Optimization Model-based Reinforcement Learning +1

Exponential Family Model-Based Reinforcement Learning via Score Matching

no code implementations 28 Dec 2021 Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known.

Density Estimation Model-based Reinforcement Learning +1

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

no code implementations NeurIPS 2021 Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael I. Jordan, Zhaoran Wang

Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates.

Representation Learning

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

no code implementations 27 Dec 2021 Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.

reinforcement-learning

ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

no code implementations 11 Dec 2021 Xiao-Yang Liu, Zechu Li, Zhuoran Yang, Jiahao Zheng, Zhaoran Wang, Anwar Walid, Jian Guo, Michael I. Jordan

In this paper, we present a scalable and elastic library ElegantRL-podracer for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels.

reinforcement-learning

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

1 code implementation NeurIPS 2021 Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but independent on the number of agents.

Multi-agent Reinforcement Learning

BooVI: Provably Efficient Bootstrapped Value Iteration

no code implementations NeurIPS 2021 Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Despite the tremendous success of reinforcement learning (RL) with function approximation, efficient exploration remains a significant challenge, both practically and theoretically.

Efficient Exploration reinforcement-learning

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

no code implementations NeurIPS 2021 Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.

reinforcement-learning

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

no code implementations NeurIPS 2021 Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang

The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism.

reinforcement-learning

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

1 code implementation 24 Oct 2021 Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Zhaoran Wang, Jing Jiang

Offline reinforcement learning (RL) aims to learn the optimal policy from a pre-collected dataset without online interactions.

Offline RL reinforcement-learning

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

no code implementations 19 Oct 2021 Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase.

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

no code implementations 18 Oct 2021 Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).

Inducing Equilibria via Incentives: Simultaneous Design-and-Play Finds Global Optima

no code implementations 4 Oct 2021 Boyi Liu, Jiayang Li, Zhuoran Yang, Hoi-To Wai, Mingyi Hong, Yu Marco Nie, Zhaoran Wang

To regulate a social system comprised of self-interested agents, economic incentives (e.g., taxes, tolls, and subsidies) are often required to induce a desirable outcome.

Reinforcement Learning under a Multi-agent Predictive State Representation Model: Method and Theory

no code implementations ICLR 2022 Zhi Zhang, Zhuoran Yang, Han Liu, Pratap Tokekar, Furong Huang

This paper proposes a new algorithm for learning the optimal policies under a novel multi-agent predictive state representation reinforcement learning model.

reinforcement-learning

Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games?

no code implementations 29 Sep 2021 Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael Jordan

To our best knowledge, we establish the first provably efficient RL algorithms for solving SNE in general-sum Markov games with leader-controlled state transitions.

reinforcement-learning

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

no code implementations 19 Aug 2021 Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set.

Imitation Learning

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

no code implementations 8 Aug 2021 Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng

The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms.

online learning reinforcement-learning

Towards General Function Approximation in Zero-Sum Markov Games

no code implementations ICLR 2022 Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang

In the coordinated setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.

A Unified Off-Policy Evaluation Approach for General Value Function

no code implementations 6 Jul 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We further show that unlike GTD, the learned GVFs by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large.

Anomaly Detection

Gap-Dependent Bounds for Two-Player Markov Games

no code implementations 1 Jul 2021 Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du

As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention.

Q-Learning reinforcement-learning

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

no code implementations 15 Jun 2021 Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.

reinforcement-learning

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

no code implementations 23 Feb 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions.

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

no code implementations 19 Feb 2021 Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Mladen Kolar, Zhaoran Wang

Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables are all mediated through the action.

Offline RL reinforcement-learning

Optimistic Policy Optimization with General Function Approximations

no code implementations 1 Jan 2021 Qi Cai, Zhuoran Yang, Csaba Szepesvari, Zhaoran Wang

Although policy optimization with neural networks has a track record of achieving state-of-the-art results in reinforcement learning on various domains, the theoretical understanding of the computational and sample efficiency of policy optimization remains restricted to linear function approximations with finite-dimensional feature representations, which hinders the design of principled, effective, and efficient algorithms.

reinforcement-learning

Offline Policy Optimization with Variance Regularization

no code implementations 1 Jan 2021 Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL +1

Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria

no code implementations 1 Jan 2021 Boyi Liu, Zhuoran Yang, Zhaoran Wang

Specifically, in each iteration, each player infers the policy of the opponent implicitly via policy evaluation and improves its current policy by taking the smoothed best-response via a proximal policy optimization (PPO) step.

Is Pessimism Provably Efficient for Offline RL?

no code implementations 30 Dec 2020 Ying Jin, Zhuoran Yang, Zhaoran Wang

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.

Offline RL

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

no code implementations 28 Dec 2020 Han Zhong, Ethan X. Fang, Zhuoran Yang, Zhaoran Wang

In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold.

reinforcement-learning

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

no code implementations 21 Dec 2020 Zhuoran Yang, Yufeng Zhang, Yongxin Chen, Zhaoran Wang

Specifically, we prove that moving along the geodesic in the direction of functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles.

Variational Inference

Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach

no code implementations NeurIPS 2020 Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Mladen Kolar, Zhaoran Wang

We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.

online learning

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

no code implementations NeurIPS 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.

reinforcement-learning

Provably Efficient Neural GTD for Off-Policy Learning

no code implementations NeurIPS 2020 Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE).

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

no code implementations NeurIPS 2020 Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.

Q-Learning reinforcement-learning

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

no code implementations 9 Nov 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

reinforcement-learning

Provable Fictitious Play for General Mean-Field Games

no code implementations 8 Oct 2020 Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.

reinforcement-learning

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

no code implementations 23 Aug 2020 Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang

Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data.

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

no code implementations 16 Aug 2020 Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang

Reinforcement learning is a powerful tool to learn the optimal policy of possibly multiple agents by interacting with the environment.

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

no code implementations ICLR 2021 Zuyue Fu, Zhuoran Yang, Zhaoran Wang

To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.

Understanding Implicit Regularization in Over-Parameterized Single Index Model

no code implementations 16 Jul 2020 Jianqing Fan, Zhuoran Yang, Mengxin Yu

For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data.

Variable Selection

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

no code implementations 10 Jul 2020 Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem.

Bilevel Optimization Hyperparameter Optimization

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach

no code implementations 2 Jul 2020 Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar

We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.

online learning

Dynamic Regret of Policy Optimization in Non-stationary Environments

no code implementations NeurIPS 2020 Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.

On the Global Optimality of Model-Agnostic Meta-Learning

no code implementations ICML 2020 Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior, while the outer level searches for the optimal shared prior by optimizing its aggregated performance over all the subtasks.

Bilevel Optimization Meta-Learning

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

no code implementations NeurIPS 2021 Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes.

Autonomous Driving reinforcement-learning

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

no code implementations NeurIPS 2020 Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.

Q-Learning reinforcement-learning

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

no code implementations 21 Jun 2020 Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

We highlight that MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of MF-FQI algorithm.

Multi-agent Reinforcement Learning reinforcement-learning

Neural Certificates for Safe Control Policies

no code implementations 15 Jun 2020 Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching.

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

no code implementations 8 Jun 2020 Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve?

Q-Learning

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate

no code implementations 8 Mar 2020 Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.

Imitation Learning reinforcement-learning

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

no code implementations NeurIPS 2020 Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang

In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode.

online learning reinforcement-learning

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees

no code implementations ICML 2020 Sen Na, Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.

Graph Representation Learning

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

no code implementations 1 Mar 2020 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration.

Safe Exploration Safe Reinforcement Learning

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

no code implementations 17 Feb 2020 Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

1 code implementation NeurIPS 2020 Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks.

Provably Efficient Exploration in Policy Optimization

no code implementations ICML 2020 Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL.

Efficient Exploration reinforcement-learning

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances

no code implementations 9 Dec 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Multi-agent reinforcement learning (MARL) has long been a significant and everlasting research topic in both machine learning and control.

Decision Making Multi-agent Reinforcement Learning +1

Variance Reduced Policy Evaluation with Smooth Function Approximation

no code implementations NeurIPS 2019 Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang

Policy evaluation with smooth and nonlinear function approximation has shown great potential for reinforcement learning.

reinforcement-learning

Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy

no code implementations NeurIPS 2019 Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning.

reinforcement-learning

Neural Temporal-Difference Learning Converges to Global Optima

no code implementations NeurIPS 2019 Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning reinforcement-learning

Statistical-Computational Tradeoff in Single Index Models

no code implementations NeurIPS 2019 Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Using the statistical query model to characterize the computational cost of an algorithm, we show that when $\operatorname{Cov}(Y, X^\top\beta^*)=0$ and $\operatorname{Cov}(Y,(X^\top\beta^*)^2)>0$, no computationally tractable algorithms can achieve the information-theoretic limit of the minimax risk.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

no code implementations 24 Nov 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc.

Autonomous Driving Decision Making +2

Convergent Policy Optimization for Safe Reinforcement Learning

1 code implementation NeurIPS 2019 Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions.

Multi-agent Reinforcement Learning reinforcement-learning +1

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

no code implementations ICLR 2020 Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

We study discrete-time mean-field Markov games with infinite numbers of agents where each agent aims to minimize its ergodic cost.

Sample Elicitation

1 code implementation 8 Oct 2019 Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang

We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples.

Credible Sample Elicitation by Deep Learning, for Deep Learning

no code implementations 25 Sep 2019 Yang Liu, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

While classical elicitation results apply to eliciting a complex and generative (and continuous) distribution $p(x)$ for this image data, we are interested in eliciting samples $x_i \sim p(x)$ from agents.

Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rate and Global Landscape Analysis

no code implementations NeurIPS Workshop Deep_Invers 2019 Shuang Qiu, Xiaohan Wei, Zhuoran Yang

In this paper, we consider a new framework for the one-bit sensing problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$.

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

no code implementations ICLR 2020 Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate.

Policy Gradient Methods

Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis

no code implementations ICML 2020 Shuang Qiu, Xiaohan Wei, Zhuoran Yang

Specifically, we consider a new framework for this problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0 \in \mathbb{R}^k$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$ such that $\theta_0 = G(x_0)$.

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

no code implementations 7 Aug 2019 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network.

Multi-agent Reinforcement Learning Stochastic Optimization

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

no code implementations NeurIPS 2016 Xinyang Yi, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, Han Liu

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1- {\alpha}$.

A Convergence Result for Regularized Actor-Critic Methods

no code implementations 13 Jul 2019 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu

In this paper, we present a probability one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic approximation.

Provably Efficient Reinforcement Learning with Linear Function Approximation

1 code implementation 11 Jul 2019 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy.

reinforcement-learning

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

no code implementations 6 Jul 2019 Yixuan Lin, Kaiqing Zhang, Zhuoran Yang, Zhaoran Wang, Tamer Başar, Romeil Sandhu, Ji Liu

This paper considers a distributed reinforcement learning problem in which a network of multiple agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors.

reinforcement-learning

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

no code implementations 25 Jun 2019 Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning.

reinforcement-learning

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

no code implementations NeurIPS 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria.

Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima

1 code implementation NeurIPS 2019 Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning reinforcement-learning

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

1 code implementation 15 Mar 2019 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, Ji Liu

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy.

reinforcement-learning

A Theoretical Analysis of Deep Q-Learning

no code implementations 1 Jan 2019 Jianqing Fan, Zhaoran Wang, Yuchen Xie, Zhuoran Yang

Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood.

Q-Learning

Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents

no code implementations 6 Dec 2018 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

This work appears to be the first finite-sample analysis for batch MARL, a step towards rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.

Multi-agent Reinforcement Learning reinforcement-learning

Contrastive Learning from Pairwise Measurements

no code implementations NeurIPS 2018 Yi Chen, Zhuoran Yang, Yuchen Xie, Zhaoran Wang

In this paper, we study a semiparametric model where the pairwise measurements follow a natural exponential family distribution with an unknown base measure.

Contrastive Learning Data Augmentation

Provable Gaussian Embedding with One Observation

no code implementations NeurIPS 2018 Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang

In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.

High-dimensional Varying Index Coefficient Models via Stein's Identity

1 code implementation 16 Oct 2018 Sen Na, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

We study the parameter estimation problem for a varying index coefficient model in high dimensions.

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

3 code implementations 10 Oct 2018 Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu

Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely.

reinforcement-learning

Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval

no code implementations 21 Aug 2018 Jianqing Fan, Han Liu, Zhaoran Wang, Zhuoran Yang

We study the fundamental tradeoffs between statistical accuracy and computational tractability in the analysis of high dimensional heterogeneous data.

Tensor Methods for Additive Index Models under Discordance and Heterogeneity

no code implementations 17 Jul 2018 Krishnakumar Balasubramanian, Jianqing Fan, Zhuoran Yang

Motivated by the sampling problems and heterogeneity issues common in high-dimensional big datasets, we consider a class of discordant additive index models.

The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference

no code implementations ICML 2018 Hao Lu, Yuan Cao, Zhuoran Yang, Junwei Lu, Han Liu, Zhaoran Wang

We study the hypothesis testing problem of inferring the existence of combinatorial structures in undirected graphical models.

Two-sample testing

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

no code implementations NeurIPS 2018 Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents.

Multi-agent Reinforcement Learning reinforcement-learning

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

4 code implementations ICML 2018 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large.

Multi-agent Reinforcement Learning reinforcement-learning

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations 18 Dec 2017 Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

On Stein's Identity and Near-Optimal Estimation in High-dimensional Index Models

no code implementations 26 Sep 2017 Zhuoran Yang, Krishnakumar Balasubramanian, Han Liu

We consider estimating the parametric components of semi-parametric multiple index models in a high-dimensional and non-Gaussian setting.

Human Memory Search as Initial-Visit Emitting Random Walk

no code implementations NeurIPS 2015 Kwang-Sung Jun, Jerry Zhu, Timothy T. Rogers, Zhuoran Yang, Ming Yuan

In this paper, we propose the first efficient maximum likelihood estimate (MLE) for INVITE by decomposing the censored output into a series of absorbing random walks.

Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

no code implementations 14 Nov 2015 Zhuoran Yang, Zhaoran Wang, Han Liu, Yonina C. Eldar, Tong Zhang

To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator.

On Semiparametric Exponential Family Graphical Models

no code implementations 30 Dec 2014 Zhuoran Yang, Yang Ning, Han Liu

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data.

Two-sample testing
