Search Results for author: Zhaoran Wang

Found 177 papers, 26 papers with code

FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning

4 code implementations • 6 Nov 2022 • Xiao-Yang Liu, Ziyi Xia, Jingyang Rui, Jiechao Gao, Hongyang Yang, Ming Zhu, Christina Dan Wang, Zhaoran Wang, Jian Guo

However, establishing high-quality market environments and benchmarks for financial reinforcement learning is challenging due to three major factors, namely, low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting in the backtesting stage.

reinforcement-learning Reinforcement Learning (RL)

9,006

Dynamic Datasets and Market Environments for Financial Reinforcement Learning

4 code implementations • 25 Apr 2023 • Xiao-Yang Liu, Ziyi Xia, Hongyang Yang, Jiechao Gao, Daochen Zha, Ming Zhu, Christina Dan Wang, Zhaoran Wang, Jian Guo

The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets.

reinforcement-learning

9,006

ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

1 code implementation • 11 Dec 2021 • Xiao-Yang Liu, Zechu Li, Zhuoran Yang, Jiahao Zheng, Zhaoran Wang, Anwar Walid, Jian Guo, Michael I. Jordan

In this paper, we present a scalable and elastic library ElegantRL-podracer for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels.

reinforcement-learning Reinforcement Learning (RL) +1

3,424

FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance

1 code implementation • 13 Dec 2021 • Xiao-Yang Liu, Jingyang Rui, Jiechao Gao, Liuqing Yang, Hongyang Yang, Zhaoran Wang, Christina Dan Wang, Jian Guo

In this paper, we present a FinRL-Meta framework that builds a universe of market environments for data-driven financial reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

1,111

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

1 code implementation • NeurIPS 2020 • Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks.

144

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

1 code implementation • 29 Sep 2023 • Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang

Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future").

119

Provably Efficient Reinforcement Learning with Linear Function Approximation

2 code implementations • 11 Jul 2019 • Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy.

reinforcement-learning Reinforcement Learning (RL)

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

1 code implementation • 15 Jun 2021 • Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.

reinforcement-learning Reinforcement Learning (RL)

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

3 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan

This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.

D4RL Offline RL +2

Sample Elicitation

1 code implementation • 8 Oct 2019 • Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang

We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples.

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

1 code implementation • ICLR 2022 • Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL.

D4RL Offline RL +3

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

1 code implementation • 6 Jun 2022 • Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks.

Decision Making Offline RL +2

False Correlation Reduction for Offline Reinforcement Learning

1 code implementation • 24 Oct 2021 • Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang

Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems.

D4RL Decision Making +3

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

1 code implementation • 29 Jul 2022 • Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.

Contrastive Learning reinforcement-learning +3

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.

Convergent Policy Optimization for Safe Reinforcement Learning

1 code implementation • NeurIPS 2019 • Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions.

Multi-agent Reinforcement Learning reinforcement-learning +2

Principled Exploration via Optimistic Bootstrapping and Backward Induction

1 code implementation • 13 May 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).

Efficient Exploration Reinforcement Learning (RL)

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

1 code implementation • 8 May 2023 • Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.

LEMMA Multi-agent Reinforcement Learning +1

Dynamic Bottleneck for Robust Self-Supervised Exploration

1 code implementation • NeurIPS 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang

Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards.

Differentiable Bilevel Programming for Stackelberg Congestion Games

1 code implementation • 15 Sep 2022 • Jiayang Li, Jing Yu, Qianni Wang, Boyi Liu, Zhaoran Wang, Yu Marco Nie

A Stackelberg congestion game (SCG) is a bilevel program in which a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which followers settle by playing a congestion game.

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

1 code implementation • NeurIPS 2021 • Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but is independent of the number of agents.

Multi-agent Reinforcement Learning

High-dimensional Varying Index Coefficient Models via Stein's Identity

1 code implementation • 16 Oct 2018 • Sen Na, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

We study the parameter estimation problem for a varying index coefficient model in high dimensions.

Vocal Bursts Intensity Prediction

Exponential Family Model-Based Reinforcement Learning via Score Matching

1 code implementation • 28 Dec 2021 • Gene Li, Junbo Li, Anmol Kabra, Nathan Srebro, Zhaoran Wang, Zhuoran Yang

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known.

Density Estimation Model-based Reinforcement Learning +3

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

1 code implementation • 15 Mar 2019 • Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy.

reinforcement-learning Reinforcement Learning (RL)

Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima

1 code implementation • NeurIPS 2019 • Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations • 13 Jun 2018 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.

Generalization Bounds

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

no code implementations • NeurIPS 2018 • Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents.

Multi-agent Reinforcement Learning reinforcement-learning +1

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao

We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).

Additive models Model Selection +2

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

no code implementations • 29 Dec 2016 • Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao

We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).

Retrieval

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations • 18 Dec 2017 • Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

Retrieval

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

no code implementations • NeurIPS 2016 • Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang

We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.

Stochastic Optimization

Tensor Graphical Model: Non-convex Optimization and Statistical Inference

no code implementations • 15 Sep 2016 • Xiang Lyu, Will Wei Sun, Zhaoran Wang, Han Liu, Jian Yang, Guang Cheng

We consider the estimation and inference of graphical models that characterize the dependency structure of high-dimensional tensor-valued data.

Sparse Generalized Eigenvalue Problem: Optimal Statistical Rates via Truncated Rayleigh Flow

no code implementations • 29 Apr 2016 • Kean Ming Tan, Zhaoran Wang, Han Liu, Tong Zhang

Sparse generalized eigenvalue problem (GEP) plays a pivotal role in a large family of high-dimensional statistical models, including sparse Fisher's discriminant analysis, canonical correlation analysis, and sufficient dimension reduction.

Dimensionality Reduction

Sharp Computational-Statistical Phase Transitions via Oracle Computational Model

no code implementations • 30 Dec 2015 • Zhaoran Wang, Quanquan Gu, Han Liu

Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget constraints with the intrinsic probabilistic and combinatorial structures of statistical problems.

Two-sample testing

Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

no code implementations • 14 Nov 2015 • Zhuoran Yang, Zhaoran Wang, Han Liu, Yonina C. Eldar, Tong Zhang

To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator.

regression valid

Optimal linear estimation under unknown nonlinear transform

no code implementations • NeurIPS 2015 • Xinyang Yi, Zhaoran Wang, Constantine Caramanis, Han Liu

This model is known as the single-index model in statistics, and, among other things, it represents a significant generalization of one-bit compressed sensing.

Statistical Limits of Convex Relaxations

no code implementations • 4 Mar 2015 • Zhaoran Wang, Quanquan Gu, Han Liu

Many high dimensional sparse learning problems are formulated as nonconvex optimization.

Sparse Learning Stochastic Block Model

High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality

no code implementations • 30 Dec 2014 • Zhaoran Wang, Quanquan Gu, Yang Ning, Han Liu

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models.

Vocal Bursts Intensity Prediction

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems

no code implementations • 20 Jun 2013 • Zhaoran Wang, Han Liu, Tong Zhang

In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator.

regression

Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time

no code implementations • 22 Aug 2014 • Zhaoran Wang, Huanran Lu, Han Liu

To optimally estimate sparse principal subspaces, we propose a two-stage computational framework named "tighten after relax": within the 'relax' stage, we approximately solve a convex relaxation of sparse PCA with early stopping to obtain a desired initial estimator; for the 'tighten' stage, we propose a novel algorithm called sparse orthogonal iteration pursuit (SOAP), which iteratively refines the initial estimator by directly solving the underlying nonconvex problem.

Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models

no code implementations • 30 Jun 2013 • Zhaoran Wang, Fang Han, Han Liu

We study sparse principal component analysis for high dimensional vector autoregressive time series under a doubly asymptotic framework, which allows the dimension $d$ to scale with the series length $T$.

Time Series Time Series Analysis +1

Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

no code implementations • ICLR 2019 • Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng

Such an error reduction phenomenon is somewhat surprising as the estimated surrogate policy is less accurate than the given historical policy.

Multi-Label Classification Off-policy evaluation +1

Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval

no code implementations • 21 Aug 2018 • Jianqing Fan, Han Liu, Zhaoran Wang, Zhuoran Yang

We study the fundamental tradeoffs between statistical accuracy and computational tractability in the analysis of high dimensional heterogeneous data.

Clustering Retrieval

Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes

no code implementations • NeurIPS 2016 • Chris Junchi Li, Zhaoran Wang, Han Liu

Despite the empirical success of nonconvex statistical optimization methods, their global dynamics, especially convergence to the desirable local minima, remain less well understood in theory.

Tensor Decomposition

A convex formulation for high-dimensional sparse sliced inverse regression

no code implementations • 17 Sep 2018 • Kean Ming Tan, Zhaoran Wang, Tong Zhang, Han Liu, R. Dennis Cook

Sliced inverse regression is a popular tool for sufficient dimension reduction, which replaces covariates with a minimal set of their linear combinations without loss of information on the conditional distribution of the response given the covariates.

Dimensionality Reduction regression +2

Provable Gaussian Embedding with One Observation

no code implementations • NeurIPS 2018 • Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang

In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.

BIG-bench Machine Learning

Blind Attacks on Machine Learners

no code implementations • NeurIPS 2016 • Alex Beatson, Zhaoran Wang, Han Liu

We study the potential of a “blind attacker” to provably limit a learner’s performance by data injection attack without observing the learner’s training set or any parameter of the distribution from which it is drawn.

Agnostic Estimation for Misspecified Phase Retrieval Models

no code implementations • NeurIPS 2016 • Matey Neykov, Zhaoran Wang, Han Liu

The goal of noisy high-dimensional phase retrieval is to estimate an $s$-sparse parameter $\boldsymbol{\beta}^*\in \mathbb{R}^d$ from $n$ realizations of the model $Y = (\boldsymbol{X}^{\top} \boldsymbol{\beta}^*)^2 + \varepsilon$.

Retrieval

A Nonconvex Optimization Framework for Low Rank Matrix Estimation

no code implementations • NeurIPS 2015 • Tuo Zhao, Zhaoran Wang, Han Liu

We study the estimation of low rank matrices via nonconvex optimization.

Non-convex Statistical Optimization for Sparse Tensor Graphical Model

no code implementations • NeurIPS 2015 • Wei Sun, Zhaoran Wang, Han Liu, Guang Cheng

We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data.

High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality

no code implementations • NeurIPS 2015 • Zhaoran Wang, Quanquan Gu, Yang Ning, Han Liu

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models.

Vocal Bursts Intensity Prediction

Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time

no code implementations • NeurIPS 2014 • Zhaoran Wang, Huanran Lu, Han Liu

In this paper, we propose a two-stage sparse PCA procedure that attains the optimal principal subspace estimator in polynomial time.

The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference

no code implementations • ICML 2018 • Hao Lu, Yuan Cao, Zhuoran Yang, Junwei Lu, Han Liu, Zhaoran Wang

We study the hypothesis testing problem of inferring the existence of combinatorial structures in undirected graphical models.

Two-sample testing

On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations • ICLR 2019 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.

Generalization Bounds

A Theoretical Analysis of Deep Q-Learning

no code implementations • 1 Jan 2019 • Jianqing Fan, Zhaoran Wang, Yuchen Xie, Zhuoran Yang

Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood.

Q-Learning

On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator

no code implementations • 11 Jan 2019 • Qi Cai, Mingyi Hong, Yongxin Chen, Zhaoran Wang

We study the global convergence of generative adversarial imitation learning for linear quadratic regulators, which is posed as minimax optimization.

Imitation Learning reinforcement-learning +1

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

no code implementations • 25 Jun 2019 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

no code implementations • 6 Jul 2019 • Yixuan Lin, Kaiqing Zhang, Zhuoran Yang, Zhaoran Wang, Tamer Başar, Romeil Sandhu, Ji Liu

This paper considers a distributed reinforcement learning problem in which a network of multiple agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors.

reinforcement-learning Reinforcement Learning (RL)

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

no code implementations • NeurIPS 2016 • Xinyang Yi, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, Han Liu

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1-\alpha$.

Binary Classification Computational Efficiency +1

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

no code implementations • 14 Jul 2019 • Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind.

Bilevel Optimization

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

no code implementations • 7 Aug 2019 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network.

Multi-agent Reinforcement Learning Stochastic Optimization

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

no code implementations • ICLR 2020 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate.

Policy Gradient Methods

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

no code implementations • ICLR 2020 • Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

We study discrete-time mean-field Markov games with infinite numbers of agents where each agent aims to minimize its ergodic cost.

Variance Reduced Policy Evaluation with Smooth Function Approximation

no code implementations • NeurIPS 2019 • Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang

Policy evaluation with smooth and nonlinear function approximation has shown great potential for reinforcement learning.

Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

no code implementations • NeurIPS 2019 • Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind.

Bilevel Optimization

Statistical-Computational Tradeoff in Single Index Models

no code implementations • NeurIPS 2019 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Using the statistical query model to characterize the computational cost of an algorithm, we show that when $\mathrm{Cov}(Y, X^\top\beta^*)=0$ and $\mathrm{Cov}(Y,(X^\top\beta^*)^2)>0$, no computationally tractable algorithms can achieve the information-theoretic limit of the minimax risk.

Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy

no code implementations • NeurIPS 2019 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Reinforcement Learning (RL)

Neural Temporal-Difference Learning Converges to Global Optima

no code implementations • NeurIPS 2019 • Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning Reinforcement Learning (RL)

Provably Efficient Exploration in Policy Optimization

no code implementations • ICML 2020 • Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL.

Efficient Exploration Reinforcement Learning (RL)

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator

no code implementations • 14 Dec 2019 • Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

Multi-agent reinforcement learning has been successfully applied to a number of challenging problems.

Multi-agent Reinforcement Learning reinforcement-learning +1

On Computation and Generalization of Generative Adversarial Imitation Learning

no code implementations • ICLR 2020 • Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao

Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.

Imitation Learning Reinforcement Learning (RL)

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

no code implementations • 17 Feb 2020 • Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

no code implementations • 1 Mar 2020 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration.

Safe Exploration Safe Reinforcement Learning

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

no code implementations • NeurIPS 2020 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang

In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode.

reinforcement-learning Reinforcement Learning (RL)

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees

no code implementations • ICML 2020 • Sen Na, Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.

Graph Representation Learning

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate

no code implementations • 8 Mar 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.

Imitation Learning reinforcement-learning +1

Deep Reinforcement Learning with Robust and Smooth Policy

no code implementations • 21 Mar 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

Deep reinforcement learning (RL) has achieved great empirical successes in various domains.

reinforcement-learning Reinforcement Learning (RL)

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

no code implementations • 8 Jun 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve?

Q-Learning

Neural Certificates for Safe Control Policies

no code implementations • 15 Jun 2020 • Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou

This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching.

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

no code implementations • NeurIPS 2021 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes.

Autonomous Driving reinforcement-learning +1

Paper
Add Code

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

no code implementations • 21 Jun 2020 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

We highlight that MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of MF-FQI algorithm.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

On the Global Optimality of Model-Agnostic Meta-Learning

no code implementations • ICML 2020 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior, while the outer level searches for the optimal shared prior by optimizing its aggregated performance over all the subtasks.

Bilevel Optimization Meta-Learning

Paper
Add Code

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.

Q-Learning reinforcement-learning +1

Paper
Add Code

Dynamic Regret of Policy Optimization in Non-stationary Environments

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.

Reinforcement Learning (RL)

Paper
Add Code

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach

no code implementations • 2 Jul 2020 • Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar

We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.

Paper
Add Code

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion

no code implementations • ICLR 2019 • Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang

To attain the advantages of both regimes, we propose to use replica exchange, which swaps between two Langevin diffusions with different temperatures.

Paper
Add Code

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

no code implementations • 10 Jul 2020 • Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem.

Bilevel Optimization Hyperparameter Optimization

Paper
Add Code

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

no code implementations • ICLR 2021 • Zuyue Fu, Zhuoran Yang, Zhaoran Wang

To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.

Paper
Add Code

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

no code implementations • 16 Aug 2020 • Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang

Reinforcement learning is a powerful tool to learn the optimal policy of possibly multiple agents by interacting with the environment.

Paper
Add Code

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

no code implementations • 23 Aug 2020 • Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang

Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data.

Paper
Add Code

Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate

no code implementations • ICML 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Deep Reinforcement Learning with Smooth Policy

no code implementations • ICML 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

In contrast to policy parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Breaking the Curse of Many Agents: Provable Mean Embedding $Q$-Iteration for Mean-Field Reinforcement Learning

no code implementations • ICML 2020 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

We highlight that MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of MF-FQI algorithm.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Computational and Statistical Tradeoffs in Inferring Combinatorial Structures of Ising Model

no code implementations • ICML 2020 • Ying Jin, Zhaoran Wang, Junwei Lu

We study the computational and statistical tradeoffs in inferring combinatorial structures of high dimensional simple zero-field ferromagnetic Ising model.

valid

Paper
Add Code

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection

no code implementations • 4 Sep 2020 • Yining Wang, Yi Chen, Ethan X. Fang, Zhaoran Wang, Runze Li

We consider the stochastic contextual bandit problem under the high dimensional linear model.

Paper
Add Code

Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria

no code implementations • 1 Jan 2021 • Boyi Liu, Zhuoran Yang, Zhaoran Wang

Specifically, in each iteration, each player infers the policy of the opponent implicitly via policy evaluation and improves its current policy by taking the smoothed best-response via a proximal policy optimization (PPO) step.

Paper
Add Code

Offline Policy Optimization with Variance Regularization

no code implementations • 1 Jan 2021 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL +1

Paper
Add Code

Optimistic Policy Optimization with General Function Approximations

no code implementations • 1 Jan 2021 • Qi Cai, Zhuoran Yang, Csaba Szepesvari, Zhaoran Wang

Although policy optimization with neural networks has a track record of achieving state-of-the-art results in reinforcement learning on various domains, the theoretical understanding of the computational and sample efficiency of policy optimization remains restricted to linear function approximations with finite-dimensional feature representations, which hinders the design of principled, effective, and efficient algorithms.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Entropic Risk-Sensitive Reinforcement Learning: A Meta Regret Framework with Function Approximation

no code implementations • 1 Jan 2021 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang

We study risk-sensitive reinforcement learning with the entropic risk measure and function approximation.

Efficient Exploration reinforcement-learning +1

Paper
Add Code

Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning

no code implementations • 1 Jan 2021 • Chenjia Bai, Lingxiao Wang, Peng Liu, Zhaoran Wang, Jianye Hao, Yingnan Zhao

However, such an approach is challenging in developing practical exploration algorithms for Deep Reinforcement Learning (DRL).

Atari Games Efficient Exploration +3

Paper
Add Code

Provable Fictitious Play for General Mean-Field Games

no code implementations • 8 Oct 2020 • Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

End-to-End Learning and Intervention in Games

no code implementations • NeurIPS 2020 • Jiayang Li, Jing Yu, Yu Nie, Zhaoran Wang

In this paper, we provide a unified framework for learning and intervention in games.

Paper
Add Code

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

Reinforcement Learning (RL)

Paper
Add Code

Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach

no code implementations • NeurIPS 2020 • Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Mladen Kolar, Zhaoran Wang

We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.

Paper
Add Code

Provably Efficient Neural GTD for Off-Policy Learning

no code implementations • NeurIPS 2020 • Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE).

Paper
Add Code

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

no code implementations • NeurIPS 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.

Q-Learning reinforcement-learning +1

Paper
Add Code

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

no code implementations • 21 Dec 2020 • Zhuoran Yang, Yufeng Zhang, Yongxin Chen, Zhaoran Wang

Specifically, we prove that moving along the geodesic in the direction of functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles.

Generative Adversarial Network Variational Inference

Paper
Add Code

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

no code implementations • 28 Dec 2020 • Han Zhong, Xun Deng, Ethan X. Fang, Zhuoran Yang, Zhaoran Wang, Runze Li

In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Is Pessimism Provably Efficient for Offline RL?

no code implementations • 30 Dec 2020 • Ying Jin, Zhuoran Yang, Zhaoran Wang

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.

Offline RL Reinforcement Learning (RL)

Paper
Add Code

Provably Training Overparameterized Neural Network Classifiers with Non-convex Constraints

no code implementations • 30 Dec 2020 • You-Lin Chen, Zhaoran Wang, Mladen Kolar

Training a classifier under non-convex constraints has gotten increasing attention in the machine learning community thanks to its wide range of applications such as algorithmic fairness and class-imbalanced classification.

Fairness imbalanced classification

Paper
Add Code

A Primal-Dual Approach to Constrained Markov Decision Processes

no code implementations • 26 Jan 2021 • Yi Chen, Jing Dong, Zhaoran Wang

In many operations management problems, we need to make decisions sequentially to minimize the cost while satisfying certain constraints.

Optimization and Control

Paper
Add Code

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

no code implementations • NeurIPS 2021 • Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth.

Bilevel Optimization Hyperparameter Optimization

Paper
Add Code

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

no code implementations • 19 Feb 2021 • Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Mladen Kolar, Zhaoran Wang

Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables is all mediated through the action.

Offline RL reinforcement-learning +2

Paper
Add Code

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

no code implementations • 23 Feb 2021 • Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions.

Paper
Add Code

Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach

no code implementations • 18 May 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias Multi-agent Reinforcement Learning

Paper
Add Code

Verification in the Loop: Correct-by-Construction Control Learning with Reach-avoid Guarantees

no code implementations • 6 Jun 2021 • YiXuan Wang, Chao Huang, Zhaoran Wang, Zhilu Wang, Qi Zhu

Specifically, we leverage the verification results (computed reachable set of the system state) to construct feedback metrics for control learning, which measure how likely the current design of control parameters can meet the required reach-avoid property for safety and goal-reaching.

Paper
Add Code

Gap-Dependent Bounds for Two-Player Markov Games

no code implementations • 1 Jul 2021 • Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du

As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention.

Q-Learning Vocal Bursts Valence Prediction

Paper
Add Code

A Unified Off-Policy Evaluation Approach for General Value Function

no code implementations • 6 Jul 2021 • Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We further show that unlike GTD, the learned GVFs by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large.

Anomaly Detection Off-policy evaluation

Paper
Add Code

Towards General Function Approximation in Zero-Sum Markov Games

no code implementations • ICLR 2022 • Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang

In the coordinated setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.

Paper
Add Code

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

no code implementations • 8 Aug 2021 • Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng

The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

no code implementations • 19 Aug 2021 • Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set.

Imitation Learning

Paper
Add Code

Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence

no code implementations • 4 Oct 2021 • Boyi Liu, Jiayang Li, Zhuoran Yang, Hoi-To Wai, Mingyi Hong, Yu Marco Nie, Zhaoran Wang

To regulate a social system comprised of self-interested agents, economic incentives are often required to induce a desirable outcome.

Bilevel Optimization

Paper
Add Code

A Principled Permutation Invariant Approach to Mean-Field Multi-Agent Reinforcement Learning

no code implementations • 29 Sep 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

Inductive Bias Multi-agent Reinforcement Learning +2

Paper
Add Code

Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games?

no code implementations • 29 Sep 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael Jordan

To our best knowledge, we establish the first provably efficient RL algorithms for solving SNE in general-sum Markov games with leader-controlled state transitions.

Reinforcement Learning (RL)

Paper
Add Code

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

no code implementations • 18 Oct 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).

Reinforcement Learning (RL)

Paper
Add Code

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

no code implementations • 19 Oct 2021 • Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase.

Reinforcement Learning (RL)

Paper
Add Code

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

no code implementations • NeurIPS 2021 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang

The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

FinRL-Podracer: High Performance and Scalable Deep Reinforcement Learning for Quantitative Finance

no code implementations • 7 Nov 2021 • Zechu Li, Xiao-Yang Liu, Jiahao Zheng, Zhaoran Wang, Anwar Walid, Jian Guo

Unfortunately, the steep learning curve and the difficulty in quick modeling and agile development are impeding finance researchers from using deep reinforcement learning in quantitative trading.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

BooVI: Provably Efficient Bootstrapped Value Iteration

no code implementations • NeurIPS 2021 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Despite the tremendous success of reinforcement learning (RL) with function approximation, efficient exploration remains a significant challenge, both practically and theoretically.

Efficient Exploration Reinforcement Learning (RL)

Paper
Add Code

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

no code implementations • NeurIPS 2021 • Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.

Multi-Objective Reinforcement Learning reinforcement-learning

Paper
Add Code

Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective

no code implementations • 27 Sep 2018 • Zhuoran Yang, Zuyue Fu, Kaiqing Zhang, Zhaoran Wang

We study reinforcement learning algorithms with nonlinear function approximation in the online setting.

Bilevel Optimization Q-Learning +2

Paper
Add Code

Credible Sample Elicitation by Deep Learning, for Deep Learning

no code implementations • 25 Sep 2019 • Yang Liu, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

While classical elicitation results apply to eliciting a complex and generative (and continuous) distribution $p(x)$ for this image data, we are interested in eliciting samples $x_i \sim p(x)$ from agents.

Paper
Add Code

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

no code implementations • 27 Dec 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.

Reinforcement Learning (RL)

Paper
Add Code

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

no code implementations • NeurIPS 2021 • Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael I. Jordan, Zhaoran Wang

Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates.

Representation Learning

Paper
Add Code

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

no code implementations • 28 Jan 2022 • YiXuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller.

Bilevel Optimization Model-based Reinforcement Learning +2

Paper
Add Code

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

no code implementations • 15 Feb 2022 • Han Zhong, Wei Xiong, Jiyuan Tan, LiWei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang

When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving.

Paper
Add Code

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

no code implementations • 22 Feb 2022 • Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu

This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), where a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite horizon Markovian environment with varying prior and utility functions.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

no code implementations • 25 Feb 2022 • Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

no code implementations • 7 Mar 2022 • Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

no code implementations • 20 Apr 2022 • Qi Cai, Zhuoran Yang, Zhaoran Wang

The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which features a smoothed discriminator tailored to the linear structure, and (iii) the exploration of the observation and state spaces via optimism, which is based on quantifying the uncertainty in the adversarial integral equation.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

no code implementations • 5 May 2022 • Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang

In the setting where the function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms.

Offline RL reinforcement-learning +1

Paper
Add Code

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

no code implementations • 23 May 2022 • Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, LiWei Wang

To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.

Reinforcement Learning (RL)

Paper
Add Code

Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

no code implementations • 26 May 2022 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank).

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

no code implementations • 26 May 2022 • Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang

We study offline reinforcement learning (RL) in partially observable Markov decision processes.

Causal Inference Offline RL +1

Paper
Add Code

Federated Offline Reinforcement Learning

no code implementations • 11 Jun 2022 • Doudou Zhou, Yufeng Zhang, Aaron Sonabend-W, Zhaoran Wang, Junwei Lu, Tianxi Cai

Extensive simulations demonstrate the effectiveness of the proposed algorithm.

Offline RL Privacy Preserving +2

Paper
Add Code

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

no code implementations • 25 Jul 2022 • Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment.

Paper
Add Code

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

no code implementations • 18 Sep 2022 • Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok

Due to the lack of online interaction with the environment, offline RL is facing the following two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage for the environment.

Offline RL reinforcement-learning +1

Paper
Add Code

Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

no code implementations • 20 Sep 2022 • Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang

The cooperative Multi-Agent Reinforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications.

Relational Reasoning

Paper
Add Code

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

no code implementations • 29 Sep 2022 • YiXuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu

It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions.

Reinforcement Learning (RL) Safe Reinforcement Learning

Paper
Add Code

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

no code implementations • 19 Oct 2022 • Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

First, from the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders who aim to manipulate the seller's policy.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

no code implementations • 3 Nov 2022 • Han Zhong, Wei Xiong, Sirui Zheng, LiWei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning.

Decision Making Reinforcement Learning (RL)

Paper
Add Code

Latent Variable Representation for Reinforcement Learning

no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

no code implementations • 19 Dec 2022 • Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang

Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset.

Paper
Add Code

Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

no code implementations • 23 Dec 2022 • Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang

To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob.

Decision Making Off-policy evaluation +1

Paper

Offline Policy Optimization in RL with Variance Regularization

no code implementations • 29 Dec 2022 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL +1

Paper

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

no code implementations • 30 Dec 2022 • Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang

In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks.

Paper

Differentiable Arbitrating in Zero-sum Markov Games

no code implementations • 20 Feb 2023 • Jing Wang, Meichen Song, Feng Gao, Boyi Liu, Zhaoran Wang, Yi Wu

We initiate the study of how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper

Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning

no code implementations • 24 Feb 2023 • Ruitu Xu, Yifei Min, Tianhao Wang, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang

We study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market.

reinforcement-learning Reinforcement Learning (RL)

Paper

A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations

no code implementations • 20 Mar 2023 • Siyu Chen, Yitan Wang, Zhaoran Wang, Zhuoran Yang

We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.

Paper

Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice

no code implementations • 5 Apr 2023 • Jiayang Li, Zhaoran Wang, Yu Marco Nie

We achieve this result by developing a day-to-day (DTD) dynamical model that mimics how travelers gradually adjust their route valuations, hence choice probabilities, based on past experiences.

Decision Making

Paper

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

no code implementations • 30 May 2023 • Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang

(b) What is a proper performance metric for ICL and what is the error rate?

In-Context Learning

Paper

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

no code implementations • 31 May 2023 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper

A General Framework for Sequential Decision-Making under Adaptivity Constraints

no code implementations • 26 Jun 2023 • Nuoya Xiong, Zhaoran Wang, Zhuoran Yang

We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning.

Decision Making

Paper

Contextual Dynamic Pricing with Strategic Buyers

no code implementations • 8 Jul 2023 • Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun

We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy.

Paper

Let Models Speak Ciphers: Multiagent Debate through Embeddings

no code implementations • 10 Oct 2023 • Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang

Although natural language is an obvious choice for communication due to LLMs' language understanding capability, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary.

Paper

Sample-Efficient Multi-Agent RL: An Optimization Perspective

no code implementations • 10 Oct 2023 • Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang

We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.

Multi-agent Reinforcement Learning

Paper

Learning Regularized Graphon Mean-Field Games with Unknown Graphons

no code implementations • 26 Oct 2023 • Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang

Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents.

Paper

A Principled Framework for Knowledge-enhanced Large Language Model

no code implementations • 18 Nov 2023 • Saizhuo Wang, Zhihan Liu, Zhaoran Wang, Jian Guo

Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning due to issues like hallucinations, limiting their applicability in critical scenarios.

Language Modelling Large Language Model

Paper

Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks

no code implementations • 22 Nov 2023 • Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye

For these settings, we design a provably sample-efficient algorithm which achieves a $\tilde{\mathcal{O}}(s_0^2 \log^2 T)$ regret in the sparse case and $\tilde{\mathcal{O}}(r^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}(\log T)$ batches.

Multi-Armed Bandits

Paper

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

no code implementations • 28 Nov 2023 • YiXuan Wang, Ruochen Jiao, Sinong Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data.

Autonomous Driving Common Sense Reasoning +1

Paper

Sparse PCA with Oracle Property

no code implementations • NeurIPS 2014 • Quanquan Gu, Zhaoran Wang, Han Liu

In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-$k$, and attains a $\sqrt{s/n}$ statistical rate of convergence with $s$ being the subspace sparsity level and $n$ the sample size.

Paper

Human-Instruction-Free LLM Self-Alignment with Limited Samples

no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.

In-Context Learning Instruction Following

Paper

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

no code implementations • 16 Feb 2024 • Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the objective and the constraint are both nonlinear functions of the visitation measure.

reinforcement-learning

Paper

How Can LLM Guide RL? A Value-Based Approach

1 code implementation • 25 Feb 2024 • Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang

Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small, which suggests that the initial policy is close to optimal, reducing the need for further exploration.

Decision Making Reinforcement Learning (RL)

Paper
Code

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

no code implementations • 8 Mar 2024 • Hongyi Guo, Zhihan Liu, Yufeng Zhang, Zhaoran Wang

Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.

Decision Making Hallucination

Paper

A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations

no code implementations • 18 Apr 2024 • Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

Under this regime, gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters.

Paper
