Search Results for author: Yaodong Yang

Found 55 papers, 21 papers with code

On the Convergence of Fictitious Play: A Decomposition Approach

no code implementations 3 May 2022 Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms.
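The update rule behind fictitious play is simple to sketch: each player repeatedly best-responds to the opponent's empirical action frequencies. Below is a minimal illustrative sketch for a two-player zero-sum matrix game (Rock-Paper-Scissors as a toy example; this is generic FP, not the paper's decomposition approach):

```python
import numpy as np

def fictitious_play(payoff, iters=2000):
    """Two-player zero-sum fictitious play; `payoff` is player 0's payoff matrix."""
    n_actions = payoff.shape[0]
    counts = [np.ones(n_actions), np.ones(n_actions)]  # empirical action counts
    for _ in range(iters):
        mix0 = counts[0] / counts[0].sum()  # player 0's empirical mixture
        mix1 = counts[1] / counts[1].sum()  # player 1's empirical mixture
        counts[0][np.argmax(payoff @ mix1)] += 1       # best response of player 0
        counts[1][np.argmax(-(payoff.T) @ mix0)] += 1  # best response of player 1 (zero-sum)
    return counts[0] / counts[0].sum(), counts[1] / counts[1].sum()

# Rock-Paper-Scissors: the empirical strategies approach the uniform Nash equilibrium
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
p0, p1 = fictitious_play(rps)
```

In zero-sum games like this one the empirical frequencies are known to converge to a Nash equilibrium (here, uniform over the three actions), though the convergence can be slow.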

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

no code implementations 10 Feb 2022 Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang

Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games.

Multi-agent Reinforcement Learning reinforcement-learning
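As a concrete (hypothetical) illustration of the kind of decomposition this line of work studies, a VDN-style additive factorisation writes the joint value as a sum of per-agent utilities, so maximising the joint value splits into independent per-agent argmaxes:

```python
import numpy as np

# VDN-style additive decomposition: Q_tot(s, a_1..a_n) = sum_i Q_i(s, a_i).
# Under this form, the greedy joint action factorises into independent
# per-agent argmaxes, which is what makes decentralised execution cheap.
def joint_q(per_agent_qs, joint_action):
    return sum(float(q[a]) for q, a in zip(per_agent_qs, joint_action))

def greedy_joint_action(per_agent_qs):
    # each agent maximises its own utility independently
    return tuple(int(np.argmax(q)) for q in per_agent_qs)

# two agents with two actions each (toy utilities)
qs = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]
best = greedy_joint_action(qs)  # -> (1, 0)
```

The additive form is the simplest member of the value-decomposition family; richer mixing functions (e.g. QMIX-style monotonic networks) trade this simplicity for representational power.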

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

no code implementations 10 Feb 2022 Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang

We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem, but only one single round of communication is allowed and there is a budget constraint on the total amount of information (in bits) that each machine can send out.

Multi-Armed Bandits Offline RL +1

Efficient Policy Space Response Oracles

no code implementations 28 Jan 2022 Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu

Policy Space Response Oracle method (PSRO) provides a general solution to Nash equilibrium in two-player zero-sum games but suffers from two problems: (1) the computation inefficiency due to consistently evaluating current populations by simulations; and (2) the exploration inefficiency due to learning best responses against a fixed meta-strategy at each iteration.

Efficient Exploration
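The population loop PSRO builds on can be sketched in a few lines. The version below uses a uniform meta-strategy for simplicity (full PSRO solves the restricted meta-game for a Nash equilibrium instead); all names are illustrative:

```python
import numpy as np

def psro_uniform(payoff, max_iters=10):
    """Population loop on a symmetric zero-sum matrix game, with a uniform
    meta-solver (full PSRO would compute a Nash of the restricted meta-game)."""
    population = [0]  # start from a single pure strategy
    for _ in range(max_iters):
        # opponent mixture induced by the uniform meta-strategy over the population
        opponent_mix = np.zeros(payoff.shape[0])
        for s in population:
            opponent_mix[s] += 1.0 / len(population)
        best_response = int(np.argmax(payoff @ opponent_mix))
        if best_response in population:
            break  # no novel best response against this meta-strategy
        population.append(best_response)
    return population

# Rock-Paper-Scissors (rows/cols: R, P, S)
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
print(psro_uniform(rps))  # stalls at [0, 1]: Paper already beats the uniform mixture
```

That the loop stalls before discovering Scissors illustrates the exploration weakness of best-responding to a fixed meta-strategy that the abstract points to; with a Nash meta-solver over {R, P}, Paper would receive all the mass and Scissors would emerge as the next best response.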

Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning

no code implementations 31 Dec 2021 Bo Liu, Xidong Feng, Haifeng Zhang, Jun Wang, Yaodong Yang

In recent years, gradient-based Meta-RL (GMRL) methods have achieved remarkable successes in either discovering effective online hyperparameters for a single task (Xu et al., 2018) or learning good initialisations for multi-task transfer learning (Finn et al., 2017).

Meta Reinforcement Learning reinforcement-learning +1

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

1 code implementation 6 Dec 2021 Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu

In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.

Offline RL reinforcement-learning +3

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

1 code implementation NeurIPS 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

Neural Auto-Curricula in Two-Player Zero-Sum Games

1 code implementation NeurIPS 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

no code implementations 28 Oct 2021 Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang

In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP).

Traveling Salesman Problem

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

no code implementations 27 Oct 2021 David Mguni, Joel Jennings, Taher Jafferjee, Aivar Sootla, Yaodong Yang, Changmin Yu, Usman Islam, Ziyan Wang, Jun Wang

The core of DESTA is a novel game between two RL agents: SAFETY AGENT that is delegated the task of minimising safety violations and TASK AGENT whose goal is to maximise the reward set by the environment task.

reinforcement-learning Safe Exploration +1

Measuring the Non-Transitivity in Chess

no code implementations 22 Oct 2021 Ricky Sanjaya, Jun Wang, Yaodong Yang

In this paper, we quantify the non-transitivity in Chess through real-world data from human players.

Online Markov Decision Processes with Non-oblivious Strategic Adversary

no code implementations 7 Oct 2021 Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+\tau^2\sqrt{ T \log(|A|)})$ where $L$ is the size of adversary's pure strategy set and $|A|$ denotes the size of agent's action space.

Multi-Agent Constrained Policy Optimisation

1 code implementation 6 Oct 2021 Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang

To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.

Multi-agent Reinforcement Learning reinforcement-learning

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

no code implementations 20 Sep 2021 Yixin Wu, Rui Luo, Chen Zhang, Jun Wang, Yaodong Yang

In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics during training deep neural networks by gradient-based optimizers.

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

no code implementations 4 Sep 2021 Xiaotie Deng, Yuhao Li, David Henry Mguni, Jun Wang, Yaodong Yang

Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.

Multi-agent Reinforcement Learning reinforcement-learning

Settling the Variance of Multi-Agent Policy Gradients

1 code implementation NeurIPS 2021 Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.

Starcraft

Towards the PAC Learnability of Nash Equilibrium

no code implementations 17 Aug 2021 Zhijian Duan, Dinghuai Zhang, Wenhan Huang, Yali Du, Jun Wang, Yaodong Yang, Xiaotie Deng

Nash equilibrium (NE) is one of the most important solution concepts in game theory and has broad applications in machine learning research.

Meta-Learning

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

1 code implementation 12 Jun 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration.

Atari Games Multi-agent Reinforcement Learning +1

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

no code implementations 9 Jun 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

1 code implementation 5 Jun 2021 Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms.

Atari Games Distributed Computing +2

Neural Auto-Curricula

1 code implementation 4 Jun 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations 1 Jun 2021 Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.

Multi-agent Reinforcement Learning Starcraft +2

Learning to Safely Exploit a Non-Stationary Opponent

no code implementations NeurIPS 2021 Zheng Tian, Hang Ren, Yaodong Yang, Yuchen Sun, Ziqi Han, Ian Davies, Jun Wang

On the other hand, overfitting to an opponent (i.e., exploiting only one specific type of opponent) makes the learning player easily exploitable by others.

Learning to Shape Rewards using a Game of Two Partners

no code implementations 16 Mar 2021 David Mguni, Jianhong Wang, Taher Jafferjee, Nicolas Perez-Nieves, Wenbin Song, Yaodong Yang, Feifei Tong, Hui Chen, Jiangcheng Zhu, Jun Wang

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards.

reinforcement-learning

Modelling Behavioural Diversity for Learning in Open-Ended Games

3 code implementations 14 Mar 2021 Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors).

Point Processes

Online Double Oracle

1 code implementation 13 Mar 2021 Le Cong Dinh, Yaodong Yang, Stephen Mcaleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence.

online learning

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

no code implementations 1 Jan 2021 Yizheng Hu, Kun Shao, Dong Li, Jianye Hao, Wulong Liu, Yaodong Yang, Jun Wang, Zhanxing Zhu

Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn correlated equilibrium while maximally preserving the convenience of the decentralized execution.

Adversarial Robustness reinforcement-learning +1

Multi-Agent Trust Region Learning

1 code implementation 1 Jan 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

Atari Games Multi-agent Reinforcement Learning +2

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

no code implementations NeurIPS 2020 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

1 code implementation 1 Nov 2020 Yaodong Yang, Jun Wang

In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

Multi-agent Reinforcement Learning reinforcement-learning

What About Inputting Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

no code implementations NeurIPS 2021 Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.

Continuous Control Contrastive Learning +2

Learning to Infer User Hidden States for Online Sequential Advertising

no code implementations 3 Sep 2020 Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai

To drive purchases in online advertising, it is of great interest to the advertiser to optimize the sequential advertising strategy, whose performance and interpretability are both important.

reinforcement-learning

Multi-Agent Determinantal Q-Learning

1 code implementation ICML 2020 Yaodong Yang, Ying Wen, Li-Heng Chen, Jun Wang, Kun Shao, David Mguni, Wei-Nan Zhang

Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution.

Q-Learning

$\alpha^\alpha$-Rank: Practically Scaling $\alpha$-Rank through Stochastic Optimisation

no code implementations 25 Sep 2019 Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar

Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $\alpha$-Rank with reasonable computational budget.

Stochastic Optimization

Bi-level Actor-Critic for Multi-agent Coordination

1 code implementation 8 Sep 2019 Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Wei-Nan Zhang, Jun Wang

Coordination is one of the essential problems in multi-agent systems.

Multiagent Systems

Spectral-based Graph Convolutional Network for Directed Graphs

no code implementations 21 Jul 2019 Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, Guangyong Chen

Our approach can work directly on directed graph data in semi-supervised node classification tasks.

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

no code implementations 29 May 2019 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

General Classification Image Classification

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

no code implementations 26 Jan 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang

Though limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met due to individuals' cognitive limitations and/or the intractability of the decision problem.

Decision Making Multi-agent Reinforcement Learning

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

no code implementations ICLR 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge.

Multi-agent Reinforcement Learning reinforcement-learning

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting

no code implementations 14 Dec 2018 Yaodong Yang, Alisa Kolesnikova, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson

The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks.

Feature Engineering

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

no code implementations 8 Nov 2018 Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu

As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities.

Decision Making Time Series

Factorized Q-Learning for Large-Scale Multi-Agent Systems

no code implementations 11 Sep 2018 Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Wei-Nan Zhang, Dell Zhang, Jun Wang, Han Liu

Deep Q-learning has achieved significant success in single-agent decision-making tasks.

Multiagent Systems

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

1 code implementation NeurIPS 2018 Rui Luo, Jianhong Wang, Yaodong Yang, Zhanxing Zhu, Jun Wang

We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions.

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

no code implementations 13 Sep 2017 Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Wei-Nan Zhang, Ying Wen, Yong Yu

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligence agents, driven by million-agent reinforcement learning.

reinforcement-learning

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models

no code implementations 16 Jun 2017 Yaodong Yang, Rui Luo, Yuanyuan Liu

Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data.

Variational Inference
