Search Results for author: Yaodong Yang

Found 105 papers, 45 papers with code

Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games

2 code implementations • 29 Mar 2017 • Peng Peng, Ying Wen, Yaodong Yang, Quan Yuan, Zhenkun Tang, Haitao Long, Jun Wang

Many artificial intelligence (AI) applications often require multiple intelligent agents to work in a collaborative effort.

Starcraft

108

Paper
Code

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models

no code implementations • 16 Jun 2017 • Yaodong Yang, Rui Luo, Yuanyuan Liu

Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data.

Variational Inference

Paper
Add Code

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

no code implementations • 13 Sep 2017 • Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Wei-Nan Zhang, Ying Wen, Yong Yu

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligence agents, driven by million-agent reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

1 code implementation • NeurIPS 2018 • Rui Luo, Jianhong Wang, Yaodong Yang, Zhanxing Zhu, Jun Wang

We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions.

Paper
Code

Mean Field Multi-Agent Reinforcement Learning

3 code implementations • ICML 2018 • Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Wei-Nan Zhang, Jun Wang

Existing multi-agent reinforcement learning methods are limited typically to a small number of agents.

Multi-agent Reinforcement Learning Q-Learning +2

363

Paper
Code

Factorized Q-Learning for Large-Scale Multi-Agent Systems

no code implementations • 11 Sep 2018 • Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Wei-Nan Zhang, Dell Zhang, Jun Wang, Han Liu

Deep Q-learning has achieved a significant success in single-agent decision making tasks.

Multiagent Systems

Paper
Add Code

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

no code implementations • 8 Nov 2018 • Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu

As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities.

Benchmarking Decision Making +2

Paper
Add Code

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting

no code implementations • 14 Dec 2018 • Yaodong Yang, Alisa Kolesnikova, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson

The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks.

BIG-bench Machine Learning Feature Engineering +1

Paper
Add Code

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

no code implementations • 26 Jan 2019 • Ying Wen, Yaodong Yang, Rui Luo, Jun Wang

Though limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met due to individual's cognitive limitation and/or the tractability of the decision problem.

Decision Making Multi-agent Reinforcement Learning

Paper
Add Code

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

no code implementations • ICLR 2019 • Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction

no code implementations • 27 May 2019 • Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Zhaopeng Meng, Yaodong Yang, Li Wang

Value functions are crucial for model-free Reinforcement Learning (RL) to obtain a policy implicitly or guide the policy updates.

Continuous Control Future prediction +1

Paper
Add Code

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

no code implementations • 29 May 2019 • Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

General Classification Image Classification

Paper
Add Code

Spectral-based Graph Convolutional Network for Directed Graphs

no code implementations • 21 Jul 2019 • Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, Guangyong Chen

Our approach can work directly on directed graph data in semi-supervised nodes classification tasks.

Paper
Add Code

Bi-level Actor-Critic for Multi-agent Coordination

1 code implementation • 8 Sep 2019 • Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Wei-Nan Zhang, Jun Wang

Coordination is one of the essential problems in multi-agent systems.

Multiagent Systems

Paper
Code

$α^α$-Rank: Practically Scaling $α$-Rank through Stochastic Optimisation

no code implementations • 25 Sep 2019 • Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar

Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $\alpha$-Rank with reasonable computational budget.

Stochastic Optimization

Paper
Add Code

Multi-Agent Determinantal Q-Learning

1 code implementation • ICML 2020 • Yaodong Yang, Ying Wen, Li-Heng Chen, Jun Wang, Kun Shao, David Mguni, Wei-Nan Zhang

Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution.

Q-Learning

Paper
Code

Learning to Infer User Hidden States for Online Sequential Advertising

no code implementations • 3 Sep 2020 • Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai

To drive purchase in online advertising, it is of the advertiser's great interest to optimize the sequential advertising strategy whose performance and interpretability are both important.

Paper
Add Code

What About Taking Policy as Input of Value Function: Policy-extended Value Function Approximator

no code implementations • 28 Sep 2020 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Wulong Liu, Yaodong Yang

The value function lies in the heart of Reinforcement Learning (RL), which defines the long-term evaluation of a policy in a given state.

Continuous Control Contrastive Learning +2

Paper
Add Code

What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

1 code implementation • NeurIPS 2021 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.

Continuous Control Contrastive Learning +3

Paper
Code

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

3 code implementations • 19 Oct 2020 • Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang

We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving.

Autonomous Driving Multi-agent Reinforcement Learning +2

878

Paper
Code

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

1 code implementation • 1 Nov 2020 • Yaodong Yang, Jun Wang

In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Code

Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

no code implementations • NeurIPS 2020 • Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

Paper
Add Code

Multi-Agent Trust Region Learning

1 code implementation • 1 Jan 2021 • Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

Atari Games Multi-agent Reinforcement Learning +3

Paper
Code

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

no code implementations • 1 Jan 2021 • Yizheng Hu, Kun Shao, Dong Li, Jianye Hao, Wulong Liu, Yaodong Yang, Jun Wang, Zhanxing Zhu

Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn correlated equilibrium while maximally preserving the convenience of the decentralized execution.

Adversarial Robustness reinforcement-learning +2

Paper
Add Code

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

no code implementations • 15 Feb 2021 • Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor

Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games.

Autonomous Driving

Paper
Add Code

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

1 code implementation • 3 Mar 2021 • Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Zhaopeng Meng

Value function is the central notion of Reinforcement Learning (RL).

Continuous Control Future prediction +2

Paper
Code

Online Double Oracle

1 code implementation • 13 Mar 2021 • Le Cong Dinh, Yaodong Yang, Stephen Mcaleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence.

Paper
Code

Modelling Behavioural Diversity for Learning in Open-Ended Games

3 code implementations • 14 Mar 2021 • Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e. g., Rock-Paper-Scissors).

Point Processes

Paper
Code

Learning to Shape Rewards using a Game of Two Partners

no code implementations • 16 Mar 2021 • David Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Tianpei Yang, Matthew Taylor, Wenbin Song, Feifei Tong, Hui Chen, Jiangcheng Zhu, Jun Wang, Yaodong Yang

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Learning to Safely Exploit a Non-Stationary Opponent

no code implementations • NeurIPS 2021 • Zheng Tian, Hang Ren, Yaodong Yang, Yuchen Sun, Ziqi Han, Ian Davies, Jun Wang

On the other hand, overfitting to an opponent (i. e., exploiting only one specific type of opponent) makes the learning player easily exploitable by others.

Paper
Add Code

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations • 1 Jun 2021 • Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.

Management Multi-agent Reinforcement Learning +3

Paper
Add Code

Neural Auto-Curricula

1 code implementation • 4 Jun 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Paper
Code

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

1 code implementation • 5 Jun 2021 • Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms.

Atari Games Distributed Computing +3

461

Paper
Code

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

no code implementations • 9 Jun 2021 • Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

Paper
Add Code

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

1 code implementation • 12 Jun 2021 • Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration.

Atari Games Multi-agent Reinforcement Learning +2

Paper
Code

Is Nash Equilibrium Approximator Learnable?

no code implementations • 17 Aug 2021 • Zhijian Duan, Wenhan Huang, Dinghuai Zhang, Yali Du, Jun Wang, Yaodong Yang, Xiaotie Deng

In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution.

BIG-bench Machine Learning Meta-Learning +1

Paper
Add Code

Settling the Variance of Multi-Agent Policy Gradients

1 code implementation • NeurIPS 2021 • Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.

Reinforcement Learning (RL) Starcraft

Paper
Code

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

no code implementations • 4 Sep 2021 • Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang

Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

no code implementations • 20 Sep 2021 • Yixin Wu, Rui Luo, Chen Zhang, Jun Wang, Yaodong Yang

In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics during training deep neural networks by gradient-based optimizers.

Paper
Add Code

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

7 code implementations • ICLR 2022 • Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang

In this paper, we extend the theory of trust region learning to MARL.

LEMMA Multi-agent Reinforcement Learning +2

2,534

Paper
Code

Offline Pre-trained Multi-Agent Decision Transformer

no code implementations • 29 Sep 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xi yun Li, Haifeng Zhang, Ying Wen, Weinan Zhang, Jun Wang, Bo Xu

Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment.

Multi-agent Reinforcement Learning reinforcement-learning +2

Paper
Add Code

Multi-Agent Constrained Policy Optimisation

3 code implementations • 6 Oct 2021 • Shangding Gu, Jakub Grudzien Kuba, Munning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang

To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.

Multi-agent Reinforcement Learning reinforcement-learning +1

124

Paper
Code

Online Markov Decision Processes with Non-oblivious Strategic Adversary

no code implementations • 7 Oct 2021 • Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+\tau^2\sqrt{ T \log(|A|)})$ where $L$ is the size of adversary's pure strategy set and $|A|$ denotes the size of agent's action space.

Paper
Add Code

Measuring the Non-Transitivity in Chess

no code implementations • 22 Oct 2021 • Ricky Sanjaya, Jun Wang, Yaodong Yang

In this paper, we quantify the non-transitivity in Chess through real-world data from human players.

Paper
Add Code

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

no code implementations • 27 Oct 2021 • David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy.

OpenAI Gym reinforcement-learning +3

Paper
Add Code

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

no code implementations • 28 Oct 2021 • Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang

In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP).

Traveling Salesman Problem

Paper
Add Code

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

1 code implementation • NeurIPS 2021 • Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

Paper
Code

Neural Auto-Curricula in Two-Player Zero-Sum Games

1 code implementation • NeurIPS 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

Multi-agent Reinforcement Learning Vocal Bursts Valence Prediction

Paper
Code

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

1 code implementation • 6 Dec 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu

In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.

Offline RL reinforcement-learning +4

Paper
Code

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

1 code implementation • 31 Dec 2021 • Xidong Feng, Bo Liu, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations.

Atari Games Meta Reinforcement Learning +3

Paper
Code

Efficient Policy Space Response Oracles

no code implementations • 28 Jan 2022 • Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu, Jun Wang

Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch.

Efficient Exploration

Paper
Add Code

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

no code implementations • 10 Feb 2022 • Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang

Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

no code implementations • 10 Feb 2022 • Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang

We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem but only one single round of communication is allowed and there is a budget constraint on the total number of information (in terms of bits) that each machine can send out.

Multi-Armed Bandits Offline RL +2

Paper
Add Code

Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

no code implementations • 10 Mar 2022 • Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.

Data Augmentation Reinforcement Learning (RL) +1

Paper
Add Code

On the Convergence of Fictitious Play: A Decomposition Approach

no code implementations • 3 May 2022 • Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms.

Paper
Add Code

A Review of Safe Reinforcement Learning: Methods, Theory and Applications

1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll

To establish a good foundation for future research in this thread, in this paper, we provide a review for safe RL from the perspectives of methods, theory and applications.

Autonomous Driving Decision Making +3

400

Paper
Code

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

1 code implementation • 30 May 2022 • Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence.

Decision Making Multi-agent Reinforcement Learning +2

273

Paper
Code

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

no code implementations • 30 May 2022 • Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang

Although there are equilibrium concepts in game theory that take into account risk aversion, they either assume that agents are risk-neutral with respect to the uncertainty caused by the actions of other agents, or they are not guaranteed to exist.

Autonomous Driving Multi-agent Reinforcement Learning

Paper
Add Code

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

1 code implementation • 17 Jun 2022 • Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang

In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.

Few-Shot Learning Offline RL +2

518

Paper
Code

Scalable Model-based Policy Optimization for Decentralized Networked Systems

2 code implementations • 13 Jul 2022 • Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks.

Paper
Code

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

no code implementations • 2 Aug 2022 • Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community.

Multi-agent Reinforcement Learning

Paper
Add Code

Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

no code implementations • 24 Aug 2022 • Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many complicated nonlinear units, allowing them to provide correct search recommendations.

Fairness Information Retrieval +2

Paper
Add Code

Constrained Update Projection Approach to Safe Policy Optimization

3 code implementations • 15 Sep 2022 • Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

Compared to previous safe RL methods, CUP enjoys the benefits of 1) CUP generalizes the surrogate functions to generalized advantage estimator (GAE), leading to strong empirical performance.

Reinforcement Learning (RL) Safe Reinforcement Learning

Paper
Code

End-to-End Affordance Learning for Robotic Manipulation

1 code implementation • 26 Sep 2022 • Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong

Such contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.

Reinforcement Learning (RL)

Paper
Code

MSRL: Distributed Reinforcement Learning with Dataflow Fragments

no code implementations • 3 Oct 2022 • Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, Lei Chen

Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e. g. policy network updates) on GPU workers.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

GenDexGrasp: Generalizable Dexterous Grasping

1 code implementation • 3 Oct 2022 • Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands.

Paper
Code

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

1 code implementation • 11 Oct 2022 • Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues.

Multi-agent Reinforcement Learning reinforcement-learning +1

768

Paper
Code

TorchOpt: An Efficient Library for Differentiable Optimization

1 code implementation • 13 Nov 2022 • Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

TorchOpt further provides a high-performance distributed execution runtime.

496

Paper
Code

Contextual Transformer for Offline Meta Reinforcement Learning

no code implementations • 15 Nov 2022 • Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.

D4RL Meta Reinforcement Learning +4

Paper
Add Code

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation • 29 Nov 2022 • Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Ranked #1 on SMAC on SMAC 3s5z_vs_3s6z

Decision Making Q-Learning +2

179

Paper
Code

ASP: Learn a Universal Neural Solver!

1 code implementation • 1 Mar 2023 • Chenguang Wang, Zhouliang Yu, Stephen Mcaleer, Tianshu Yu, Yaodong Yang

Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy.

Combinatorial Optimization Traveling Salesman Problem

Paper
Code

DPPMask: Masked Image Modeling with Determinantal Point Processes

no code implementations • 13 Mar 2023 • Junde Xu, Zikai Lin, Donghao Zhou, Yaodong Yang, Xiangyun Liao, Bian Wu, Guangyong Chen, Pheng-Ann Heng

In particular, we evaluate our method on two representative MIM frameworks, MAE and iBOT.

Multi-class Classification Point Processes

Paper
Add Code

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

no code implementations • ICCV 2023 • Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++.

Object

Paper
Add Code

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

1 code implementation • 15 Apr 2023 • Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du

It first decomposes the global return back to each time step, then utilizes the Shapley Value to redistribute the individual payoff from the decomposed global reward.

Multi-agent Reinforcement Learning reinforcement-learning

Paper
Code

Heterogeneous-Agent Reinforcement Learning

1 code implementation • 19 Apr 2023 • Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in AI research.

LEMMA Multi-agent Reinforcement Learning +1

345

Paper
Code

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research

1 code implementation • 16 May 2023 • Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang

AI systems empowered by reinforcement learning (RL) algorithms harbor the immense potential to catalyze societal advancement, yet their deployment is often impeded by significant safety concerns.

Philosophy reinforcement-learning +2

838

Paper
Code

Heterogeneous Value Alignment Evaluation for Large Language Models

2 code implementations • 26 May 2023 • Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang

We conduct evaluations with new auto-metric \textit{value rationality} to represent the ability of LLMs to align with specific values.

Attribute

Paper
Code

Deep Reinforcement Learning with Task-Adaptive Retrieval via Hypernetwork

1 code implementation • 19 Jun 2023 • Yonggang Jin, Chenxu Wang, Tianyu Zheng, Liuyu Xiang, Yaodong Yang, Junge Zhang, Jie Fu, Zhaofeng He

Deep reinforcement learning algorithms are usually impeded by sampling inefficiency, heavily depending on multiple interactions with the environment to acquire accurate decision-making capabilities.

Decision Making Hippocampus +2

Paper
Code

Maximum Entropy Heterogeneous-Agent Reinforcement Learning

1 code implementation • 19 Jun 2023 • Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang

We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective for MARL.

Multi-agent Reinforcement Learning reinforcement-learning +1

345

Paper
Code

Large Sequence Models for Sequential Decision-Making: A Survey

no code implementations • 24 Jun 2023 • Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e. g., GPT-3 and Swin Transformer.

Decision Making

Paper
Add Code

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

no code implementations • NeurIPS 2023 • Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang

In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs).

Question Answering

Paper
Add Code

SafeDreamer: Safe Reinforcement Learning with World Models

no code implementations • 14 Jul 2023 • Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang

Existing Safe Reinforcement Learning (SafeRL) methods, which rely on cost functions to enforce safety, often fail to achieve zero-cost performance in complex scenarios, especially vision-only tasks.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations • 24 Jul 2023 • Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control Model-based Reinforcement Learning +1

Paper
Add Code

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

no code implementations • 9 Aug 2023 • Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Mcaleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.

Paper
Add Code

ProAgent: Building Proactive Cooperative Agents with Large Language Models

no code implementations • 22 Aug 2023 • Ceyao Zhang, Kaijie Yang, Siyi Hu, ZiHao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems.

Paper
Add Code

Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators

1 code implementation • 29 Aug 2023 • Jingbang Chen, Yian Wang, Xingwei Qu, Shuangjia Zheng, Yaodong Yang, Hao Dong, Jie Fu

Molecular dynamics simulations have emerged as a fundamental instrument for studying biomolecules.

Meta-Learning

Paper
Code

Dynamic Handover: Throw and Catch with Bimanual Hands

no code implementations • 11 Sep 2023 • Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang

Humans throw and catch objects all the time.

Multi-agent Reinforcement Learning Trajectory Prediction

Paper
Add Code

Measuring Value Understanding in Language Models through Discriminator-Critique Gap

no code implementations • 30 Sep 2023 • Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang

We argue that truly understanding values in LLMs requires considering both "know what" and "know why".

Paper
Add Code

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

no code implementations • 30 Sep 2023 • Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang

In this paper, we present Red-teaming Game (RTG), a general game-theoretic framework without manual annotation.

Language Modelling Vulnerability Detection

Paper
Add Code

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

1 code implementation • 8 Oct 2023 • Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers).

Reinforcement Learning (RL)

Paper
Code

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

no code implementations • 15 Oct 2023 • Simin Li, Ruixiao Xu, Jun Guo, Pu Feng, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu, Weifeng Lv

Existing max-min optimization techniques in robust MARL seek to enhance resilience by training agents against worst-case adversaries, but this becomes intractable as the number of agents grows, leading to exponentially increasing worst-case scenarios.

Multi-agent Reinforcement Learning Starcraft +1

Paper
Add Code

MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning

no code implementations • 18 Oct 2023 • Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang

Building a single generalist agent with strong zero-shot capability has recently sparked significant advancements.

Decision Making SMAC+

Paper
Add Code

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation • 19 Oct 2023 • Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.

reinforcement-learning Safe Reinforcement Learning

1,149

Paper
Code

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

no code implementations • 19 Oct 2023 • Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang

By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications.

reinforcement-learning Safe Reinforcement Learning

Paper
Add Code

Grasp Multiple Objects with One Hand

1 code implementation • 24 Oct 2023 • Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang

The intricate kinematics of the human hand enable simultaneous grasping and manipulation of multiple objects, essential for tasks such as object transfer and in-hand manipulation.

Object

Paper
Code

AI Alignment: A Comprehensive Survey

no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen Mcaleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Paper
Add Code

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

no code implementations • 10 Nov 2023 • ZiHao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents.

Paper
Add Code

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation • 12 Dec 2023 • Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

2,534

Paper
Code

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

1 code implementation • 19 Jan 2024 • Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu

Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning.

Decision Making

Paper
Code

Panacea: Pareto Alignment via Preference Adaptation for LLMs

no code implementations • 3 Feb 2024 • Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Qingfu Zhang, Siyuan Qi, Yaodong Yang

Our work marks a step forward in effectively and efficiently aligning models to diverse and intricate human preferences in a controllable and Pareto-optimal manner.

Language Modelling Large Language Model

Paper
Add Code

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction

no code implementations • 4 Feb 2024 • Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang

Here we introduce Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process by learning the correctional residuals between the aligned and the unaligned answers.

Paper
Add Code

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

no code implementations • 15 Feb 2024 • Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

Then, based on this framework, we introduce the IBN to analyze generalization in the reward modeling stage of RLHF.

Language Modelling Large Language Model

Paper
Add Code

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects

no code implementations • 20 Feb 2024 • Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang

The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety.

Paper
Add Code

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations

no code implementations • 19 Mar 2024 • Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li

In this paper, we present a framework that is capable of learning structured states and symbolic policies simultaneously, whose key idea is to overcome the efficiency bottleneck by distilling vision foundation models into a scalable perception module.

Decision Making

Paper
Add Code

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

no code implementations • 19 Mar 2024 • Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to adapt to new scenarios.

Imitation Learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.