Search Results for author: Yaodong Yang

Found 88 papers, 36 papers with code

ProAgent: Building Proactive Cooperative AI with Large Language Models

no code implementations22 Aug 2023 Ceyao Zhang, Kaijie Yang, Siyi Hu, ZiHao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

Experimental evaluations conducted within the Overcooked-AI framework show that ProAgent delivers markedly superior performance, outperforming five methods based on self-play and population-based training when cooperating with AI agents.

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

no code implementations9 Aug 2023 Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations24 Jul 2023 Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control Model-based Reinforcement Learning +1

Safe DreamerV3: Safe Reinforcement Learning with World Models

no code implementations14 Jul 2023 Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang

The widespread application of Reinforcement Learning (RL) in real-world situations has yet to come to fruition, largely because RL fails to satisfy the essential safety demands of such systems.

reinforcement-learning Reinforcement Learning (RL) +1

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

1 code implementation10 Jul 2023 Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang

In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs).

Question Answering

Policy Space Diversity for Non-Transitive Games

no code implementations29 Jun 2023 Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang

To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to an NE.

Large Sequence Models for Sequential Decision-Making: A Survey

no code implementations24 Jun 2023 Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer.

Decision Making

Maximum Entropy Heterogeneous-Agent Mirror Learning

1 code implementation19 Jun 2023 Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang

Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.

Multi-agent Reinforcement Learning

Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork

1 code implementation19 Jun 2023 Yonggang Jin, Chenxu Wang, Liuyu Xiang, Yaodong Yang, Junge Zhang, Jie Fu, Zhaofeng He

Deep reinforcement learning algorithms are usually impeded by sample inefficiency, relying heavily on repeated interactions with the environment to acquire accurate decision-making capabilities.

Decision Making Hippocampus +2

Heterogeneous Value Evaluation for Large Language Models

1 code implementation26 May 2023 Zhaowei Zhang, Nian Liu, Siyuan Qi, Ceyao Zhang, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang

In this paper, we propose A2EHV, an Automated Alignment Evaluation with a Heterogeneous Value system that (1) is automated to minimize individual human biases, and (2) allows assessments against various target values to foster heterogeneous agents.

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research

1 code implementation16 May 2023 Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang

AI systems empowered by reinforcement learning (RL) algorithms harbor the immense potential to catalyze societal advancement, yet their deployment is often impeded by significant safety concerns.

Philosophy reinforcement-learning +2

Heterogeneous-Agent Reinforcement Learning

1 code implementation19 Apr 2023 Yifan Zhong, Jakub Grudzien Kuba, Siyi Hu, Jiaming Ji, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in AI research.

LEMMA Multi-agent Reinforcement Learning +1

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

no code implementations15 Apr 2023 Sirui Chen, Zhaowei Zhang, Yali Du, Yaodong Yang

Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning reinforcement-learning

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

no code implementations ICCV 2023 Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++.

ASP: Learn a Universal Neural Solver!

1 code implementation1 Mar 2023 Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang

Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy.

Combinatorial Optimization Traveling Salesman Problem

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation29 Nov 2022 Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Decision Making Q-Learning +2
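The action-dependency described in the ACE abstract can be illustrated in a few lines: agent i picks its action conditioned on the actions agents 1..i-1 have already committed to, turning the joint decision into a sequence of single-agent decisions. This is a toy sketch with hypothetical Q-values, not the paper's implementation.

```python
# Toy sketch (hypothetical Q-values, not the paper's implementation) of
# bidirectional action-dependency: each agent's value depends on the actions
# its predecessors have already chosen.

def sequential_joint_action(q_tables, state, n_actions):
    """Greedily pick each agent's action given its predecessors' choices."""
    chosen = []
    for q in q_tables:
        key = (state, tuple(chosen))            # condition on predecessor actions
        values = q.get(key, [0.0] * n_actions)
        chosen.append(max(range(n_actions), key=lambda a: values[a]))
    return chosen

# Two agents, two actions: agent 1's preference flips depending on agent 0.
q0 = {("s", ()): [0.1, 0.9]}
q1 = {("s", (0,)): [0.2, 0.8], ("s", (1,)): [0.8, 0.2]}
joint = sequential_joint_action([q0, q1], "s", 2)
```

During training, each agent's TD error would then be computed against this action-conditioned value, reflecting how subsequent agents reacted to its choice.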

Contextual Transformer for Offline Meta Reinforcement Learning

no code implementations15 Nov 2022 Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.

D4RL Meta Reinforcement Learning +4
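The prompt-tuning idea above can be sketched minimally: a trainable context-vector sequence is concatenated in front of the trajectory tokens, so the same policy head produces different outputs for different prompts. The linear "policy" and all numbers below are hypothetical stand-ins for a transformer.

```python
# Minimal sketch (illustrative assumption, not the paper's architecture) of
# prompt tuning: a learned context sequence is concatenated with the input.

def with_prompt(prompt, trajectory):
    """Concatenate learned context vectors in front of the trajectory tokens."""
    return prompt + trajectory

def policy_logits(tokens, weights):
    """Mean-pool token features, then apply a linear head (transformer stand-in)."""
    d = len(tokens[0])
    pooled = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(d)]
    return [sum(w * p for w, p in zip(row, pooled)) for row in weights]

prompt = [[0.5, 0.5], [0.1, 0.9]]        # learned context vectors (made up)
trajectory = [[1.0, 0.0], [0.0, 1.0]]    # observed state features (made up)
weights = [[1.0, 0.0], [0.0, 1.0]]       # identity head for illustration

plain = policy_logits(trajectory, weights)
prompted = policy_logits(with_prompt(prompt, trajectory), weights)
```

Only the prompt vectors would be optimized during tuning; the policy weights stay frozen.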

TorchOpt: An Efficient Library for Differentiable Optimization

1 code implementation13 Nov 2022 Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

TorchOpt further provides a high-performance distributed execution runtime.

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

1 code implementation11 Oct 2022 Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

A significant challenge facing researchers in multi-agent reinforcement learning (MARL) is identifying a library that offers fast, compatible development for combinations of multi-agent tasks and algorithms, while obviating the need to consider compatibility issues.

Multi-agent Reinforcement Learning reinforcement-learning +1

GenDexGrasp: Generalizable Dexterous Grasping

1 code implementation3 Oct 2022 Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands.

MSRL: Distributed Reinforcement Learning with Dataflow Fragments

no code implementations3 Oct 2022 Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, Lei Chen

Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g., policy network updates) on GPU workers.

reinforcement-learning Reinforcement Learning (RL)

End-to-End Affordance Learning for Robotic Manipulation

no code implementations26 Sep 2022 Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong

Such a contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.

Reinforcement Learning (RL)

Constrained Update Projection Approach to Safe Policy Optimization

2 code implementations15 Sep 2022 Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

Compared to previous safe RL methods, CUP offers the benefit of 1) generalizing the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance.

Reinforcement Learning (RL) Safe Reinforcement Learning

Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

no code implementations24 Aug 2022 Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

Deep neural networks, with their many nonlinear units, can capture the intricate interaction history between queries and documents, allowing them to provide accurate search recommendations.

Fairness Information Retrieval +2

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

no code implementations2 Aug 2022 Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community.

Multi-agent Reinforcement Learning

Scalable Model-based Policy Optimization for Decentralized Networked Systems

2 code implementations13 Jul 2022 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Reinforcement learning algorithms require a large number of samples; this often limits their real-world application even on simple tasks.

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

1 code implementation17 Jun 2022 Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang

In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.

Few-Shot Learning Offline RL +2

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

no code implementations30 May 2022 Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang

Although there are equilibrium concepts in game theory that take into account risk aversion, they either assume that agents are risk-neutral with respect to the uncertainty caused by the actions of other agents, or they are not guaranteed to exist.

Autonomous Driving Multi-agent Reinforcement Learning

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

1 code implementation30 May 2022 Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence.

Decision Making Multi-agent Reinforcement Learning +2

A Review of Safe Reinforcement Learning: Methods, Theory and Applications

1 code implementation20 May 2022 Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll

To establish a good foundation for future research in this thread, in this paper, we provide a review for safe RL from the perspectives of methods, theory and applications.

Autonomous Driving Decision Making +3

On the Convergence of Fictitious Play: A Decomposition Approach

no code implementations3 May 2022 Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms.
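Fictitious play is simple enough to demonstrate directly. The following illustrative run (not from the paper) plays it on Rock-Paper-Scissors: each player best-responds to the opponent's empirical action frequencies, and those frequencies approach the (1/3, 1/3, 1/3) Nash equilibrium mixture.

```python
# Fictitious play on Rock-Paper-Scissors (illustration of the FP framework,
# not the paper's decomposition approach).

# Row player's payoff, actions 0=Rock, 1=Paper, 2=Scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def best_response(opp_counts):
    """Best pure action against the opponent's empirical action counts."""
    total = sum(opp_counts)
    def expected(a):
        return sum(PAYOFF[a][b] * opp_counts[b] / total for b in range(3))
    return max(range(3), key=expected)

counts = [[1, 0, 0], [0, 1, 0]]  # arbitrary initial play: P1 Rock, P2 Paper
for _ in range(5000):
    a = best_response(counts[1])   # P1 responds to P2's history
    b = best_response(counts[0])   # symmetric game: same payoff view for P2
    counts[0][a] += 1
    counts[1][b] += 1

freqs = [c / sum(counts[0]) for c in counts[0]]  # P1's empirical mixture
```

In two-player zero-sum games such as this one, the empirical distributions are guaranteed to converge to a Nash equilibrium (Robinson's theorem), which is the convergence property the paper studies through decomposition.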

Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

no code implementations10 Mar 2022 Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.

Data Augmentation Reinforcement Learning (RL) +1
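The permutation-invariance (PI) inductive bias mentioned above can be sketched with the simplest possible encoder: sum-pooling over per-agent feature vectors, so any reordering of teammates maps to the same state representation. This is an illustrative toy, not the paper's framework.

```python
# Hedged sketch of permutation invariance: an order-independent pooling over
# per-agent observations collapses all N! orderings to one representation.

def encode_set(agent_feats):
    """Sum-pool per-agent feature vectors: invariant to agent ordering."""
    d = len(agent_feats[0])
    return tuple(sum(f[i] for f in agent_feats) for i in range(d))

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [[5.0, 6.0], [1.0, 2.0], [3.0, 4.0]]  # same agents, permuted order
```

Because the N! equivalent orderings collapse to a single encoding, the effective state space the learner must cover shrinks accordingly, which is the "curse" the paper targets.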

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

no code implementations10 Feb 2022 Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang

We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem, but only a single round of communication is allowed and there is a budget constraint on the total amount of information (in bits) that each machine can send out.

Multi-Armed Bandits Offline RL +2

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

no code implementations10 Feb 2022 Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang

Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games.

Multi-agent Reinforcement Learning reinforcement-learning +1
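The simplest instance of the value-decomposition rule of thumb is the VDN-style additive form (used here only as background, not as this paper's contribution): the joint value is the sum of per-agent utilities, so the greedy joint action decomposes into independent per-agent argmaxes.

```python
# VDN-style additive value decomposition (background sketch, made-up values).

def greedy_joint_action(per_agent_q):
    """per_agent_q[i][a] = agent i's utility for its own action a."""
    return [max(range(len(q)), key=lambda a: q[a]) for q in per_agent_q]

def q_total(per_agent_q, joint_action):
    """Joint value = sum of per-agent utilities (the additive decomposition)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

qs = [[0.2, 0.7], [0.9, 0.1]]     # two agents, two actions each
joint = greedy_joint_action(qs)
```

The appeal is that maximizing each summand independently maximizes the sum, so centralized training can still use decentralized greedy execution; the paper analyzes when and why such decompositions work.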

Efficient Policy Space Response Oracles

no code implementations28 Jan 2022 Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu, Jun Wang

Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch.

Efficient Exploration
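The PSRO loop the abstract refers to can be sketched compactly: alternately solve the restricted meta-game over the current population, then add a best response to the resulting meta-strategy. In this toy the underlying game is Rock-Paper-Scissors and, as a simplification, the meta-solver is fictitious play rather than an exact Nash solver; it is illustrative only, not the paper's efficient variant.

```python
# Toy PSRO / double-oracle loop on Rock-Paper-Scissors (illustrative sketch).

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoffs

def best_response(mix):
    """Best pure action over the FULL action space against a mixed strategy."""
    def value(a):
        return sum(PAYOFF[a][b] * p for b, p in mix.items())
    return max(range(3), key=value)

def solve_meta_game(population, iters=2000):
    """Approximate equilibrium of the restricted symmetric game via fictitious play."""
    counts = {a: 1 for a in population}
    for _ in range(iters):
        def value(a):
            return sum(PAYOFF[a][b] * c for b, c in counts.items())
        counts[max(population, key=value)] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

population = [0]                      # start with Rock only
for _ in range(3):                    # PSRO iterations
    meta = solve_meta_game(population)
    br = best_response(meta)
    if br not in population:
        population.append(br)         # oracle step: grow the population
```

Each iteration here runs the two costly steps the abstract identifies: meta-game evaluation/solving and best-response computation, which is exactly where the paper's efficiency improvements apply.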

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

1 code implementation31 Dec 2021 Bo Liu, Xidong Feng, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations.

Atari Games Meta Reinforcement Learning +3

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

1 code implementation6 Dec 2021 Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu

In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.

Offline RL reinforcement-learning +4

Neural Auto-Curricula in Two-Player Zero-Sum Games

no code implementations NeurIPS 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning Vocal Bursts Valence Prediction

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

1 code implementation NeurIPS 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

no code implementations28 Oct 2021 Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang

In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP).

Traveling Salesman Problem

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

no code implementations27 Oct 2021 David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy.

OpenAI Gym reinforcement-learning +3

Measuring the Non-Transitivity in Chess

no code implementations22 Oct 2021 Ricky Sanjaya, Jun Wang, Yaodong Yang

In this paper, we quantify the non-transitivity in Chess through real-world data from human players.
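One elementary way to quantify non-transitivity (an illustrative assumption here, not necessarily the paper's measure) is to count 3-cycles "i beats j beats k beats i" in a round-robin result matrix; a fully transitive ranking contains none.

```python
# Counting 3-cycles in a tournament matrix as a toy non-transitivity measure.
import itertools

# wins[i][j] = 1 if player i beat player j (made-up 4-player tournament with
# one Rock-Paper-Scissors-style cycle among players 0, 1, 2).
wins = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

def count_3_cycles(wins):
    n = len(wins)
    # Each directed 3-cycle appears under its 3 rotations, hence the // 3.
    return sum(1 for i, j, k in itertools.permutations(range(n), 3)
               if wins[i][j] and wins[j][k] and wins[k][i]) // 3

transitive = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # strict ranking, no cycles
```

A chess player pool with many such cycles has no consistent strength ordering, which is the phenomenon the paper measures from real game data.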

Online Markov Decision Processes with Non-oblivious Strategic Adversary

no code implementations7 Oct 2021 Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+\tau^2\sqrt{T \log(|A|)})$ where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space.

Multi-Agent Constrained Policy Optimisation

3 code implementations6 Oct 2021 Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang

To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.

Multi-agent Reinforcement Learning reinforcement-learning +1

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

no code implementations20 Sep 2021 Yixin Wu, Rui Luo, Chen Zhang, Jun Wang, Yaodong Yang

In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics during training deep neural networks by gradient-based optimizers.
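The gradient noise being characterized can be measured directly on a toy problem (an illustrative sketch, not the paper's analysis): compare per-minibatch gradients of a quadratic loss against the full-batch gradient; the spread of the differences is the stochastic-gradient noise.

```python
# Measuring stochastic-gradient noise on a toy quadratic loss.
import random

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
w = 0.3  # current parameter; loss_i(w) = 0.5 * (w - x_i)^2, so grad_i = w - x_i

full_grad = sum(w - x for x in data) / len(data)

batch = 10
noises = []
for s in range(0, len(data), batch):
    mb = data[s:s + batch]
    g = sum(w - x for x in mb) / len(mb)   # minibatch gradient estimate
    noises.append(g - full_grad)           # its noise component

mean_noise = sum(noises) / len(noises)     # zero in expectation (and here exactly,
                                           # since the batches partition the data)
```

The distributional shape of these per-batch deviations (Gaussian vs. heavy-tailed) is precisely what determines the noise-induced training dynamics the paper studies.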

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

no code implementations4 Sep 2021 Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang

Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.

Multi-agent Reinforcement Learning reinforcement-learning +1

Settling the Variance of Multi-Agent Policy Gradients

1 code implementation NeurIPS 2021 Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.

Reinforcement Learning (RL) Starcraft

Is Nash Equilibrium Approximator Learnable?

no code implementations17 Aug 2021 Zhijian Duan, Wenhan Huang, Dinghuai Zhang, Yali Du, Jun Wang, Yaodong Yang, Xiaotie Deng

In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution.

BIG-bench Machine Learning Meta-Learning +1

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

1 code implementation12 Jun 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration.

Atari Games Multi-agent Reinforcement Learning +2

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

no code implementations9 Jun 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

1 code implementation5 Jun 2021 Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms.

Atari Games Distributed Computing +3

Neural Auto-Curricula

no code implementations4 Jun 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations1 Jun 2021 Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize coordination transfer across a wider variety of scenarios.

Management Multi-agent Reinforcement Learning +3

Learning to Safely Exploit a Non-Stationary Opponent

no code implementations NeurIPS 2021 Zheng Tian, Hang Ren, Yaodong Yang, Yuchen Sun, Ziqi Han, Ian Davies, Jun Wang

On the other hand, overfitting to an opponent (i.e., exploiting only one specific type of opponent) makes the learning player easily exploitable by others.

Modelling Behavioural Diversity for Learning in Open-Ended Games

3 code implementations14 Mar 2021 Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors).

Point Processes

Online Double Oracle

1 code implementation13 Mar 2021 Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence.

Multi-Agent Trust Region Learning

1 code implementation1 Jan 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

Atari Games Multi-agent Reinforcement Learning +3

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

no code implementations1 Jan 2021 Yizheng Hu, Kun Shao, Dong Li, Jianye Hao, Wulong Liu, Yaodong Yang, Jun Wang, Zhanxing Zhu

Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn a correlated equilibrium while maximally preserving the convenience of decentralized execution.

Adversarial Robustness reinforcement-learning +2

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

no code implementations NeurIPS 2020 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

1 code implementation1 Nov 2020 Yaodong Yang, Jun Wang

In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

Multi-agent Reinforcement Learning reinforcement-learning +1

What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

no code implementations NeurIPS 2021 Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.

Continuous Control Contrastive Learning +3

Learning to Infer User Hidden States for Online Sequential Advertising

no code implementations3 Sep 2020 Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai

To drive purchases in online advertising, it is of great interest to the advertiser to optimize the sequential advertising strategy, whose performance and interpretability are both important.

Multi-Agent Determinantal Q-Learning

1 code implementation ICML 2020 Yaodong Yang, Ying Wen, Li-Heng Chen, Jun Wang, Kun Shao, David Mguni, Wei-Nan Zhang

Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution.
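For background on the determinantal part of the title (a general fact about determinantal point processes, not this paper's algorithm): under a DPP, a subset's unnormalized probability is the determinant of the corresponding kernel submatrix, so similar items are jointly down-weighted and diverse subsets are favored.

```python
# DPP subset scoring via kernel-submatrix determinants (background sketch,
# 2x2 case only; features and values are made up for illustration).

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def dpp_score(features, subset):
    """Unnormalized DPP probability: det of the kernel restricted to `subset`."""
    sub = [[sum(x * y for x, y in zip(features[i], features[j]))
            for j in subset] for i in subset]
    return det2(sub)

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # items 0 and 1 are near-parallel
similar_pair = dpp_score(feats, [0, 1])       # small: redundant pair
diverse_pair = dpp_score(feats, [0, 2])       # large: orthogonal pair
```

In the multi-agent setting, scoring joint behaviours this way encourages agents toward diverse rather than redundant action choices.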


$\alpha^\alpha$-Rank: Practically Scaling $\alpha$-Rank through Stochastic Optimisation

no code implementations25 Sep 2019 Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar

Furthermore, we also show successful results on large joint strategy profiles with a maximum size on the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $\alpha$-Rank with a reasonable computational budget.

Stochastic Optimization

Bi-level Actor-Critic for Multi-agent Coordination

1 code implementation8 Sep 2019 Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Wei-Nan Zhang, Jun Wang

Coordination is one of the essential problems in multi-agent systems.

Multiagent Systems

Spectral-based Graph Convolutional Network for Directed Graphs

no code implementations21 Jul 2019 Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, Guangyong Chen

Our approach can work directly on directed graph data in semi-supervised node classification tasks.

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

no code implementations29 May 2019 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

General Classification Image Classification

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

no code implementations ICLR 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge.

Multi-agent Reinforcement Learning reinforcement-learning +1

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

no code implementations26 Jan 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang

Though limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met due to individuals' cognitive limitations and/or the intractability of the decision problem.

Decision Making Multi-agent Reinforcement Learning

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting

no code implementations14 Dec 2018 Yaodong Yang, Alisa Kolesnikova, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson

The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks.

BIG-bench Machine Learning Feature Engineering +1

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

no code implementations8 Nov 2018 Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu

As an indicator of the level of risk or the degree of variation, volatility is important for analysing the financial market, and it is taken into account in various decision-making processes in financial activities.

Benchmarking Decision Making +2

Factorized Q-Learning for Large-Scale Multi-Agent Systems

no code implementations11 Sep 2018 Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Wei-Nan Zhang, Dell Zhang, Jun Wang, Han Liu

Deep Q-learning has achieved significant success in single-agent decision-making tasks.

Multiagent Systems

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

1 code implementation NeurIPS 2018 Rui Luo, Jianhong Wang, Yaodong Yang, Zhanxing Zhu, Jun Wang

We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions.
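As background for the method above, here is a minimal leapfrog-integrator sketch for plain Hamiltonian Monte Carlo, the base algorithm that thermostats and continuous tempering augment (the augmentations themselves are not reproduced here).

```python
# Plain HMC leapfrog integrator (background sketch, standard algorithm).

def leapfrog(q, p, grad_u, step, n_steps):
    """Simulate Hamiltonian dynamics for potential U; returns the proposal."""
    p = p - 0.5 * step * grad_u(q)      # initial half-step on momentum
    for _ in range(n_steps - 1):
        q = q + step * p                # full position step
        p = p - step * grad_u(q)        # full momentum step
    q = q + step * p
    p = p - 0.5 * step * grad_u(q)      # final half-step on momentum
    return q, p

# Standard Gaussian target: U(q) = q^2 / 2, so grad U(q) = q. The total
# energy H = U(q) + p^2 / 2 should be nearly conserved by the integrator.
grad_u = lambda q: q
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, grad_u, step=0.1, n_steps=20)
h0 = 0.5 * q0 ** 2 + 0.5 * p0 ** 2
h1 = 0.5 * q1 ** 2 + 0.5 * p1 ** 2
```

Approximate energy conservation is what gives HMC its high acceptance rates; the thermostat and tempering additions are designed to keep this property usable under mini-batch gradient noise and multimodal posteriors.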

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

no code implementations13 Sep 2017 Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Wei-Nan Zhang, Ying Wen, Yong Yu

We conduct an empirical study on discovering the ordered collective dynamics exhibited by a population of intelligent agents, driven by million-agent reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models

no code implementations16 Jun 2017 Yaodong Yang, Rui Luo, Yuanyuan Liu

Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data.

Variational Inference
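For context on the model family in the title (a general fact about the distribution, not the paper's inference method): a Tweedie compound Poisson variable is a Poisson-many sum of independent Gamma jumps, which yields a point mass at zero plus a continuous positive part.

```python
# Sampling a Tweedie compound Poisson variable: N ~ Poisson(lam) Gamma jumps.
import random

def sample_compound_poisson(lam, shape, scale, rng):
    # Draw N ~ Poisson(lam) by counting unit-rate exponential arrivals in [0, lam).
    n = 0
    t = rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    # Sum of N independent Gamma(shape, scale) jumps; zero when N == 0.
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(0)
draws = [sample_compound_poisson(0.5, 2.0, 1.0, rng) for _ in range(2000)]
# Fraction of exact zeros should be near P(N = 0) = exp(-0.5) ~ 0.61.
zero_mass = sum(1 for x in draws if x == 0.0) / len(draws)
```

This zero-inflated, positively skewed shape is why Tweedie models suit insurance-claim-style data, and the mixed-model extension in the paper adds random effects on top of it.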
