Search Results for author: Yaodong Yang

Found 116 papers, 51 papers with code

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

no code implementations30 Aug 2024 Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang

Reinforcement learning from human feedback (RLHF) aligns LLMs by training a reward model (RM) on human preferences and fine-tuning the LLMs to maximize RM feedback.

Text Summarization

A Survey on Self-play Methods in Reinforcement Learning

no code implementations2 Aug 2024 Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang

Self-play, characterized by agents' interactions with copies or past versions of itself, has recently gained prominence in reinforcement learning.

Multi-agent Reinforcement Learning reinforcement-learning

ProgressGym: Alignment with a Millennium of Moral Progress

no code implementations28 Jun 2024 Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users.

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

1 code implementation20 Jun 2024 Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values.

Safety Alignment Text-to-Video Generation +2

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

no code implementations20 Jun 2024 Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs).

Question Answering Safety Alignment

In-Context Editing: Learning Knowledge from Self-Induced Distributions

1 code implementation17 Jun 2024 Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining.

In-Context Learning knowledge editing +1

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

no code implementations15 Jun 2024 Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen

Federated learning (FL) enables multiple parties to collaboratively fine-tune an large language model (LLM) without the need of direct data sharing.

Federated Learning Language Modelling +2

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

no code implementations12 Jun 2024 Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng

To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments.

Decision Making Multi-agent Reinforcement Learning

Language Models Resist Alignment

1 code implementation10 Jun 2024 Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang

Empirically, we demonstrate the elasticity of post-alignment models, i. e., the tendency to revert to the behavior distribution formed during the pre-training phase upon further fine-tuning.

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

no code implementations29 May 2024 Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.

reinforcement-learning

Efficient Model-agnostic Alignment via Bayesian Persuasion

no code implementations29 May 2024 Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang

This paper explores an efficient method for aligning black-box large models using smaller models, introducing a model-agnostic and lightweight Bayesian Persuasion Alignment framework.

Code Generation Mathematical Reasoning

End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations

1 code implementation19 Mar 2024 Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li

Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies.

Decision Making reinforcement-learning

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

no code implementations CVPR 2024 Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to adapt to new scenarios.

Imitation Learning

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects

no code implementations20 Feb 2024 Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang

The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety.

Reward Generalization in RLHF: A Topological Perspective

no code implementations15 Feb 2024 Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

As a solution, we introduce a theoretical framework for investigating reward generalization in reinforcement learning from human feedback (RLHF), focusing on the topology of information flow at both macro and micro levels.

Generalization Bounds Language Modelling +1

Aligner: Efficient Alignment by Learning to Correct

no code implementations4 Feb 2024 Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang

However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints.

Hallucination

Panacea: Pareto Alignment via Preference Adaptation for LLMs

no code implementations3 Feb 2024 Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang

Panacea trains a single model capable of adapting online and Pareto-optimally to diverse sets of preferences without the need for further tuning.

Language Modelling Large Language Model

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

1 code implementation19 Jan 2024 Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu

Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning.

Decision Making

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation12 Dec 2023 Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

no code implementations10 Nov 2023 ZiHao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents.

AI Alignment: A Comprehensive Survey

no code implementations30 Oct 2023 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen Mcaleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Grasp Multiple Objects with One Hand

1 code implementation24 Oct 2023 Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang

The intricate kinematics of the human hand enable simultaneous grasping and manipulation of multiple objects, essential for tasks such as object transfer and in-hand manipulation.

Object

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation19 Oct 2023 Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.

reinforcement-learning Safe Reinforcement Learning

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

no code implementations19 Oct 2023 Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang

By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications.

reinforcement-learning Safe Reinforcement Learning

Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

no code implementations15 Oct 2023 Simin Li, Ruixiao Xu, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Yaodong Yang, Xianglong Liu

Existing robust MARL methods either approximate or enumerate all possible threat scenarios against worst-case adversaries, leading to computational intensity and reduced robustness.

Multi-agent Reinforcement Learning Off-policy evaluation +3

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

1 code implementation8 Oct 2023 Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers).

Reinforcement Learning (RL)

Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games

no code implementations30 Sep 2023 Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang

Furthermore, we develop a Gamified Red Team Solver (GRTS) with diversity measures to mitigate mode collapse and theoretically guarantee the convergence of approximate Nash equilibrium which results in better strategies for both teams.

Diversity Language Modelling +2

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

no code implementations9 Aug 2023 Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Mcaleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations24 Jul 2023 Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control Model-based Reinforcement Learning +1

SafeDreamer: Safe Reinforcement Learning with World Models

1 code implementation14 Jul 2023 Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, Yaodong Yang

Existing Safe Reinforcement Learning (SafeRL) methods, which rely on cost functions to enforce safety, often fail to achieve zero-cost performance in complex scenarios, especially vision-only tasks.

reinforcement-learning Reinforcement Learning (RL) +1

Large Sequence Models for Sequential Decision-Making: A Survey

no code implementations24 Jun 2023 Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e. g., GPT-3 and Swin Transformer.

Decision Making

Maximum Entropy Heterogeneous-Agent Reinforcement Learning

1 code implementation19 Jun 2023 Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang

We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective for MARL.

Multi-agent Reinforcement Learning reinforcement-learning +1

Deep Reinforcement Learning with Task-Adaptive Retrieval via Hypernetwork

1 code implementation19 Jun 2023 Yonggang Jin, Chenxu Wang, Tianyu Zheng, Liuyu Xiang, Yaodong Yang, Junge Zhang, Jie Fu, Zhaofeng He

Deep reinforcement learning algorithms are usually impeded by sampling inefficiency, heavily depending on multiple interactions with the environment to acquire accurate decision-making capabilities.

Decision Making Hippocampus +2

Heterogeneous Value Alignment Evaluation for Large Language Models

2 code implementations26 May 2023 Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang

We conduct evaluations with new auto-metric \textit{value rationality} to represent the ability of LLMs to align with specific values.

Attribute

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research

1 code implementation16 May 2023 Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang

AI systems empowered by reinforcement learning (RL) algorithms harbor the immense potential to catalyze societal advancement, yet their deployment is often impeded by significant safety concerns.

Philosophy reinforcement-learning +2

Heterogeneous-Agent Reinforcement Learning

1 code implementation19 Apr 2023 Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in AI research.

LEMMA Multi-agent Reinforcement Learning +1

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

1 code implementation15 Apr 2023 Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du

It first decomposes the global return back to each time step, then utilizes the Shapley Value to redistribute the individual payoff from the decomposed global reward.

Multi-agent Reinforcement Learning reinforcement-learning

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

no code implementations ICCV 2023 Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++.

Object

ASP: Learn a Universal Neural Solver!

1 code implementation1 Mar 2023 Chenguang Wang, Zhouliang Yu, Stephen Mcaleer, Tianshu Yu, Yaodong Yang

Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy.

Combinatorial Optimization Traveling Salesman Problem

Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

1 code implementation7 Feb 2023 Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Aishan Liu, Yaodong Yang, Bo An, Wenjun Wu, Xianglong Liu

To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims.

Continuous Control reinforcement-learning +4

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation29 Nov 2022 Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Decision Making Q-Learning +2

Contextual Transformer for Offline Meta Reinforcement Learning

no code implementations15 Nov 2022 Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.

D4RL Meta Reinforcement Learning +4

TorchOpt: An Efficient Library for Differentiable Optimization

1 code implementation13 Nov 2022 Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

TorchOpt further provides a high-performance distributed execution runtime.

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

1 code implementation11 Oct 2022 Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues.

Multi-agent Reinforcement Learning reinforcement-learning +1

GenDexGrasp: Generalizable Dexterous Grasping

1 code implementation3 Oct 2022 Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands.

Diversity

MSRL: Distributed Reinforcement Learning with Dataflow Fragments

no code implementations3 Oct 2022 Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, Lei Chen

Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e. g. policy network updates) on GPU workers.

reinforcement-learning Reinforcement Learning (RL)

End-to-End Affordance Learning for Robotic Manipulation

1 code implementation26 Sep 2022 Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong

Such contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.

Reinforcement Learning (RL)

Constrained Update Projection Approach to Safe Policy Optimization

3 code implementations15 Sep 2022 Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

Compared to previous safe RL methods, CUP enjoys the benefits of 1) CUP generalizes the surrogate functions to generalized advantage estimator (GAE), leading to strong empirical performance.

Reinforcement Learning (RL) Safe Reinforcement Learning

Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

no code implementations24 Aug 2022 Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many complicated nonlinear units, allowing them to provide correct search recommendations.

Fairness Information Retrieval +2

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

no code implementations2 Aug 2022 Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community.

Multi-agent Reinforcement Learning

Scalable Model-based Policy Optimization for Decentralized Networked Systems

no code implementations13 Jul 2022 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks.

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

1 code implementation17 Jun 2022 Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang

In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.

Few-Shot Learning Offline RL +2

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

no code implementations30 May 2022 Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang

Although there are equilibrium concepts in game theory that take into account risk aversion, they either assume that agents are risk-neutral with respect to the uncertainty caused by the actions of other agents, or they are not guaranteed to exist.

Autonomous Driving Multi-agent Reinforcement Learning

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

1 code implementation30 May 2022 Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence.

Decision Making Multi-agent Reinforcement Learning +2

On the Convergence of Fictitious Play: A Decomposition Approach

no code implementations3 May 2022 Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms.

Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

no code implementations10 Mar 2022 Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.

Data Augmentation Reinforcement Learning (RL) +1

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

no code implementations10 Feb 2022 Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang

We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem but only one single round of communication is allowed and there is a budget constraint on the total number of information (in terms of bits) that each machine can send out.

Multi-Armed Bandits Offline RL +2

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

no code implementations10 Feb 2022 Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang

Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games.

Multi-agent Reinforcement Learning reinforcement-learning +1

Efficient Policy Space Response Oracles

no code implementations28 Jan 2022 Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu, Jun Wang

Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch.

Efficient Exploration

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

1 code implementation31 Dec 2021 Xidong Feng, Bo Liu, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations.

Atari Games Meta Reinforcement Learning +3

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

1 code implementation6 Dec 2021 Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu

In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.

Offline RL reinforcement-learning +4

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

1 code implementation NeurIPS 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

Diversity

Neural Auto-Curricula in Two-Player Zero-Sum Games

1 code implementation NeurIPS 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning Vocal Bursts Valence Prediction

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

no code implementations28 Oct 2021 Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang

In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP).

Traveling Salesman Problem

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

no code implementations27 Oct 2021 David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy.

OpenAI Gym reinforcement-learning +3

Measuring the Non-Transitivity in Chess

no code implementations22 Oct 2021 Ricky Sanjaya, Jun Wang, Yaodong Yang

In this paper, we quantify the non-transitivity in Chess through real-world data from human players.

Online Markov Decision Processes with Non-oblivious Strategic Adversary

no code implementations7 Oct 2021 Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+\tau^2\sqrt{ T \log(|A|)})$ where $L$ is the size of adversary's pure strategy set and $|A|$ denotes the size of agent's action space.

Multi-Agent Constrained Policy Optimisation

4 code implementations6 Oct 2021 Shangding Gu, Jakub Grudzien Kuba, Munning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang

To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.

Multi-agent Reinforcement Learning reinforcement-learning +1

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

no code implementations20 Sep 2021 Yixin Wu, Rui Luo, Chen Zhang, Jun Wang, Yaodong Yang

In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics during training deep neural networks by gradient-based optimizers.

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

no code implementations4 Sep 2021 Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang

Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.

Multi-agent Reinforcement Learning reinforcement-learning +1

Settling the Variance of Multi-Agent Policy Gradients

1 code implementation NeurIPS 2021 Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.

Reinforcement Learning (RL) Starcraft

Is Nash Equilibrium Approximator Learnable?

no code implementations17 Aug 2021 Zhijian Duan, Wenhan Huang, Dinghuai Zhang, Yali Du, Jun Wang, Yaodong Yang, Xiaotie Deng

In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution.

BIG-bench Machine Learning Meta-Learning +1

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

1 code implementation12 Jun 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration.

Atari Games Multi-agent Reinforcement Learning +2

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

no code implementations9 Jun 2021 Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.

Diversity

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

1 code implementation5 Jun 2021 Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms.

Atari Games Distributed Computing +4

Neural Auto-Curricula

1 code implementation4 Jun 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations1 Jun 2021 Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.

Management Multi-agent Reinforcement Learning +3

Learning to Safely Exploit a Non-Stationary Opponent

no code implementations NeurIPS 2021 Zheng Tian, Hang Ren, Yaodong Yang, Yuchen Sun, Ziqi Han, Ian Davies, Jun Wang

On the other hand, overfitting to an opponent (i. e., exploiting only one specific type of opponent) makes the learning player easily exploitable by others.

Modelling Behavioural Diversity for Learning in Open-Ended Games

3 code implementations14 Mar 2021 Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e. g., Rock-Paper-Scissors).

Diversity Point Processes

Online Double Oracle

1 code implementation13 Mar 2021 Le Cong Dinh, Yaodong Yang, Stephen Mcaleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence.

Multi-Agent Trust Region Learning

1 code implementation1 Jan 2021 Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

Atari Games Multi-agent Reinforcement Learning +3

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

no code implementations1 Jan 2021 Yizheng Hu, Kun Shao, Dong Li, Jianye Hao, Wulong Liu, Yaodong Yang, Jun Wang, Zhanxing Zhu

Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn correlated equilibrium while maximally preserving the convenience of the decentralized execution.

Adversarial Robustness reinforcement-learning +2

Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

no code implementations NeurIPS 2020 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

1 code implementation1 Nov 2020 Yaodong Yang, Jun Wang

In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier.

Multi-agent Reinforcement Learning reinforcement-learning +1

What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

1 code implementation NeurIPS 2021 Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.

Continuous Control Contrastive Learning +3

Learning to Infer User Hidden States for Online Sequential Advertising

no code implementations3 Sep 2020 Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai

To drive purchase in online advertising, it is of the advertiser's great interest to optimize the sequential advertising strategy whose performance and interpretability are both important.

Multi-Agent Determinantal Q-Learning

1 code implementation ICML 2020 Yaodong Yang, Ying Wen, Li-Heng Chen, Jun Wang, Kun Shao, David Mguni, Wei-Nan Zhang

Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution.

Q-Learning

$α^α$-Rank: Practically Scaling $α$-Rank through Stochastic Optimisation

no code implementations25 Sep 2019 Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou Ammar

Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $\alpha$-Rank with reasonable computational budget.

Stochastic Optimization

Bi-level Actor-Critic for Multi-agent Coordination

1 code implementation8 Sep 2019 Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Wei-Nan Zhang, Jun Wang

Coordination is one of the essential problems in multi-agent systems.

Multiagent Systems

Spectral-based Graph Convolutional Network for Directed Graphs

no code implementations21 Jul 2019 Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, Guangyong Chen

Our approach can work directly on directed graph data in semi-supervised nodes classification tasks.

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

no code implementations29 May 2019 Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise.

General Classification Image Classification

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

no code implementations26 Jan 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang

Though limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met due to individual's cognitive limitation and/or the tractability of the decision problem.

Decision Making Multi-agent Reinforcement Learning

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

no code implementations ICLR 2019 Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge.

Multi-agent Reinforcement Learning reinforcement-learning +1

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting

no code implementations14 Dec 2018 Yaodong Yang, Alisa Kolesnikova, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson

The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks.

BIG-bench Machine Learning Feature Engineering +1

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

no code implementations8 Nov 2018 Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu

As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities.

Benchmarking Decision Making +2

Factorized Q-Learning for Large-Scale Multi-Agent Systems

no code implementations11 Sep 2018 Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Wei-Nan Zhang, Dell Zhang, Jun Wang, Han Liu

Deep Q-learning has achieved a significant success in single-agent decision making tasks.

Multiagent Systems

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

1 code implementation NeurIPS 2018 Rui Luo, Jianhong Wang, Yaodong Yang, Zhanxing Zhu, Jun Wang

We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions.

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

no code implementations13 Sep 2017 Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Wei-Nan Zhang, Ying Wen, Yong Yu

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligence agents, driven by million-agent reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models

no code implementations16 Jun 2017 Yaodong Yang, Rui Luo, Yuanyuan Liu

Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data.

Variational Inference

Cannot find the paper you are looking for? You can Submit a new open access paper.