Search Results for author: Weixun Wang

Found 19 papers, 6 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning


MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

1 code implementation • 11 Oct 2022 • Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues.

Multi-agent Reinforcement Learning reinforcement-learning +1


Off-Beat Multi-Agent Reinforcement Learning

no code implementations • 27 May 2022 • Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan

During execution durations, the environment changes are influenced by, but not synchronised with, action execution.

Multi-agent Reinforcement Learning reinforcement-learning +3


A2C is a special case of PPO

1 code implementation • 18 May 2022 • Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years.

reinforcement-learning Reinforcement Learning (RL)


Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

1 code implementation • 16 Mar 2022 • Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li

To the best of our knowledge, this work is the first to study the unexpected crashes in the multi-agent system.

Multi-agent Reinforcement Learning reinforcement-learning +3


Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

no code implementations • 10 Mar 2022 • Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.

Data Augmentation Reinforcement Learning (RL) +1


Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

no code implementations • 9 Feb 2022 • Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.


Learning Explicit Credit Assignment for Multi-agent Joint Q-learning

no code implementations • 29 Sep 2021 • Hangyu Mao, Jianye Hao, Dong Li, Jun Wang, Weixun Wang, Xiaotian Hao, Bin Wang, Kun Shao, Zhen Xiao, Wulong Liu

In contrast, we formulate an explicit credit assignment problem where each agent gives its suggestion about how to weight individual Q-values to explicitly maximize the joint Q-value, besides guaranteeing the Bellman optimality of the joint Q-value.

Q-Learning


Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations • 1 Jun 2021 • Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.

Management Multi-agent Reinforcement Learning +3


Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

no code implementations • NeurIPS 2020 • Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

In this paper, we consider the problem of adaptively utilizing a given shaping reward function.

Reinforcement Learning (RL)


Transfer among Agents: An Efficient Multiagent Transfer Learning Framework

no code implementations • 28 Sep 2020 • Tianpei Yang, Jianye Hao, Weixun Wang, Hongyao Tang, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Yujing Hu, Yingfeng Chen, Changjie Fan

In many cases, each agent's experience is inconsistent with each other which causes the option-value estimation to oscillate and to become inaccurate.

Open-Ended Question Answering Reinforcement Learning (RL) +1


Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising

no code implementations • 9 May 2020 • Xiaotian Hao, Junqi Jin, Jianye Hao, Jin Li, Weixun Wang, Yi Ma, Zhenzhe Zheng, Han Li, Jian Xu, Kun Gai

Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.


Efficient Deep Reinforcement Learning via Adaptive Policy Transfer

1 code implementation • 19 Feb 2020 • Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Chen, Changjie Fan, Weixun Wang, Wulong Liu, Zhaodong Wang, Jiajie Peng

Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks.

reinforcement-learning Reinforcement Learning (RL) +1


KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

no code implementations • 18 Feb 2020 • Peng Zhang, Jianye Hao, Weixun Wang, Hongyao Tang, Yi Ma, Yihai Duan, Yan Zheng

Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge.

Common Sense Reasoning Continuous Control +2


Multi-Agent Game Abstraction via Graph Attention Neural Network

no code implementations • 25 Nov 2019 • Yong Liu, Weixun Wang, Yujing Hu, Jianye Hao, Xingguo Chen, Yang Gao

Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents.

Graph Attention Multi-agent Reinforcement Learning


From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

no code implementations • 6 Sep 2019 • Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao

In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents.


Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

1 code implementation • ICLR 2020 • Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao

ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between them.

Starcraft Starcraft II


Learning Adaptive Display Exposure for Real-Time Advertising

no code implementations • 10 Sep 2018 • Weixun Wang, Junqi Jin, Jianye Hao, Chunjie Chen, Chuan Yu, Wei-Nan Zhang, Jun Wang, Xiaotian Hao, Yixi Wang, Han Li, Jian Xu, Kun Gai

In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit under certain business constraints so that the platform revenue can be increased?


Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach

no code implementations • 1 Mar 2018 • Weixun Wang, Jianye Hao, Yixi Wang, Matthew Taylor

We introduce a Sequential Prisoner's Dilemma (SPD) game to better capture the aforementioned characteristics.

reinforcement-learning Reinforcement Learning (RL)

