Search Results for author: Chongjie Zhang

Found 45 papers, 16 papers with code

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

no code implementations16 Mar 2022 Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang

Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.

Continuous Control Offline RL +1

Multi-Agent Policy Transfer via Task Relationship Modeling

no code implementations9 Mar 2022 Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu

We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.

Transfer Learning

Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL

1 code implementation ICLR 2022 Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang

In this paper, we revisit the theoretical property of GCSL -- optimizing a lower bound of the goal reaching objective, and extend GCSL as a novel offline goal-conditioned RL algorithm.

Offline RL Self-Supervised Learning

MOORe: Model-based Offline-to-Online Reinforcement Learning

no code implementations25 Jan 2022 Yihuan Mao, Chao Wang, Bin Wang, Chongjie Zhang

With the success of offline reinforcement learning (RL), offline trained RL policies have the potential to be further improved when deployed online.

reinforcement-learning

Self-Organized Polynomial-Time Coordination Graphs

no code implementations7 Dec 2021 Qianlan Yang, Weijun Dong, Zhizhou Ren, Jianhao Wang, Tonghan Wang, Chongjie Zhang

However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values.

Multi-agent Reinforcement Learning

Offline Reinforcement Learning with Value-based Episodic Memory

no code implementations ICLR 2022 Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data.

Offline RL reinforcement-learning

Offline Reinforcement Learning with Reverse Model-based Imagination

no code implementations NeurIPS 2021 Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang

These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset.

Data Augmentation Offline RL +1

Learning Homophilic Incentives in Sequential Social Dilemmas

no code implementations29 Sep 2021 Heng Dong, Tonghan Wang, Jiayuan Liu, Chi Han, Chongjie Zhang

Promoting cooperation among self-interested agents is a long-standing and interdisciplinary problem, but receives less attention in multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning

Safe Opponent-Exploitation Subgame Refinement

no code implementations29 Sep 2021 Mingyang Liu, Chengjie WU, Qihan Liu, Yansen Jing, Jun Yang, Pingzhong Tang, Chongjie Zhang

Search algorithms have been playing a vital role in the success of superhuman AI in both perfect information and imperfect information games.

Learning the Representation of Behavior Styles with Imitation Learning

no code implementations29 Sep 2021 Xiao Liu, Meng Wang, Zhaorong Wang, Yingfeng Chen, Yujing Hu, Changjie Fan, Chongjie Zhang

Imitation learning reproduces expert demonstrations adaptively by learning a mapping between observations and actions.

Imitation Learning

On the Estimation Bias in Double Q-Learning

1 code implementation NeurIPS 2021 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.

Q-Learning Value prediction
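The overestimation mechanism described in this excerpt can be illustrated with a minimal tabular Double Q-learning update. This is a generic sketch of the classical method the paper analyzes, not the paper's proposed estimator; all names and the toy transition are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
alpha, gamma = 0.1, 0.99

# Two independent tables; each update selects the greedy action with one
# table and evaluates it with the other, reducing the overestimation that
# max_a Q(s', a) introduces in standard Q-learning.
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next):
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q1[s_next]))      # select with Q1
        target = r + gamma * Q2[s_next, a_star]  # evaluate with Q2
        Q1[s, a] += alpha * (target - Q1[s, a])
    else:
        a_star = int(np.argmax(Q2[s_next]))      # select with Q2
        target = r + gamma * Q1[s_next, a_star]  # evaluate with Q1
        Q2[s, a] += alpha * (target - Q2[s, a])

# Toy transition: state 0, action 1, reward 1.0, next state 2.
double_q_update(0, 1, 1.0, 2)
print(Q1[0, 1] + Q2[0, 1])  # → 0.1 (one table moved toward the target)
```

Because selection and evaluation use different tables, a single noisy overestimate in one table no longer inflates its own target.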

Context-Aware Sparse Deep Coordination Graphs

no code implementations ICLR 2022 Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang

We carry out a case study and experiments on the MACO and StarCraft II micromanagement benchmark to demonstrate the dynamics of sparse graph learning, the influence of graph sparseness, and the learning performance of our method.

graph construction Graph Learning +2

Active Hierarchical Exploration with Stable Subgoal Representation Learning

1 code implementation ICLR 2022 Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang

Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards.

Continuous Control Hierarchical Reinforcement Learning +1

Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

no code implementations23 Apr 2021 Heng Dong, Tonghan Wang, Jiayuan Liu, Chi Han, Chongjie Zhang

We propose a novel learning framework to encourage homophilic incentives and show that it achieves stable cooperation in both SSDs of public goods and tragedy of the commons.

Multi-agent Reinforcement Learning

Generalizable Episodic Memory for Deep Reinforcement Learning

1 code implementation11 Mar 2021 Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang

Episodic memory-based methods can rapidly latch onto past successful strategies by a non-parametric memory and improve sample efficiency of traditional reinforcement learning.

Atari Games Continuous Control +1
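The non-parametric memory idea this excerpt refers to can be sketched as a table that stores, per state-action key, the best discounted return observed so far. This is a minimal illustration of episodic-memory control in general, not the paper's generalizable memory; the keys and values are hypothetical.

```python
# Episodic memory keeps, per (state, action) key, the highest discounted
# return ever observed, so the agent can rapidly latch onto past successes.
memory = {}

def write(key, discounted_return):
    # Non-parametric max-update: only improvements overwrite the entry.
    if key not in memory or discounted_return > memory[key]:
        memory[key] = discounted_return

def read(key, default=0.0):
    return memory.get(key, default)

write(("s0", "a1"), 3.5)
write(("s0", "a1"), 2.0)   # a worse return is ignored
print(read(("s0", "a1")))  # → 3.5
```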

Learning Subgoal Representations with Slow Dynamics

no code implementations ICLR 2021 Siyuan Li, Lulu Zheng, Jianhao Wang, Chongjie Zhang

In goal-conditioned Hierarchical Reinforcement Learning (HRL), a high-level policy periodically sets subgoals for a low-level policy, and the low-level policy is trained to reach those subgoals.

Continuous Control Hierarchical Reinforcement Learning +1
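The subgoal mechanism described in this excerpt, where a high-level policy periodically emits subgoals and a low-level policy is rewarded for reaching them, can be sketched as follows. The distance-based reward shape is a common GCHRL convention assumed here for illustration, not this paper's specific design, and it presumes states and subgoals share a representation space.

```python
import numpy as np

def intrinsic_reward(state_repr, subgoal, scale=1.0):
    # The low-level policy is rewarded for moving its representation
    # toward the subgoal set by the high-level policy.
    return -scale * float(np.linalg.norm(state_repr - subgoal))

# Every k steps the high-level policy picks a new subgoal; in between,
# the low-level policy acts against the intrinsic reward alone.
k = 10
state = np.array([0.0, 0.0])
subgoal = np.array([1.0, 1.0])
print(intrinsic_reward(state, subgoal))  # negative distance to the subgoal
```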

DOP: Off-Policy Multi-Agent Decomposed Policy Gradients

no code implementations ICLR 2021 Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).

Multi-agent Reinforcement Learning Starcraft +1

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

1 code implementation NeurIPS 2020 Guangxiang Zhu, Minghao Zhang, Honglak Lee, Chongjie Zhang

It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.

Model-based Reinforcement Learning reinforcement-learning

RODE: Learning Roles to Decompose Multi-Agent Tasks

2 code implementations ICLR 2021 Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang

Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.

Starcraft +1

Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning

no code implementations28 Sep 2020 Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang

Value decomposition is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings.

Multi-agent Reinforcement Learning Q-Learning +2
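The linear value decomposition studied in this line of work factorizes the joint action value as a sum of per-agent utilities. Below is a VDN-style sketch of that family (the paper analyzes such factorizations rather than proposing this code); the toy utilities are made up.

```python
import numpy as np

def q_tot(per_agent_q, joint_action):
    # Linear factorization: Q_tot(s, a) = sum_i Q_i(s, a_i).
    return sum(float(q[a]) for q, a in zip(per_agent_q, joint_action))

# Two agents, three actions each. Under a linear factorization the greedy
# joint action decomposes into independent per-agent argmaxes, which is
# what keeps centralized greedy action selection tractable.
q1 = np.array([0.1, 0.5, 0.2])
q2 = np.array([0.3, 0.0, 0.4])
greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
print(greedy, q_tot([q1, q2], greedy))  # → (1, 2) and roughly 0.9
```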

QPLEX: Duplex Dueling Multi-Agent Q-Learning

3 code implementations ICLR 2021 Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang

This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function.

Decision Making Multi-agent Reinforcement Learning +3

Off-Policy Multi-Agent Decomposed Policy Gradients

1 code implementation24 Jul 2020 Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).

Multi-agent Reinforcement Learning Starcraft +1

SOAC: The Soft Option Actor-Critic Architecture

no code implementations25 Jun 2020 Chenghao Li, Xiaoteng Ma, Chongjie Zhang, Jun Yang, Li Xia, Qianchuan Zhao

In these tasks, our approach learns a diverse set of options, each with a strongly coherent state-action space.

Transfer Learning

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

no code implementations NeurIPS 2021 Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang

Value factorization is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings, which balances the learning scalability and the representational capacity of value functions.

Multi-agent Reinforcement Learning Q-Learning +2

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

1 code implementation ICML 2020 Tonghan Wang, Heng Dong, Victor Lesser, Chongjie Zhang

In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA).

Multiagent Systems

Influence-Based Multi-Agent Exploration

1 code implementation ICLR 2020 Tonghan Wang, Jianhao Wang, Yi Wu, Chongjie Zhang

We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents.

reinforcement-learning

Learning Nearly Decomposable Value Functions Via Communication Minimization

1 code implementation ICLR 2020 Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang

Recently, value function factorization learning emerges as a promising way to address these challenges in collaborative multi-agent systems.

Starcraft

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

1 code implementation NeurIPS 2019 Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang

In addition, we also theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy.

Hierarchical Reinforcement Learning reinforcement-learning

Object-Oriented Model Learning through Multi-Level Abstraction

no code implementations ICLR 2019 Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Chongjie Zhang

Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability.

Relational Reasoning Self-Supervised Learning

Object-Oriented Dynamics Learning through Multi-Level Abstraction

1 code implementation16 Apr 2019 Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Zichuan Lin, Chongjie Zhang

We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability.

Relational Reasoning Self-Supervised Learning

Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games

no code implementations7 Mar 2019 Xinliang Song, Tonghan Wang, Chongjie Zhang

Learning in a multi-agent system is challenging because agents are simultaneously learning and the environment is not stationary, undermining convergence guarantees.

Towards Efficient Detection and Optimal Response against Sophisticated Opponents

no code implementations12 Sep 2018 Tianpei Yang, Zhaopeng Meng, Jianye Hao, Chongjie Zhang, Yan Zheng, Ze Zheng

This paper proposes a novel approach called Bayes-ToMoP which can efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies.

Multiagent Systems

Context-Aware Policy Reuse

no code implementations11 Jun 2018 Siyuan Li, Fangda Gu, Guangxiang Zhu, Chongjie Zhang

Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks.

reinforcement-learning Transfer Learning

Object-Oriented Dynamics Predictor

1 code implementation NeurIPS 2018 Guangxiang Zhu, Zhiao Huang, Chongjie Zhang

Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning.

Model-based Reinforcement Learning

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

no code implementations24 Sep 2017 Siyuan Li, Chongjie Zhang

In this paper, we develop an optimal online method to select source policies for reinforcement learning.

Q-Learning reinforcement-learning +2

Fairness in Multi-Agent Sequential Decision-Making

no code implementations NeurIPS 2014 Chongjie Zhang, Julie A. Shah

We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy.

Decision Making Fairness
