no code implementations • 23 May 2023 • Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
no code implementations • 7 May 2023 • Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu
Many multi-agent scenarios require message sharing among agents to promote coordination, hastening the robustness of multi-agent communication when policies are deployed in a message perturbation environment.
1 code implementation • 3 Mar 2023 • Xu-Hui Liu, Feng Xu, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Ruifeng Chen, Zongzhang Zhang, Yang Yu
In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process.
no code implementations • 19 Feb 2023 • Cong Guan, Feng Chen, Lei Yuan, Zongzhang Zhang, Yang Yu
We also release the built offline benchmarks in this paper as a testbed for communication ability validation to facilitate further future research.
no code implementations • 31 Jan 2023 • Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu
Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes.
Ranked #1 on Multi-step retrosynthesis on USPTO-190
1 code implementation • 13 Oct 2022 • Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, Yang Yu
MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm.
no code implementations • 31 Aug 2022 • Chao Chen, Dawei Wang, Feng Mao, Zongzhang Zhang, Yang Yu
Semi-supervised Anomaly Detection (AD) is a kind of data mining task which aims at learning features from partially-labeled datasets to help detect outliers.
no code implementations • 1 Jun 2022 • Chengxing Jia, Hao Yin, Chenxiao Gao, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu
Model-based offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization, where the learned policy could adapt to different dynamics enumerated at the training stage.
no code implementations • 9 Mar 2022 • Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu
We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.
1 code implementation • NeurIPS 2021 • Chenyang Wu, Guoyu Yang, Zongzhang Zhang, Yang Yu, Dong Li, Wulong Liu, Jianye Hao
A belief is a distribution of states representing state uncertainty.
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu
Experiments on MuJoCo and Hand Manipulation Suite tasks show that the agents deployed with our method achieve similar performance as it has in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
no code implementations • 19 May 2020 • Cong Fei, Bin Wang, Yuzheng Zhuang, Zongzhang Zhang, Jianye Hao, Hongbo Zhang, Xuewu Ji, Wulong Liu
Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning.
no code implementations • 19 Feb 2020 • Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Cheng, Changjie Fan, Weixun Wang, Wulong Liu, Zhaodong Wang, Jiajie Peng
Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks.
no code implementations • 23 Dec 2019 • Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points.
no code implementations • NeurIPS 2018 • Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan
In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly.
no code implementations • 25 Sep 2018 • Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Zhaopeng Meng, Changjie Fan, Li Wang
Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning.
no code implementations • 23 Feb 2018 • Yan Zheng, Jianye Hao, Zongzhang Zhang
Recently, multiagent deep reinforcement learning (DRL) has received increasingly wide attention.