no code implementations • ACL (MetaNLP) 2021 • Zhenjie Zhao, Mingfei Sun, Xiaojuan Ma
In this paper, we propose a meta reinforcement learning based method to train text agents through learning-to-explore.
no code implementations • 10 Mar 2024 • Hanfang Lyu, Yuanchen Bai, Xin Liang, Ujaan Das, Chuhan Shi, Leiliang Gong, Yingchi Li, Mingfei Sun, Ming Ge, Xiaojuan Ma
Preference-based learning aims to align robot task objectives with human values.
1 code implementation • 4 Mar 2024 • Maytus Piriyajitakonkij, Mingfei Sun, Mengmi Zhang, Wei Pan
Our "plug-and-play" method incorporates a top-down decoder to a pre-trained navigation model.
1 code implementation • 23 Jun 2023 • Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard E. Turner
Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment.
Few-Shot Image Classification Few-Shot Imitation Learning +3
no code implementations • 15 Feb 2023 • Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson
In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.
1 code implementation • 5 Feb 2023 • Zichuan Lin, Xiapeng Wu, Mingfei Sun, Deheng Ye, Qiang Fu, Wei Yang, Wei Liu
Recent success in Deep Reinforcement Learning (DRL) methods has shown that policy optimization with respect to an off-policy distribution via importance sampling is effective for sample reuse.
1 code implementation • 25 Jan 2023 • Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin
This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.
no code implementations • 20 Jan 2023 • Haoxuan Pan, Deheng Ye, Xiaoming Duan, Qiang Fu, Wei Yang, Jianping He, Mingfei Sun
We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization.
1 code implementation • NeurIPS 2023 • Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson
In this work, we conduct new analysis demonstrating that SMAC lacks the stochasticity and partial observability to require complex *closed-loop* policies.
1 code implementation • 20 Nov 2022 • Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.
no code implementations • 28 Apr 2022 • Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.
no code implementations • 31 Jan 2022 • Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson
We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary.
no code implementations • 31 Jan 2022 • Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson
Specifically, we study generalization bounds under a linear dependence of the underlying dynamics on the agent capabilities, which can be seen as a generalization of Successor Features to MAS.
no code implementations • 31 Jan 2022 • Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson
Furthermore, we show that ESPO can be easily scaled up to distributed training with many workers, delivering strong performance as well.
no code implementations • 14 Dec 2021 • Zhengye Yang, Mingfei Sun, Hongzhe Ye, Zihao Xiong, Gil Zussman, Zoran Kostic
We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections.
1 code implementation • 11 Dec 2021 • Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications.
no code implementations • 6 Jun 2021 • Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson
We present SoftDICE, which achieves state-of-the-art performance for imitation learning.
no code implementations • 25 Nov 2020 • Deheng Ye, Guibin Chen, Peilin Zhao, Fuhao Qiu, Bo Yuan, Wen Zhang, Sheng Chen, Mingfei Sun, Xiaoqian Li, Siqin Li, Jing Liang, Zhenjie Lian, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang
Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner.
6 code implementations • 18 Nov 2020 • Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function.
no code implementations • 20 Dec 2019 • Deheng Ye, Zhao Liu, Mingfei Sun, Bei Shi, Peilin Zhao, Hao Wu, Hongsheng Yu, Shaojie Yang, Xipeng Wu, Qingwei Guo, Qiaobo Chen, Yinyuting Yin, Hao Zhang, Tengfei Shi, Liang Wang, Qiang Fu, Wei Yang, Lanxiao Huang
We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games.
1 code implementation • 29 May 2019 • Mingfei Sun, Xiaojuan Ma
In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i. e., incomplete demonstrations.