Search Results for author: Mingfei Sun

Found 21 papers, 9 papers with code

Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation

1 code implementation23 Jun 2023 Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard E. Turner

Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment.

Few-Shot Image Classification Few-Shot Imitation Learning +3

Trust-Region-Free Policy Optimization for Stochastic Policies

no code implementations15 Feb 2023 Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson

In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.

Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization

1 code implementation5 Feb 2023 Zichuan Lin, Xiapeng Wu, Mingfei Sun, Deheng Ye, Qiang Fu, Wei Yang, Wei Liu

Recent success in Deep Reinforcement Learning (DRL) methods has shown that policy optimization with respect to an off-policy distribution via importance sampling is effective for sample reuse.

Imitating Human Behaviour with Diffusion Models

1 code implementation25 Jan 2023 Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin

This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.

Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning

no code implementations20 Jan 2023 Haoxuan Pan, Deheng Ye, Xiaoming Duan, Qiang Fu, Wei Yang, Jianping He, Mingfei Sun

We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization.

Continuous Control reinforcement-learning +1

UniMASK: Unified Inference in Sequential Decision Problems

1 code implementation20 Nov 2022 Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.

Decision Making

Trust Region Bounds for Decentralized PPO Under Non-stationarity

no code implementations31 Jan 2022 Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson

We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary.

Multi-agent Reinforcement Learning

Generalization in Cooperative Multi-Agent Systems

no code implementations31 Jan 2022 Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson

Specifically, we study generalization bounds under a linear dependence of the underlying dynamics on the agent capabilities, which can be seen as a generalization of Successor Features to MAS.

Generalization Bounds Multi-agent Reinforcement Learning

You May Not Need Ratio Clipping in PPO

no code implementations31 Jan 2022 Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson

Furthermore, we show that ESPO can be easily scaled up to distributed training with many workers, delivering strong performance as well.

Continuous Control

Birds Eye View Social Distancing Analysis System

no code implementations14 Dec 2021 Zhengye Yang, Mingfei Sun, Hongzhe Ye, Zihao Xiong, Gil Zussman, Zoran Kostic

We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections.

object-detection Object Detection +2

Supervised Learning Achieves Human-Level Performance in MOBA Games: A Case Study of Honor of Kings

no code implementations25 Nov 2020 Deheng Ye, Guibin Chen, Peilin Zhao, Fuhao Qiu, Bo Yuan, Wen Zhang, Sheng Chen, Mingfei Sun, Xiaoqian Li, Siqin Li, Jing Liang, Zhenjie Lian, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang

Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner.

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

6 code implementations18 Nov 2020 Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function.

reinforcement-learning Reinforcement Learning (RL) +2

Adversarial Imitation Learning from Incomplete Demonstrations

1 code implementation29 May 2019 Mingfei Sun, Xiaojuan Ma

In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i. e., incomplete demonstrations.

Imitation Learning

Cannot find the paper you are looking for? You can Submit a new open access paper.