Sample-efficient policy learning in multi-agent Reinforcement Learning via meta-learning

ICLR 2019 · Jialian Li, Hang Su, Jun Zhu ·

To gain high rewards in muti-agent scenes, it is sometimes necessary to understand other agents and make corresponding optimal decisions. We can solve these tasks by first building models for other agents and then finding the optimal policy with these models. To get an accurate model, many observations are needed and this can be sample-inefficient. What's more, the learned model and policy can overfit to current agents and cannot generalize if the other agents are replaced by new agents. In many practical situations, each agent we face can be considered as a sample from a population with a fixed but unknown distribution. Thus we can treat the task against some specific agents as a task sampled from a task distribution. We apply meta-learning method to build models and learn policies. Therefore when new agents come, we can adapt to them efficiently. Experiments on grid games show that our method can quickly get high rewards.

PDF Abstract