SIMILE: Introducing Sequential Information towards More Effective Imitation Learning

ICLR 2019  ·  Yutong Bai, Lingxi Xie

Reinforcement learning (RL) teaches an agent to interact with an environment and maximize the reward in a complex task. RL algorithms often encounter difficulty in defining a reward function over a sparse solution space. Imitation learning (IL) deals with this issue by providing a few expert demonstrations and then either mimicking the expert's behavior (behavioral cloning, BC) or recovering the reward function under the assumption that the expert is optimal (inverse reinforcement learning, IRL). Conventional IL approaches formulate the agent policy as a mapping from a single state to a distribution over actions, which ignores sequential information. This strategy can be inaccurate in IL, a weakly supervised learning setting, particularly when the number of expert demonstrations is limited. This paper presents an effective approach named Sequential IMItation LEarning (SIMILE). The core idea is to introduce sequential information so that an agent can refer to both the current state and past state-action pairs when making a decision. We formulate our approach as a recurrent model and instantiate it with an LSTM so as to fuse long-term and short-term information. SIMILE is a generalized IL framework that is easily applied to BC and IRL, the two major types of IL algorithms. Experiments are performed on several robot control tasks in OpenAI Gym. SIMILE not only achieves performance gains over the baseline approaches, but also enjoys faster convergence and more stable testing performance. These advantages verify the higher learning efficiency of SIMILE and imply its potential in real-world scenarios, e.g., when agent-environment interaction is more difficult and/or expensive.
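
As a rough illustration of the idea described above (a policy that conditions on past state-action pairs through an LSTM rather than on the current state alone), the sketch below shows one plausible way such a sequential policy could be wired up in PyTorch. All class and parameter names here are hypothetical and the architecture is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SequentialPolicy(nn.Module):
    """Illustrative sketch: an LSTM consumes the history of
    (state, previous action) pairs and outputs action logits,
    so the decision depends on past state-action pairs as well
    as the current state. Sizes and names are assumptions."""

    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        # each timestep sees the current state concatenated with the previous action
        self.lstm = nn.LSTM(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, states, prev_actions, hidden=None):
        # states: (batch, T, state_dim); prev_actions: (batch, T, action_dim)
        x = torch.cat([states, prev_actions], dim=-1)
        out, hidden = self.lstm(x, hidden)      # fuses long- and short-term context
        logits = self.head(out[:, -1])          # act from the latest hidden state
        return logits, hidden
```

Such a recurrent policy could, in principle, be plugged into either a BC objective (supervised loss against expert actions) or an IRL pipeline (acting against a learned reward), which is the sense in which the abstract calls the framework generalized.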
