Imitation Learning for Sentence Generation with Dilated Convolutions Using Adversarial Training

In this work, we consider the sentence generation problem as an imitation learning problem, which aims to learn a policy that mimics an expert. Recent works have shown that adversarial learning can be applied to imitation learning problems. However, it has been observed that the reward signal from the discriminator is not robust in reinforcement learning (RL)-based generative adversarial networks (GANs), and that estimating the state-action value is usually computationally intractable. To address these problems, we propose using two discriminators that provide two different reward signals, yielding a more general imitation learning framework for sequence generation. Monte Carlo (MC) rollouts are therefore unnecessary, which keeps our algorithm computationally tractable for generating long sequences. Furthermore, our policy and discriminator networks are integrated by sharing a common state encoder network constructed from dilated convolutions instead of recurrent neural networks (RNNs). Our experiments show that the two reward signals control the trade-off between the quality and the diversity of the generated sequences.
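To make the described architecture concrete, below is a minimal sketch (not the authors' released code) of the components the abstract names: a state encoder built from stacked dilated causal convolutions, shared between a policy network and two discriminators whose outputs serve as reward signals. All class names, layer sizes, dilation rates, and the mixing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedEncoder(nn.Module):
    """Encodes a token prefix into a state vector with dilated 1-D convolutions."""
    def __init__(self, vocab_size, dim=128, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=2, dilation=d) for d in dilations
        )
        self.dilations = dilations

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, dim, seq_len)
        for conv, d in zip(self.convs, self.dilations):
            # Left-pad so each convolution stays causal (no future tokens leak in).
            h = conv(F.pad(x, (d, 0)))
            x = x + torch.relu(h)               # residual connection
        return x[:, :, -1]                      # state = last position, (batch, dim)

class Generator(nn.Module):
    """Policy network: predicts a distribution over the next token."""
    def __init__(self, encoder, vocab_size, dim=128):
        super().__init__()
        self.encoder = encoder                  # shared with the discriminators
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, prefix):
        return F.log_softmax(self.out(self.encoder(prefix)), dim=-1)

class Discriminator(nn.Module):
    """Scores a (partial) sequence; the score is used as a reward signal."""
    def __init__(self, encoder, dim=128):
        super().__init__()
        self.encoder = encoder
        self.score = nn.Linear(dim, 1)

    def reward(self, prefix):
        return torch.sigmoid(self.score(self.encoder(prefix))).squeeze(-1)

# Hypothetical usage: two discriminators share one encoder with the policy,
# and their rewards are blended to trade off quality against diversity.
vocab_size = 10000
enc = DilatedEncoder(vocab_size)
policy = Generator(enc, vocab_size)
disc_a, disc_b = Discriminator(enc), Discriminator(enc)

prefix = torch.randint(0, vocab_size, (4, 20))
alpha = 0.5  # assumed mixing weight; the paper's actual scheme may differ
reward = alpha * disc_a.reward(prefix) + (1 - alpha) * disc_b.reward(prefix)
```

Because the discriminators score prefixes directly through the shared encoder, a reward is available at every generation step, which is consistent with the abstract's claim that MC rollouts are not needed.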
