End-to-end models for goal-oriented dialogue are challenging to train
because linguistic and strategic aspects are entangled in latent state vectors.
We introduce an approach to learning representations of messages in dialogues
by maximizing the likelihood of subsequent messages and actions, which
decouples the semantics of an utterance from its linguistic realization. We
then use these latent sentence representations for hierarchical language
generation, planning, and reinforcement learning.
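In schematic form, with notation introduced here for illustration rather than taken from the model definition, let $x_t$ denote the $t$-th message, $a$ the dialogue's final action, and $z_t$ a latent code inferred from $x_t$; the training objective is then roughly

\[
\max_{\theta,\phi}\; \mathbb{E}_{z_t \sim q_\phi(z \mid x_t)}\big[\log p_\theta(x_{t+1}, a \mid z_t)\big],
\]

so that $z_t$ is rewarded only for predicting what follows the message, not for reconstructing its wording, which is what separates an utterance's semantics from its surface form.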
Experiments show that our approach increases the end-task reward achieved by
the model, improves the effectiveness of long-term planning using rollouts,
and allows self-play reinforcement learning to improve decision making without
diverging from human language. Our hierarchical latent-variable model
outperforms previous work both linguistically and strategically.
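As a minimal sketch of the planning step, assuming rollouts are performed at the level of latent message codes (all names below are hypothetical placeholders, not the released implementation), candidate messages can be compared by the average end-task reward of sampled continuations:

```python
import random

# Hypothetical sketch: plan the next message by rolling out continuations
# for each candidate latent code and keeping the highest-scoring one.
# `simulate_dialogue` stands in for a learned model that samples a
# continuation of the dialogue and scores the resulting agreement.

def simulate_dialogue(latent_code, rng):
    # Placeholder dynamics: in the real system this would sample future
    # messages and actions from the model conditioned on `latent_code`.
    return rng.gauss(float(latent_code), 1.0)

def plan_next_message(candidate_latents, num_rollouts=16, seed=0):
    """Return the latent message code with the best average rollout reward."""
    rng = random.Random(seed)

    def expected_reward(code):
        rewards = [simulate_dialogue(code, rng) for _ in range(num_rollouts)]
        return sum(rewards) / num_rollouts

    return max(candidate_latents, key=expected_reward)

if __name__ == "__main__":
    best_code = plan_next_message(candidate_latents=[0, 1, 2])
    print(best_code)  # latent code the generator would then realize as text
```

Because the rollouts compare a small set of latent codes rather than raw token sequences, long-horizon planning stays tractable; the same latent interface is consistent with applying self-play reinforcement learning at the decision level while leaving the linguistic realization of each code untouched.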