We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator.
While normalizing flows have led to significant advances in modeling high-dimensional continuous distributions, their applicability to discrete distributions remains largely unexplored.
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.
We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework.
Deep reinforcement learning has led to several recent breakthroughs, though the learned policies are often based on black-box neural networks.
Acquiring language provides a ubiquitous mode of communication across humans and robots.
In this paper, we extend Key-Value Memory Networks to a multimodal setting and introduce a novel key-addressing mechanism for sequence-to-sequence models.