The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance.
We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintained throughout the full learning chain, from the ground-truth distribution through the empirical and learned distributions to the decoding-induced distribution, via a newly proposed mode recovery cost.
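As a concrete, heavily simplified illustration of tracking modes along such a chain, the sketch below checks whether the highest-probability elements of a toy ground-truth distribution survive in an empirical distribution built from a finite sample; the toy distributions and the set-difference cost are expository stand-ins, not the paper's definition of the mode recovery cost.

```python
# Illustrative sketch: do the most probable elements ("modes") of one
# distribution survive in another distribution derived from it?
# The toy distributions and the set-difference cost are assumptions for
# exposition, not the actual mode recovery cost.
from collections import Counter

def top_k_modes(dist, k):
    """Return the k highest-probability elements of a discrete distribution."""
    return {x for x, _ in sorted(dist.items(), key=lambda kv: -kv[1])[:k]}

def mode_loss(src_dist, tgt_dist, k):
    """Fraction of the source's top-k modes missing from the target's top-k."""
    src, tgt = top_k_modes(src_dist, k), top_k_modes(tgt_dist, k)
    return len(src - tgt) / max(len(src), 1)

ground_truth = {"aa": 0.4, "ab": 0.3, "ba": 0.2, "bb": 0.1}
sample = Counter(["aa"] * 5 + ["ab"] * 3 + ["bb"] * 2)   # finite sample
empirical = {x: c / sum(sample.values()) for x, c in sample.items()}

print(mode_loss(ground_truth, empirical, k=2))            # 0.0: both top-2 modes survive
```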
Understanding and creating mathematics using natural mathematical language, the mixture of symbolic and natural language used by humans, is a challenging and important problem for driving progress in machine learning.
As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem.
Typical approaches to directly optimizing the task loss, such as policy gradient and minimum risk training, rely on sampling in the sequence space to obtain candidate update directions, each scored by the loss of a single sequence.
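As a point of reference, here is a minimal policy-gradient (REINFORCE-style) sketch of that idea: whole sequences are sampled from a toy per-position policy, each is scored by a task loss, and the loss-weighted log-probability gradients give the update direction. The toy policy, the mismatch loss, and all names are illustrative assumptions, not the specific training objective discussed above.

```python
# Minimal policy-gradient sketch: sample sequences, score each with a task
# loss, and use loss-weighted log-probability gradients as the update direction.
import numpy as np

rng = np.random.default_rng(0)
vocab, length = 4, 3
logits = np.zeros((length, vocab))            # toy per-position "model" parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_sequence():
    seq, logp_grads = [], np.zeros_like(logits)
    for t in range(length):
        p = softmax(logits[t])
        tok = rng.choice(vocab, p=p)
        seq.append(tok)
        grad = -p
        grad[tok] += 1.0                      # d log p(tok) / d logits[t]
        logp_grads[t] = grad
    return seq, logp_grads

reference = [1, 2, 3]
def task_loss(seq):
    """Toy task loss: per-position mismatch rate against a reference."""
    return sum(s != r for s, r in zip(seq, reference)) / length

# One REINFORCE update direction, averaged over a few sampled sequences.
grad_estimate = np.zeros_like(logits)
for _ in range(8):
    seq, g = sample_sequence()
    grad_estimate += task_loss(seq) * g
grad_estimate /= 8
logits -= 0.5 * grad_estimate                 # descend on the expected task loss
```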
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition.
Generative dialogue models currently suffer from a number of problems that standard maximum likelihood training does not address.
Neural text generation is a key tool in natural language applications, but it is well known that there are major problems at its core.
We investigate this problem by proposing a generalized model of sequence generation that unifies decoding in directed and undirected models.
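One way to picture decoding in such a unified view is as a loop that repeatedly selects a position and then replaces the symbol at that position; left-to-right generation and masked-language-model style refinement are both special cases of this loop. The sketch below uses toy uniform distributions in place of learned coordinate-selection and symbol-replacement models, so it is an illustrative reading rather than the proposed model itself.

```python
# Illustrative sketch of decoding as repeated (position, symbol) updates,
# with toy uniform distributions standing in for learned coordinate-selection
# and symbol-replacement models.
import random

random.seed(0)
vocab = ["a", "b", "c", "<mask>"]
length = 5
sequence = ["<mask>"] * length                # start from an all-masked canvas

def select_position(seq):
    """Toy coordinate selection: pick a random still-masked position."""
    masked = [i for i, tok in enumerate(seq) if tok == "<mask>"]
    return random.choice(masked) if masked else random.randrange(len(seq))

def replace_symbol(seq, pos):
    """Toy symbol replacement: pick a random non-mask token for position pos."""
    return random.choice([tok for tok in vocab if tok != "<mask>"])

for _ in range(length):                       # one refinement pass over the canvas
    pos = select_position(sequence)
    sequence[pos] = replace_symbol(sequence, pos)

print(" ".join(sequence))
```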
Standard sequential generation methods assume a pre-specified generation order, such as text generation methods that generate words from left to right.
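For reference, a fixed left-to-right order reduces generation to the following loop, in which each token is drawn conditioned only on the prefix produced so far; the uniform next-token distribution here is an assumed stand-in for a trained model.

```python
# Minimal left-to-right generation loop: the order is fixed in advance, and
# each token is sampled given only the prefix generated so far.
import random

random.seed(0)
vocab = ["the", "cat", "sat", "down", "<eos>"]

def next_token_distribution(prefix):
    """Toy stand-in for p(token | prefix): uniform over the vocabulary."""
    return {tok: 1.0 / len(vocab) for tok in vocab}

prefix = []
while len(prefix) < 10:
    probs = next_token_distribution(prefix)
    tok = random.choices(list(probs), weights=list(probs.values()))[0]
    if tok == "<eos>":
        break
    prefix.append(tok)

print(" ".join(prefix))
```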
In this paper, we propose a novel multiset loss function by viewing multiset prediction from the perspective of sequential decision making.
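A minimal sketch of one such sequential view is given below, under the assumption that each step's target distribution is spread uniformly over the items still missing from the target multiset and that the per-step loss is a cross-entropy against that target; the dummy predictor and the greedy commitment rule are illustrative simplifications, not the paper's actual construction.

```python
# Illustrative sketch of multiset prediction as a sequence of decisions:
# at each step the target is uniform over the items still missing from the
# target multiset, and the step loss is the cross-entropy between the
# model's predicted distribution and that target.
import math
from collections import Counter

vocab = ["a", "b", "c"]

def predicted_distribution(step, predicted_so_far):
    """Dummy stand-in for a model's per-step class distribution."""
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def multiset_loss(target_multiset, num_steps):
    remaining = Counter(target_multiset)
    total, predicted_so_far = 0.0, []
    for step in range(num_steps):
        pred = predicted_distribution(step, predicted_so_far)
        n = sum(remaining.values())
        target = {tok: remaining[tok] / n for tok in vocab}   # uniform over remaining items
        total += -sum(target[t] * math.log(pred[t]) for t in vocab if target[t] > 0)
        # Greedily commit to the model's top prediction among remaining items.
        choice = max((t for t in vocab if remaining[t] > 0), key=lambda t: pred[t])
        remaining[choice] -= 1
        predicted_so_far.append(choice)
    return total / num_steps

print(multiset_loss(["a", "a", "b"], num_steps=3))
```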