TNT-NLG, System 1: Using a statistical NLG to massively augment crowd-sourced data for neural generation

Ever since the successful application of sequence-to-sequence learning to neural machine translation (Sutskever et al., 2014), interest has surged in its applicability to language generation in other problem domains. In the area of natural language generation (NLG), there has been a great deal of interest in end-to-end (E2E) neural models that learn and generate natural language sentence realizations in one step. In this paper, we present TNT-NLG System 1, our first system submission to the E2E NLG Challenge, where we generate natural language (NL) realizations from meaning representations (MRs) in the restaurant domain by massively expanding the training dataset. We develop two models for this system, based on Dušek et al.'s (2016a) open-source baseline model and context-aware neural language generator. Starting with the MR and NL pairs from the E2E generation challenge dataset, we explode the size of the training set using PERSONAGE (Mairesse and Walker, 2010), a statistical generator able to produce varied realizations from MRs, and use our expanded data as contextual input into our models. We present evaluation results using automated and human evaluation metrics, and describe directions for future work.
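The core idea of the data expansion described above can be sketched in a few lines: pair each MR from the E2E dataset with multiple additional surface realizations, in the spirit of how a statistical generator such as PERSONAGE emits varied outputs for one MR. This is a minimal illustrative sketch, not the paper's implementation; the `parse_mr` helper, the fixed templates, and the example MR are all assumptions introduced here.

```python
# Hypothetical sketch of MR/NL data expansion. PERSONAGE itself is a
# statistical generator; fixed templates stand in for it here purely
# for illustration.

def parse_mr(mr):
    """Parse an E2E-style MR string, e.g. 'name[The Eagle], food[French]',
    into a slot dictionary."""
    slots = {}
    for part in mr.split(", "):
        key, _, rest = part.partition("[")
        slots[key] = rest.rstrip("]")
    return slots

# Illustrative surface-form templates (assumed, not from the paper).
TEMPLATES = [
    "{name} serves {food} food.",
    "{name} is a restaurant offering {food} cuisine.",
    "If you want {food} food, try {name}.",
]

def expand(mr_nl_pairs):
    """Return each (MR, NL) pair plus extra synthetic realizations."""
    expanded = []
    for mr, nl in mr_nl_pairs:
        slots = parse_mr(mr)
        expanded.append((mr, nl))  # keep the crowd-sourced reference
        for tpl in TEMPLATES:
            expanded.append((mr, tpl.format(**slots)))
    return expanded

pairs = [("name[The Eagle], food[French]",
          "The Eagle is a French restaurant.")]
print(len(expand(pairs)))  # 1 original + 3 synthetic realizations -> 4
```

In the paper's setting, the expanded MR/NL pairs then serve as training data for the sequence-to-sequence generators, with the synthetic realizations greatly multiplying the coverage of surface variation per MR.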


Results from the Paper


Ranked #7 on Data-to-Text Generation on E2E NLG Challenge (using extra training data)

Task:                     Data-to-Text Generation
Dataset:                  E2E NLG Challenge
Model:                    Sys1-Primary
Uses Extra Training Data: Yes

Metric     Value     Global Rank
BLEU       65.61     # 7
NIST       8.5105    # 6
METEOR     45.17     # 4
ROUGE-L    68.39     # 7
CIDEr      2.2183    # 5

Methods


No methods listed for this paper.