Challenges in Data-to-Document Generation

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
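One common way to build the copy-based extension mentioned above is a conditional copy decoder. At each step, a binary switch variable decides whether the next word is copied from the input records or generated from the vocabulary. The following is a hedged sketch of that general formulation, not a transcription of the paper's equations. The symbols $s$ (the set of input records), $\hat{y}_t$ (the output word at step $t$) and $z_t$ (the copy switch, with $z_t = 1$ meaning "copy") are our own notation.

```latex
p(\hat{y}_t \mid \hat{y}_{1:t-1}, s)
  = p(z_t = 1 \mid \hat{y}_{1:t-1}, s)\,
    p_{\text{copy}}(\hat{y}_t \mid z_t = 1, \hat{y}_{1:t-1}, s)
  + p(z_t = 0 \mid \hat{y}_{1:t-1}, s)\,
    p_{\text{gen}}(\hat{y}_t \mid z_t = 0, \hat{y}_{1:t-1}, s)
```

Summing over both values of $z_t$ gives a proper distribution over output words. Copy decisions can sometimes be read off the training data, for example by string-matching output tokens against record values. In that case, $z_t$ can instead be treated as observed during training.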

EMNLP 2017

Datasets


Introduced in the Paper:

RotoWire

Used in the Paper:

RoboCup

Results (task: Data-to-Text Generation; model: Encoder-decoder + conditional copy)

| Dataset | Metric | Value | Global rank |
|---|---|---|---|
| RotoWire | BLEU | 14.19 | #6 |
| RotoWire (Content Ordering) | DLD | 8.68% | #5 |
| RotoWire (Content Ordering) | BLEU | 14.49 | #4 |
| RotoWire (Content Selection) | Precision | 29.49% | #5 |
| RotoWire (Content Selection) | Recall | 36.18% | #5 |
| RotoWire (Relation Generation) | Count | 23.72 | #5 |
| RotoWire (Relation Generation) | Precision | 74.80% | #6 |

DLD: normalized Damerau-Levenshtein distance.
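The relation generation (RG), content selection (CS) and content ordering (CO) scores listed above are extractive metrics. They compare relation tuples that an information-extraction model pulls out of the generated text, the reference text and the input records.

The sketch below makes several assumptions:
- Extraction has already happened.
- Each relation is a hypothetical (entity, value, type) tuple.
- Relations are deduplicated before RG and CS scoring.
- CO is reported as one minus the distance divided by the longer sequence length. This normalization may not match the paper's exact definition.

All function names are illustrative.

```python
from typing import Hashable, Sequence, Set, Tuple

# Hypothetical relation format: (entity, value, type), e.g. ("Love", "20", "PTS").
# Extraction from text is assumed to have been done upstream.
Relation = Tuple[str, str, str]


def rg_stats(generated: Sequence[Relation], table: Set[Relation]) -> Tuple[int, float]:
    """Relation generation: (# distinct generated relations found in the records, precision)."""
    distinct = set(generated)  # deduplication is an assumption of this sketch
    if not distinct:
        return 0, 0.0
    supported = sum(1 for r in distinct if r in table)
    return supported, supported / len(distinct)


def cs_precision_recall(generated: Sequence[Relation],
                        reference: Sequence[Relation]) -> Tuple[float, float]:
    """Content selection: precision/recall of distinct generated relations vs. the reference's."""
    gen, ref = set(generated), set(reference)
    overlap = len(gen & ref)
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


def dld(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Damerau-Levenshtein distance, optimal-string-alignment variant (adjacent swaps cost 1)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def co_score(generated: Sequence[Relation], reference: Sequence[Relation]) -> float:
    """Content ordering: 1 - DLD / max length (assumed normalization; higher is better)."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - dld(generated, reference) / longest
```

With records `{("Love", "20", "PTS"), ("Love", "9", "REB"), ("Irving", "25", "PTS")}`, suppose a generated summary yields `[("Love", "20", "PTS"), ("Irving", "30", "PTS")]`. It then has one supported relation out of two, so RG precision is 0.5. The hallucinated "30 points" relation is what lowers precision, which is the failure mode these metrics are designed to expose.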
