Data-to-Text Generation

105 papers with code • 24 benchmarks • 22 datasets

A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and to produce text that adequately and fluently describes this data as output. Unlike machine translation, which aims for complete transduction of the sentence to be translated, this form of NLG is usually taken to require addressing (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of that selected content.

(Image credit: Data-to-Text Generation with Content Selection and Planning)
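
To make the two challenges concrete, here is a minimal, self-contained sketch of that two-stage pipeline. The record fields and selection rule are made up for illustration; real systems learn both steps from data rather than hard-coding them.

```python
# A minimal sketch of the two classic data-to-text stages: content
# selection ("what to say") and surface realization ("how to say it").
# The record fields below are illustrative, not from any specific dataset.

RECORD = {
    "team": "Raptors",
    "opponent": "Celtics",
    "team_points": 114,
    "opponent_points": 106,
    "attendance": 19800,
    "referee": "J. Smith",  # low-salience field, likely dropped
}

def select_content(record: dict) -> dict:
    """What to say: keep only the fields judged worth mentioning."""
    salient = {"team", "opponent", "team_points", "opponent_points"}
    return {k: v for k, v in record.items() if k in salient}

def realize(selected: dict) -> str:
    """How to say it: map the selected fields to a fluent sentence."""
    return (
        f"The {selected['team']} beat the {selected['opponent']} "
        f"{selected['team_points']}-{selected['opponent_points']}."
    )

print(realize(select_content(RECORD)))
# -> The Raptors beat the Celtics 114-106.
```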

Libraries

Use these libraries to find Data-to-Text Generation models and implementations

Most implemented papers

Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention

budzianowski/multiwoz ACL 2019

Semantically controlled neural response generation in limited domains has achieved great performance.

Data-to-text Generation with Entity Modeling

ratishsp/data2text-entity-py ACL 2019

Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end.

Learning to Select, Track, and Generate for Data-to-Text

aistairc/sports-reporter ACL 2019

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation.

Long and Diverse Text Generation with Planning-based Hierarchical Variational Model

ZhihongShao/Planning-based-Hierarchical-Variational-Model IJCNLP 2019

Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they fall short at modeling input data dynamically during generation, capturing inter-sentence coherence, and generating diversified expressions.

Few-shot Natural Language Generation for Task-Oriented Dialog

pengbaolin/SC-GPT Findings (EMNLP) 2020

SC-GPT is pre-trained on a large annotated NLG corpus to acquire controllable generation ability, and is then fine-tuned with only a few domain-specific labels to adapt to new domains.
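
As an illustration of this few-shot recipe, the hedged sketch below fine-tunes an off-the-shelf GPT-2 from Hugging Face transformers on a couple of (dialog act, response) pairs. The serialization format, examples, and hyperparameters are assumptions for illustration, not the paper's exact setup.

```python
# Fine-tune a pre-trained LM on a handful of (dialog act, response)
# pairs serialized as plain text; the "act & response" format and the
# example pairs are hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

# A few domain-specific labeled examples (hypothetical).
few_shot_pairs = [
    ("inform(name=Bar Crudo; food=seafood)",
     "Bar Crudo is a nice seafood restaurant."),
    ("request(area)",
     "Which area of the city are you interested in?"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    for act, response in few_shot_pairs:
        # Condition the LM on the dialog act; train it to emit the response.
        text = f"{act} & {response}{tokenizer.eos_token}"
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss  # standard LM objective
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```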

Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation

ThomasScialom/QuestEval EMNLP 2021

QuestEval is a reference-less metric for text-to-text tasks that compares generated summaries directly to the source text by automatically asking and answering questions.
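
The toy sketch below illustrates the QA-based intuition with a trivial string-matching stand-in for the answering step; QuestEval itself uses trained question-generation and question-answering models, and the questions and facts here are invented for illustration.

```python
# Toy illustration of a QA-based reference-less metric: ask questions
# about facts in the source data, try to answer them from the generated
# text, and score the agreement.

source_facts = {"Who won?": "Raptors", "What was the score?": "114-106"}
generated = "The Raptors beat the Celtics 114-106."

def answerable(question: str, text: str, expected: str) -> bool:
    # Stand-in for a QA model: check whether the expected answer span
    # is recoverable from the generated text.
    return expected in text

score = sum(
    answerable(q, generated, a) for q, a in source_facts.items()
) / len(source_facts)
print(f"faithfulness ~ {score:.2f}")  # -> 1.00 when every fact is recoverable
```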

Plan-then-Generate: Controlled Data-to-Text Generation via Planning

google-research-datasets/ToTTo Findings (EMNLP) 2021

However, the inability of neural models to control the structure of the generated output can be limiting in certain real-world applications.
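
A minimal sketch of the plan-then-generate idea, with hypothetical field names and templates: committing to an explicit content plan first makes the structure of the output directly controllable.

```python
# First commit to an ordered content plan over the input fields, then
# realize the text in that order. Reordering the plan reorders the output.

record = {"name": "Aromi", "eat_type": "coffee shop", "area": "city centre"}
plan = ["name", "eat_type", "area"]  # the plan fixes mention order

TEMPLATES = {
    "name": "{name}",
    "eat_type": "is a {eat_type}",
    "area": "in the {area}",
}

def generate(record: dict, plan: list[str]) -> str:
    parts = [TEMPLATES[slot].format(**record) for slot in plan]
    return " ".join(parts) + "."

print(generate(record, plan))
# -> Aromi is a coffee shop in the city centre.
print(generate(record, ["name", "area", "eat_type"]))
# -> Aromi in the city centre is a coffee shop.
```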

Control Prefixes for Parameter-Efficient Text Generation

Yale-LILY/dart 15 Oct 2021

Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application.
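
The hedged sketch below shows the core idea with a GPT-2 from Hugging Face transformers: freeze the language model and train only a small block of continuous prefix vectors. For brevity it prepends the prefix at the input-embedding layer, which is closer to prompt-tuning; the full method injects key/value prefixes at every attention layer. The training string and prefix length are illustrative.

```python
# Freeze the pre-trained LM; train only a small set of prefix vectors.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # the base LM stays frozen

prefix_len, dim = 10, model.config.n_embd
prefix = torch.nn.Parameter(torch.randn(1, prefix_len, dim) * 0.02)
optimizer = torch.optim.AdamW([prefix], lr=1e-3)

# Hypothetical linearized data-to-text training example.
text = "name: Aromi | area: city centre & Aromi is in the city centre."
ids = tokenizer(text, return_tensors="pt").input_ids
tok_emb = model.transformer.wte(ids)          # frozen token embeddings
inputs = torch.cat([prefix, tok_emb], dim=1)  # prepend the trainable prefix

# Labels of -100 mask the loss on the prefix positions.
labels = torch.cat([torch.full((1, prefix_len), -100), ids], dim=1)
loss = model(inputs_embeds=inputs, labels=labels).loss
loss.backward()  # gradients flow only into the prefix
optimizer.step()
```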

Chart-to-Text: A Large-Scale Benchmark for Chart Summarization

JasonObeid/Chart2Text ACL 2022

We also introduce a number of state-of-the-art neural models as baselines that utilize image captioning and data-to-text generation techniques to tackle two problem variations: one assumes the underlying data table of the chart is available, while the other must extract data from chart images.