Data-to-Text Generation

105 papers with code • 24 benchmarks • 22 datasets

A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and to produce text that adequately and fluently describes this data as output. Unlike machine translation, which aims for complete transduction of the sentence to be translated, this form of NLG is usually taken to require addressing (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of that selected content.

(Image credit: Data-to-Text Generation with Content Selection and Planning)
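
To make the two challenges concrete, here is a minimal, self-contained sketch of that two-stage pipeline. The record fields and selection rule are made up for illustration; real systems learn both steps from data rather than hard-coding them.

```python
# A minimal sketch of the two classic data-to-text stages: content
# selection ("what to say") and surface realization ("how to say it").
# The record fields below are illustrative, not from any specific dataset.

RECORD = {
    "team": "Raptors",
    "opponent": "Celtics",
    "team_points": 114,
    "opponent_points": 106,
    "attendance": 19800,
    "referee": "J. Smith",  # low-salience field, likely dropped
}

def select_content(record: dict) -> dict:
    """What to say: keep only the fields judged worth mentioning."""
    salient = {"team", "opponent", "team_points", "opponent_points"}
    return {k: v for k, v in record.items() if k in salient}

def realize(selected: dict) -> str:
    """How to say it: map the selected fields to a fluent sentence."""
    return (
        f"The {selected['team']} beat the {selected['opponent']} "
        f"{selected['team_points']}-{selected['opponent_points']}."
    )

print(realize(select_content(RECORD)))
# -> The Raptors beat the Celtics 114-106.
```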

Libraries

Use these libraries to find Data-to-Text Generation models and implementations

Most implemented papers

Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention

budzianowski/multiwoz ACL 2019

Semantically controlled neural response generation in limited domains has achieved great performance.

Data-to-text Generation with Entity Modeling

ratishsp/data2text-entity-py ACL 2019

Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end.

Learning to Select, Track, and Generate for Data-to-Text

aistairc/sports-reporter ACL 2019

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation.

Long and Diverse Text Generation with Planning-based Hierarchical Variational Model

ZhihongShao/Planning-based-Hierarchical-Variational-Model IJCNLP 2019

Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they fall short at modeling input data dynamically during generation, capturing inter-sentence coherence, and generating diversified expressions.

Few-shot Natural Language Generation for Task-Oriented Dialog

pengbaolin/SC-GPT Findings (EMNLP) 2020

SC-GPT is pre-trained on a large annotated NLG corpus to acquire controllable generation ability, and is then fine-tuned with only a few domain-specific labels to adapt to new domains.
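
As an illustration of this few-shot recipe, the hedged sketch below fine-tunes an off-the-shelf GPT-2 from Hugging Face transformers on a couple of (dialog act, response) pairs. The serialization format, examples, and hyperparameters are assumptions for illustration, not the paper's exact setup.

```python
# Fine-tune a pre-trained LM on a handful of (dialog act, response)
# pairs serialized as plain text; the "act & response" format and the
# example pairs are hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

# A few domain-specific labeled examples (hypothetical).
few_shot_pairs = [
    ("inform(name=Bar Crudo; food=seafood)",
     "Bar Crudo is a nice seafood restaurant."),
    ("request(area)",
     "Which area of the city are you interested in?"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    for act, response in few_shot_pairs:
        # Condition the LM on the dialog act; train it to emit the response.
        text = f"{act} & {response}{tokenizer.eos_token}"
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss  # standard LM objective
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```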

Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation

ThomasScialom/QuestEval EMNLP 2021

QuestEval is a reference-less metric for text-to-text tasks that compares generated summaries directly to the source text by automatically asking and answering questions.
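
The toy sketch below illustrates the QA-based intuition with a trivial string-matching stand-in for the answering step; QuestEval itself uses trained question-generation and question-answering models, and the questions and facts here are invented for illustration.

```python
# Toy illustration of a QA-based reference-less metric: ask questions
# about facts in the source data, try to answer them from the generated
# text, and score the agreement.

source_facts = {"Who won?": "Raptors", "What was the score?": "114-106"}
generated = "The Raptors beat the Celtics 114-106."

def answerable(question: str, text: str, expected: str) -> bool:
    # Stand-in for a QA model: check whether the expected answer span
    # is recoverable from the generated text.
    return expected in text

score = sum(
    answerable(q, generated, a) for q, a in source_facts.items()
) / len(source_facts)
print(f"faithfulness ~ {score:.2f}")  # -> 1.00 when every fact is recoverable
```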

Plan-then-Generate: Controlled Data-to-Text Generation via Planning

google-research-datasets/ToTTo Findings (EMNLP) 2021

However, the inability of neural models to control the structure of the generated output can be limiting in certain real-world applications.
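
A minimal sketch of the plan-then-generate idea, with hypothetical field names and templates: committing to an explicit content plan first makes the structure of the output directly controllable.

```python
# First commit to an ordered content plan over the input fields, then
# realize the text in that order. Reordering the plan reorders the output.

record = {"name": "Aromi", "eat_type": "coffee shop", "area": "city centre"}
plan = ["name", "eat_type", "area"]  # the plan fixes mention order

TEMPLATES = {
    "name": "{name}",
    "eat_type": "is a {eat_type}",
    "area": "in the {area}",
}

def generate(record: dict, plan: list[str]) -> str:
    parts = [TEMPLATES[slot].format(**record) for slot in plan]
    return " ".join(parts) + "."

print(generate(record, plan))
# -> Aromi is a coffee shop in the city centre.
print(generate(record, ["name", "area", "eat_type"]))
# -> Aromi in the city centre is a coffee shop.
```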

Control Prefixes for Parameter-Efficient Text Generation

Yale-LILY/dart 15 Oct 2021

Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application.
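
The hedged sketch below shows the core idea with a GPT-2 from Hugging Face transformers: freeze the language model and train only a small block of continuous prefix vectors. For brevity it prepends the prefix at the input-embedding layer, which is closer to prompt-tuning; the full method injects key/value prefixes at every attention layer. The training string and prefix length are illustrative.

```python
# Freeze the pre-trained LM; train only a small set of prefix vectors.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # the base LM stays frozen

prefix_len, dim = 10, model.config.n_embd
prefix = torch.nn.Parameter(torch.randn(1, prefix_len, dim) * 0.02)
optimizer = torch.optim.AdamW([prefix], lr=1e-3)

# Hypothetical linearized data-to-text training example.
text = "name: Aromi | area: city centre & Aromi is in the city centre."
ids = tokenizer(text, return_tensors="pt").input_ids
tok_emb = model.transformer.wte(ids)          # frozen token embeddings
inputs = torch.cat([prefix, tok_emb], dim=1)  # prepend the trainable prefix

# Labels of -100 mask the loss on the prefix positions.
labels = torch.cat([torch.full((1, prefix_len), -100), ids], dim=1)
loss = model(inputs_embeds=inputs, labels=labels).loss
loss.backward()  # gradients flow only into the prefix
optimizer.step()
```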

Chart-to-Text: A Large-Scale Benchmark for Chart Summarization

JasonObeid/Chart2Text ACL 2022

We also introduce a number of state-of-the-art neural models as baselines that utilize image captioning and data-to-text generation techniques to tackle two problem variations: one assumes the underlying data table of the chart is available, while the other must extract data from chart images.