Multi-Document Summarization
110 papers with code • 5 benchmarks • 15 datasets
Multi-Document Summarization is the task of representing a set of documents with a short piece of text that captures the relevant information and filters out the redundant. Two prominent approaches are extractive and abstractive summarization: extractive systems select salient snippets, sentences, or passages directly from the documents, while abstractive systems concisely paraphrase the documents' content.
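The extractive approach can be sketched with a classic centroid-style baseline: score every sentence from every document by its similarity to a TF-IDF centroid of the whole collection, then keep the top-scoring sentences. This is an illustrative sketch only (the function names and the toy IDF computed over the sentence set are ours, not from any particular paper):

```python
import math
import re
from collections import Counter

def tf_idf_vectors(sentences):
    """Build a sparse TF-IDF vector per sentence (toy IDF over the sentence set)."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(sentences)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of word -> weight)."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def extractive_summary(documents, k=2):
    """Keep the k sentences most similar to the collection centroid."""
    sentences = [s.strip()
                 for d in documents
                 for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
    vecs = tf_idf_vectors(sentences)
    centroid = Counter()
    for v in vecs:
        centroid.update(v)  # sum the per-sentence vectors
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid), reverse=True)
    chosen = sorted(ranked[:k])  # restore original document order
    return " ".join(sentences[i] for i in chosen)
```

Real extractive systems add redundancy control (e.g. penalizing sentences similar to ones already selected), which this sketch omits for brevity.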
Source: Multi-Document Summarization using Distributed Bag-of-Words Model
Most implemented papers
Bottom-Up Abstractive Summarization
We use this selector as a bottom-up attention step to constrain the model to likely phrases.
Generating Wikipedia by Summarizing Long Sequences
We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents.
Scoring Sentence Singletons and Pairs for Abstractive Summarization
There is a crucial gap between sentence selection and fusion: summarization should support both compressing single sentences and fusing pairs.
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models.
Centroid-based Text Summarization through Compositionality of Word Embeddings
Textual similarity is a crucial aspect of many extractive text summarization methods.
Leveraging Graph to Improve Abstractive Multi-Document Summarization
Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries.
Pre-training via Paraphrasing
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
We enlist medical professionals to evaluate generated summaries, and we find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual.
Global-aware Beam Search for Neural Abstractive Summarization
A global scoring mechanism is then developed to regulate beam search to generate summaries in a near-global optimal fashion.
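For context, the baseline that such global scoring mechanisms modify is standard beam search, which keeps only the highest-scoring partial sequences at each decoding step and scores them locally by cumulative log-probability. A generic sketch under our own names (this is not the paper's global scoring method, just the vanilla procedure it builds on):

```python
def beam_search(start, score_fn, expand_fn, beam_size=3, max_len=5):
    """Generic beam search.

    Keeps the beam_size highest-scoring partial sequences at each step.
    score_fn(prefix, token) returns the log-probability of extending the
    prefix with token; expand_fn(prefix) lists candidate next tokens.
    Returns the final beams as (sequence, cumulative log-prob) pairs,
    sorted best-first.
    """
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok in expand_fn(seq):
                candidates.append((seq + [tok], logp + score_fn(seq, tok)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # prune to the best beam_size prefixes
    return beams

# Toy demo: two-token vocabulary with fixed log-probabilities.
toy_scores = {"a": -0.1, "b": -0.5}
beams = beam_search("<s>",
                    lambda seq, tok: toy_scores[tok],
                    lambda seq: ["a", "b"],
                    beam_size=2, max_len=3)
```

Because the score is a sum of per-step log-probabilities, plain beam search is greedy with respect to the current prefix; global-aware variants rescore beams with additional signals so that the search better approximates a globally optimal summary.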