Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic document summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
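
The extractive paradigm described above can be illustrated with a minimal sketch. This is not the HIBERT method or any model from the leaderboards; it assumes a naive regex sentence splitter and scores each sentence by the average document-level frequency of its words. The function names `split_sentences` and `extractive_summary` are hypothetical, chosen only for this illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(text):
    """Lowercase word tokens; no stop-word removal, to keep the sketch short."""
    return re.findall(r"[a-z']+", text.lower())


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))  # word counts over the whole document

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Every sentence in the output is copied verbatim from the input, which is what makes the approach extractive. An abstractive system would instead generate text that need not appear in the source, typically with a sequence-to-sequence model.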

Benchmarks

These leaderboards are used to track progress in Document Summarization.

Dataset	Best Model
CNN / Daily Mail	Scrambled code + broken (alter)
HowSumm-Step	LexRank (query: step title)
HowSumm-Method	LexRank (query: method + article + steps titles)
BBC XSum	BigBird-Pegasus
Arxiv HEP-TH citation graph	DeepPyramidion
arXiv Summarization Dataset	DeepPyramidion
WikiLingua (tr->en)	DOCmT5
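
LexRank, listed above for the HowSumm benchmarks, ranks sentences by their centrality in a sentence-similarity graph. The following is a simplified sketch of that idea, not the query-biased variant named in the table. It assumes plain bag-of-words cosine similarity (the original LexRank formulation uses an IDF-weighted cosine), a damping factor of 0.85, and a fixed number of power iterations. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Word-count vector for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Centrality scores from a damped random walk over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # Row-normalise similarities into transition probabilities.
    rows = []
    for row in sim:
        total = sum(row)
        rows.append([v / total for v in row] if total else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def lexrank_summary(sentences, k=1):
    """Pick the k most central sentences, kept in their original order."""
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares vocabulary with many others receives a higher score than one that overlaps with few, so the sentences chosen tend to be those most representative of the document as a whole.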

Libraries

Use these libraries to find Document Summarization models and implementations.

Library	Papers	Stars
huggingface/transformers	3	125,725
thudm/swissarmytransformer	2	848
HHousen/TransformerSum	2	425
shashiongithub/XSum	2	343

Datasets

Subtasks

Email Thread Summarization

Latest papers with no code

Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization

no code yet • 3 May 2024

Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles.

RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization

no code yet • 1 May 2024

For long document summarization, discourse structure is important to discern the key content of the text and the differences in importance level between sentences.

Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

no code yet • 15 Apr 2024

The quality of the dataset is crucial for ensuring optimal performance and reliability of downstream task models.

Comparative Study of Domain Driven Terms Extraction Using Large Language Models

no code yet • 2 Apr 2024

Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data.

Attribute First, then Generate: Locally-attributable Grounded Text Generation

no code yet • 25 Mar 2024

Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections.

EROS: Entity-Driven Controlled Policy Document Summarization

no code yet • 29 Feb 2024

In this paper, we propose to enhance the interpretability and readability of policy documents by using controlled abstractive summarization: we constrain the generated summaries to include critical privacy-related entities (e.g., data and medium) and the organization's rationale (e.g., target and reason) in collecting those entities.

NewsQs: Multi-Source Question Generation for the Inquiring Mind

no code yet • 28 Feb 2024

We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents.

SKT5SciSumm - A Hybrid Generative Approach for Multi-Document Scientific Summarization

no code yet • 27 Feb 2024

Summarization for scientific text has shown significant benefits both for the research community and human society.

Benchmarking LLMs on the Semantic Overlap Summarization Task

no code yet • 26 Feb 2024

While recent advancements in Large Language Models (LLMs) have achieved superior performance in numerous summarization tasks, a benchmarking study of the SOS task using LLMs is yet to be performed.

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

no code yet • 27 Nov 2023

This paper presents an overview of the VLSP 2022 Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese news.
