Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic document summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive approaches may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
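
To make the extractive paradigm described above concrete, here is a minimal sketch that scores each sentence by the average frequency of its words in the document and returns the top-scoring sentences verbatim. This is a simple classical heuristic chosen for illustration, not the method of HIBERT or of any paper listed on this page; the function names `split_sentences` and `extractive_summary` and the regex-based sentence splitter are assumptions made for the example.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


doc = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(extractive_summary(doc, num_sentences=1))
```

Because every output sentence is copied from the input, the summary cannot contain words that are absent from the source. An abstractive system, by contrast, generates its output token by token and is free to paraphrase.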

Latest papers with no code

Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization

no code yet • 3 May 2024

Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles.

RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization

no code yet • 1 May 2024

For long document summarization, discourse structure is important to discern the key content of the text and the differences in importance level between sentences.

Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

no code yet • 15 Apr 2024

The quality of the dataset is crucial for ensuring optimal performance and reliability of downstream task models.

Comparative Study of Domain Driven Terms Extraction Using Large Language Models

no code yet • 2 Apr 2024

Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data.

Attribute First, then Generate: Locally-attributable Grounded Text Generation

no code yet • 25 Mar 2024

Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections.

EROS: Entity-Driven Controlled Policy Document Summarization

no code yet • 29 Feb 2024

In this paper, we propose to enhance the interpretability and readability of policy documents by using controlled abstractive summarization: we require the generated summaries to include critical privacy-related entities (e.g., data and medium) and the organization's rationale (e.g., target and reason) for collecting those entities.

NewsQs: Multi-Source Question Generation for the Inquiring Mind

no code yet • 28 Feb 2024

We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents.

SKT5SciSumm - A Hybrid Generative Approach for Multi-Document Scientific Summarization

no code yet • 27 Feb 2024

Summarization for scientific text has shown significant benefits both for the research community and human society.

Benchmarking LLMs on the Semantic Overlap Summarization Task

no code yet • 26 Feb 2024

While recent advancements in Large Language Models (LLMs) have achieved superior performance in numerous summarization tasks, a benchmarking study of the Semantic Overlap Summarization (SOS) task using LLMs is yet to be performed.

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

no code yet • 27 Nov 2023

This paper presents an overview of the VLSP 2022 Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese news.