Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer

Sentence extractive summarization shortens a document by selecting sentences for a summary while preserving its important content. However, constructing a coherent and informative summary is difficult with a pre-trained BERT-based encoder, since such an encoder is not explicitly trained to represent sentence-level information within a document. We propose NeRoBERTa, a nested tree-based extractive summarization model built on RoBERTa, where the nested tree structure combines the syntactic and discourse trees of a given document. Experimental results on the CNN/DailyMail dataset show that NeRoBERTa outperforms baseline models in ROUGE. Human evaluation also shows that NeRoBERTa achieves significantly better coherence scores than the baselines and scores comparable to state-of-the-art models.
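The paper's code is not reproduced here. As a rough sketch of the generic setup it builds on, the snippet below scores sentences with a plain pre-trained RoBERTa encoder and extracts the top-k as a summary. The embed and extract_summary helpers and the cosine-to-document scoring rule are illustrative assumptions, not NeRoBERTa: the paper's contribution, the nested syntactic and discourse trees, is omitted entirely.

# Minimal sketch of generic sentence extractive summarization with a
# pre-trained RoBERTa encoder. This is NOT NeRoBERTa: it omits the
# nested syntactic/discourse trees the paper adds, and scoring by
# cosine similarity to the document embedding is an assumed heuristic.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single vector."""
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

def extract_summary(sentences: list[str], k: int = 3) -> list[str]:
    """Pick the k sentences closest to the whole-document embedding."""
    doc_vec = embed(" ".join(sentences))
    scores = [torch.cosine_similarity(embed(s), doc_vec, dim=0).item()
              for s in sentences]
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep document order

Selected sentences are returned in document order, which matters for the coherence criterion the paper evaluates: reordering extracted sentences by score alone tends to produce disjointed summaries.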


Datasets

CNN / Daily Mail

Results

Task                           Dataset           Model      Metric    Value   Global Rank
Extractive Text Summarization  CNN / Daily Mail  NeRoBERTa  ROUGE-2   20.64   #3
                                                            ROUGE-1   43.86   #4
                                                            ROUGE-L   40.20   #3
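The reported metrics measure n-gram overlap with reference summaries: ROUGE-1 and ROUGE-2 count unigram and bigram overlap, and ROUGE-L uses the longest common subsequence. A hedged example of computing these F1 scores with Google's rouge-score package (pip install rouge-score) follows; note that published numbers typically come from the original Perl ROUGE toolkit, so library values may differ slightly.

# Compute ROUGE-1/2/L F1 between a reference and a candidate summary.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "the cat sat on the mat ."
candidate = "a cat was sitting on the mat ."
scores = scorer.score(reference, candidate)
for name, result in scores.items():
    print(f"{name}: F1 = {result.fmeasure:.4f}")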
