Fine-tune BERT for Extractive Summarization

arXiv 2019 · Yang Liu

BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 on ROUGE-L. The code to reproduce our results is available at

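Task                               Dataset           Model                Metric Name  Metric Value  Global Rank
Extractive Document Summarization  CNN / Daily Mail  BERTSUM              ROUGE-1      43.25         # 2
Extractive Document Summarization  CNN / Daily Mail  BERTSUM              ROUGE-2      20.24         # 2
Extractive Document Summarization  CNN / Daily Mail  BERTSUM              ROUGE-L      39.63         # 2
Document Summarization             CNN / Daily Mail  BERTSUM+Transformer  ROUGE-1      43.25         # 12
Document Summarization             CNN / Daily Mail  BERTSUM+Transformer  ROUGE-2      20.24         # 12
Document Summarization             CNN / Daily Mail  BERTSUM+Transformer  ROUGE-L      39.63         # 13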