Transforming Wikipedia into Augmented Data for Query-Focused Summarization

8 Nov 2019  ·  Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu

The limited size of existing query-focused summarization datasets renders training data-driven summarization models challenging. Meanwhile, the manual construction of a query-focused summarization corpus is costly and time-consuming. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named WIKIREF) of more than 280,000 examples, which can serve as a means of data augmentation. We also develop a BERT-based query-focused summarization model (Q-BERT) to extract sentences from the documents as summaries. To better adapt a huge model containing millions of parameters to tiny benchmarks, we identify and fine-tune only a sparse subnetwork, which corresponds to a small fraction of the whole model parameters. Experimental results on three DUC benchmarks show that the model pre-trained on WIKIREF has already achieved reasonable performance. After fine-tuning on the specific benchmark datasets, the model with data augmentation outperforms strong comparison systems. Moreover, both our proposed Q-BERT model and subnetwork fine-tuning further improve the model performance. The dataset is publicly available at
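The idea of fine-tuning only a sparse subnetwork can be illustrated with a masked gradient step. The abstract does not say how the subnetwork is identified, so the sketch below assumes, purely for illustration, that the parameters with the largest gradient magnitude are selected. The helper names `select_subnetwork` and `masked_sgd_step` are hypothetical and do not come from the paper.

```python
def select_subnetwork(scores, fraction):
    """Return a 0/1 mask keeping the `fraction` of entries with the largest scores.

    Assumption: the selection criterion (here, a per-parameter score) is
    illustrative only; the paper's actual criterion is not given in the abstract.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(scores) * fraction))
    # Sort indices by descending score; break ties by index for determinism.
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    mask = [0] * len(scores)
    for i in top:
        mask[i] = 1
    return mask


def masked_sgd_step(params, grads, mask, lr):
    """Apply one plain SGD step, updating only parameters where mask is 1."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


# Toy "model" with five parameters and their gradients on a small benchmark.
params = [0.5, -1.0, 2.0, 0.0, 1.5]
grads = [0.1, -0.8, 0.05, 0.6, -0.02]

# Keep 40% of the parameters trainable, scored by gradient magnitude (assumed).
mask = select_subnetwork([abs(g) for g in grads], fraction=0.4)
updated = masked_sgd_step(params, grads, mask, lr=0.5)
```

In this toy run only the second and fourth parameters change; the rest stay frozen at their pre-trained values, which is the property that limits how far a large model can drift when adapted to a tiny benchmark.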


