CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
The number of research articles published on COVID-19 has increased dramatically since the outbreak of the pandemic in November 2019. This extraordinary rate of publication leads to information overload, and it has become increasingly urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in the COVID-19 scientific literature, this study presents CovSumm, a novel unsupervised graph-based hybrid model for single-document summarization, evaluated on the CORD-19 dataset. We tested the proposed methodology on the scientific papers in the database dated from January 1, 2021 to December 31, 2021, a total of 840 documents. The proposed summarizer is a hybrid of two distinct extractive approaches: (1) GenCompareSum (a transformer-based approach) and (2) TextRank (a graph-based approach). The sum of the scores generated by the two methods is used to rank the sentences for generating the summary. On the CORD-19 dataset, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric is used to compare the performance of the CovSumm model with various state-of-the-art techniques. The proposed method achieved the highest scores: ROUGE-1 of 40.14%, ROUGE-2 of 13.25%, and ROUGE-L of 36.32%. The proposed hybrid approach thus shows improved performance on the CORD-19 dataset compared with existing unsupervised text summarization methods.
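The score-fusion step described above can be illustrated with a minimal sketch. It assumes the per-sentence scores from the transformer-based and graph-based methods have already been computed elsewhere; the function names `fuse_and_select` and `min_max`, the parameter `k`, and the optional min-max normalization are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_select(
    sentences: Sequence[str],
    transformer_scores: Sequence[float],
    graph_scores: Sequence[float],
    k: int = 3,
    normalize: bool = False,
) -> List[str]:
    """Rank sentences by the sum of two score lists and return the top k.

    normalize=True is an assumption added for illustration: it puts both
    score lists on a common scale before summing.
    """
    if not (len(sentences) == len(transformer_scores) == len(graph_scores)):
        raise ValueError("sentences and both score lists must have equal length")
    if not sentences:
        return []

    a = min_max(transformer_scores) if normalize else list(transformer_scores)
    b = min_max(graph_scores) if normalize else list(graph_scores)

    # Sum of the two per-sentence scores, as described in the abstract.
    combined = [x + y for x, y in zip(a, b)]

    # Indices of the k highest-scoring sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)

    # Restore document order so the extractive summary reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Returning the selected sentences in their original document order is a common choice for extractive summaries; whether CovSumm does the same is not stated in the abstract.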