Alleviating Exposure Bias via Multi-level Contrastive Learning and Deviation Simulation in Abstractive Summarization

Most Transformer based abstractive summarization systems have a severe mismatch between training and inference, i.e., exposure bias. From diverse perspectives, we introduce a simple multi-level contrastive learning framework for abstractive summarization (SimMCS) and a tailored sparse decoder self-attention pattern (SDSA) to bridge the gap between training and inference to improve model performance. Compared with previous contrastive objectives focusing only on the relative order of probability mass assigned to non-gold summaries, SimMCS additionally takes their absolute positions into account, which guarantees that the relatively high-quality (positive) summaries among them could be properly assigned high probability mass, and further enhances the capability of discriminating summary quality beyond exploiting potential artifacts of specific metrics. SDSA simulates the possible inference scenarios of deviation in the training phase to get closer to the ideal paradigm. Our approaches outperform the previous state-of-the-art results on two summarization datasets while just adding fairly low overhead. Further empirical analysis shows our model preserves the advantages of prior contrastive methods and possesses strong few-shot learning ability.

PDF

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods