MRPC (Microsoft Research Paraphrase Corpus)

Introduced by William B. Dolan et al. in Automatically Constructing a Corpus of Sentential Paraphrases

Microsoft Research Paraphrase Corpus (MRPC) is a corpus consists of 5,801 sentence pairs collected from newswire articles. Each pair is labelled if it is a paraphrase or not by human annotators. The whole set is divided into a training subset (4,076 sentence pairs of which 2,753 are paraphrases) and a test subset (1,725 pairs of which 1,147 are paraphrases).

Source: Exploiting Semantic Annotations and Q-Learning for Constructing an Efficient Hierarchy/Graph Texts Organization


Paper Code Results Date Stars

Dataset Loaders

No data loaders found. You can submit your data loader here.


Similar Datasets


  • Unknown