MRPC (Microsoft Research Paraphrase Corpus)

Introduced by William B. Dolan et al. in Automatically Constructing a Corpus of Sentential Paraphrases

The Microsoft Research Paraphrase Corpus (MRPC) is a corpus of 5,801 sentence pairs collected from newswire articles. Human annotators labelled each pair according to whether its two sentences are paraphrases of each other. The corpus is divided into a training subset (4,076 sentence pairs, of which 2,753 are paraphrases) and a test subset (1,725 pairs, of which 1,147 are paraphrases).
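The split sizes above imply a noticeable class imbalance, which matters when reading reported accuracies. The short sketch below only re-derives the non-paraphrase counts and positive rates from the numbers quoted in this description; it does not load the corpus, and the names `SPLITS` and `split_stats` are hypothetical helpers introduced for illustration.

```python
"""Class-balance arithmetic for the MRPC split sizes quoted above (illustrative only)."""

# (pair count, paraphrase count) per split, copied from the dataset description
SPLITS = {
    "train": (4076, 2753),
    "test": (1725, 1147),
}


def split_stats(total: int, paraphrases: int) -> dict:
    """Derive the non-paraphrase count and paraphrase share for one split."""
    return {
        "total": total,
        "paraphrase": paraphrases,
        "non_paraphrase": total - paraphrases,
        "paraphrase_rate": paraphrases / total,
    }


# Per-split statistics plus the combined corpus
stats = {name: split_stats(*counts) for name, counts in SPLITS.items()}
stats["all"] = split_stats(
    sum(total for total, _ in SPLITS.values()),
    sum(pos for _, pos in SPLITS.values()),
)

if __name__ == "__main__":
    for name, s in stats.items():
        print(
            f"{name:>5}: {s['total']} pairs, {s['paraphrase']} paraphrases, "
            f"{s['non_paraphrase']} non-paraphrases "
            f"({s['paraphrase_rate']:.1%} positive)"
        )
```

By this arithmetic, roughly two thirds of the pairs in each split are paraphrases, so a trivial classifier that always answers "paraphrase" would reach about 66.5% accuracy on the test subset (1,147 / 1,725).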

Source: Exploiting Semantic Annotations and Q-Learning for Constructing an Efficient Hierarchy/Graph Texts Organization

License

  • Unknown

Modalities

  • Texts

Languages

  • English

Tasks

  • Paraphrase Identification