PeerSum is a new multi-document summarization (MDS) dataset built from peer reviews of scientific publications. It differs from existing MDS datasets in that its summaries (i.e., the meta-reviews) are highly abstractive and are genuine summaries of the source documents.

In PeerSum, the source documents are the reviews (with scores), comments, and responses, and the ground-truth summary is the meta-review (with an acceptance outcome). Each sample corresponds to one paper and contains the summary, its source documents, and complementary information such as review scores. The second version of PeerSum (peersum_v2) has 16,308 samples, compared with 10,862 samples in the first version.

The dataset is stored in JSON format. Each sample has the following keys:

  • paper_id: unique id of each sample
  • title: title of the corresponding paper
  • abstract: paper abstract
  • score: final score of the paper (if there is no final score, the average of the review scores is used)
  • acceptance: acceptance outcome of the paper (e.g., accept, reject, or spotlight)
  • meta_review: meta-review of the paper, treated as the summary
  • reviews: list of reviews, each with review_id, writer, content (rating, confidence, comment), and replyto; review_id and replyto encode the conversation structure
  • label: dataset split, one of train, val, or test (8:1:1 ratio)
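A minimal sketch of reading samples with the keys above and turning each one into a (sources, summary) pair. It assumes the file is a single JSON array of sample objects and that each review's `content` is an object with `rating`, `confidence`, and `comment` fields; if the release is JSON Lines instead, parse it line by line. The helper names `split_by_label` and `to_summarization_pair`, and all example values, are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical two-sample file content using the keys listed above.
# All values are placeholders, not real PeerSum data.
EXAMPLE_JSON = """
[
  {"paper_id": "p1", "title": "T1", "abstract": "A1", "score": 6.0,
   "acceptance": "accept", "meta_review": "M1", "label": "train",
   "reviews": [{"review_id": "r1", "writer": "official_reviewer",
                "content": {"rating": 6, "confidence": 4,
                            "comment": "Clear and well written."},
                "replyto": "p1"}]},
  {"paper_id": "p2", "title": "T2", "abstract": "A2", "score": 3.5,
   "acceptance": "reject", "meta_review": "M2", "label": "test",
   "reviews": []}
]
"""


def split_by_label(samples):
    """Group samples into lists keyed by their `label` (train/val/test)."""
    splits = defaultdict(list)
    for sample in samples:
        splits[sample["label"]].append(sample)
    return dict(splits)


def to_summarization_pair(sample):
    """Return (source comments, target summary) for one paper.

    Assumption: sources are the `comment` fields of all reviews, comments,
    and responses; the target is the meta-review.
    """
    sources = [review["content"]["comment"] for review in sample["reviews"]]
    return sources, sample["meta_review"]


# For a file on disk, call json.load(f) on the opened file instead.
samples = json.loads(EXAMPLE_JSON)
splits = split_by_label(samples)
print({label: len(items) for label, items in splits.items()})
print(to_summarization_pair(samples[0]))
```

Grouping by `label` keeps the official train/val/test split intact rather than re-splitting the data.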

For each review (i.e., official review, public comment, or author/reviewer response):

  • review_id: unique id of each review
  • writer: official_reviewer, public, or author
  • content: (rating, confidence, comment)
  • replyto: id of the item this review replies to (review_id and replyto together encode the conversation structure)
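Because each review records the id of what it replies to, the discussion for one paper can be rebuilt as a set of reply trees. The sketch below assumes that a `replyto` value which does not match any `review_id` in the same sample (for example, one pointing at the paper itself) marks a top-level review, and that the replies contain no cycles. The function names and example reviews are hypothetical.

```python
from collections import defaultdict


def build_threads(reviews):
    """Return (root ids, mapping from review_id to ids of its direct replies).

    Assumption: a review whose `replyto` is not a review_id in this sample
    is treated as a top-level node.
    """
    ids = {r["review_id"] for r in reviews}
    children = defaultdict(list)
    roots = []
    for r in reviews:
        parent = r.get("replyto")
        if parent in ids:
            children[parent].append(r["review_id"])
        else:
            roots.append(r["review_id"])
    return roots, dict(children)


def print_thread(review_id, by_id, children, depth=0):
    """Print one thread depth-first, indenting each reply under its parent."""
    review = by_id[review_id]
    print("  " * depth + f"{review['writer']}: {review['content']['comment']}")
    for child in children.get(review_id, []):
        print_thread(child, by_id, children, depth + 1)


# Hypothetical reviews for one paper "p1" (placeholder values).
reviews = [
    {"review_id": "r1", "writer": "official_reviewer",
     "content": {"rating": 6, "confidence": 4, "comment": "Solid paper."},
     "replyto": "p1"},
    {"review_id": "r2", "writer": "author",
     "content": {"rating": None, "confidence": None, "comment": "Thanks!"},
     "replyto": "r1"},
    {"review_id": "r3", "writer": "public",
     "content": {"rating": None, "confidence": None, "comment": "Question."},
     "replyto": "p1"},
]
roots, children = build_threads(reviews)
by_id = {r["review_id"]: r for r in reviews}
for root in roots:
    print_thread(root, by_id, children)
```

Treating unmatched `replyto` values as roots avoids relying on a specific convention for how top-level reviews point at the paper.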

