This dataset is based on the movie review polarity dataset (v2.0) collected and maintained by Bo Pang and Lillian Lee. Their dataset (we'll call it PL2.0) consists of 1000 positive and 1000 negative movie reviews obtained from the Internet Movie Database (IMDb) review archive.
The main contribution of this release is the enrichment of the documents with "annotator rationales," a concept we describe in our NAACL HLT 2007 paper.
In short, "rationales" are segments of the text that support an annotator's classification. Suppose a movie review is labeled as positive (i.e., the writer has a favorable opinion of the movie). The rationales are then the segments of the text that support the annotator's claim that the review is indeed positive.
Here are some examples of positive rationales (the segments enclosed by double square brackets):
And here are some examples of negative rationales:
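For readers who want to work with this markup programmatically, the following is a minimal sketch of pulling rationale segments out of a review string. It assumes the rationales appear inline, enclosed by double square brackets as described above; the files in the actual release may use a different markup, so treat the pattern as an assumption. The function names `extract_rationales` and `strip_markup` and the sample sentence are hypothetical and are not taken from the dataset.

```python
import re

# Non-greedy match for text between "[[" and "]]".
# DOTALL lets a segment span line breaks. Nested brackets are not handled.
RATIONALE_RE = re.compile(r"\[\[(.*?)\]\]", re.DOTALL)


def extract_rationales(text):
    """Return the bracketed segments, with internal whitespace collapsed."""
    return [" ".join(segment.split()) for segment in RATIONALE_RE.findall(text)]


def strip_markup(text):
    """Return the text with the [[ ]] markers removed and the enclosed words kept."""
    return RATIONALE_RE.sub(r"\1", text)


# Made-up sentence for illustration only; it is not taken from the dataset.
sample = "the plot is thin , but [[the acting is superb]] and [[i loved every minute]] ."

print(extract_rationales(sample))
print(strip_markup(sample))
```

Keeping extraction and markup removal separate lets the plain review text be recovered for a standard classifier while the rationale segments are handled on their own.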