Movie Reviews (Movie Review Polarity Dataset Enriched with "Annotator Rationales")

This dataset is based on the movie review polarity dataset (v2.0) collected and maintained by Bo Pang and Lillian Lee. Their dataset (we'll call it PL2.0) consists of 1000 positive and 1000 negative movie reviews obtained from the Internet Movie Database (IMDb) review archive.

The main contribution of this release is the enrichment of the documents with "annotator rationales," a concept we describe in our NAACL HLT 2007 paper.

"Rationales" are segments of the text that support an annotator's classification. For example, if a movie review is labeled as positive (i.e., the writer has a favorable opinion of the movie), its rationales are the segments of the text that support the annotator's claim that the review is indeed positive.

Here are some examples of positive rationales (the segments enclosed by double square brackets):

[[you will enjoy the hell out of]] American Pie.
fortunately, they [[managed to do it in an interesting and funny way]].
he is [[one of the most exciting martial artists on the big screen]], continuing to perform his own stunts and [[dazzling audiences]] with his flashy kicks and punches.
the romance was [[enchanting]].

And here are some examples of negative rationales:

A woman in peril. A confrontation. An explosion. The end. [[Yawn. Yawn. Yawn.]]
when a film makes watching Eddie Murphy [[a tedious experience, you know something is terribly wrong]].
the movie is [[so badly put together]] that even the most casual viewer may notice the [[miserable pacing and stray plot threads]].
[[don't go see]] this movie
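
The double-square-bracket notation used in the examples above can be processed with a short script. The following is a minimal sketch in Python, assuming that rationales are marked exactly as shown here (with "[[" and "]]") and are never nested. The markup in the released files may differ from this notation, and the names extract_rationales and strip_markup are hypothetical helpers, not part of the release.

```python
import re

# Assumption: a rationale is the shortest span between "[[" and "]]",
# as in the examples above. Nested brackets are not handled.
RATIONALE = re.compile(r"\[\[(.+?)\]\]", re.DOTALL)


def extract_rationales(text):
    """Return the rationale segments found in text, in order of appearance."""
    return [match.group(1) for match in RATIONALE.finditer(text)]


def strip_markup(text):
    """Return text with the bracket markers removed and the enclosed words kept."""
    return RATIONALE.sub(r"\1", text)


if __name__ == "__main__":
    sentence = "the romance was [[enchanting]]."
    print(extract_rationales(sentence))
    print(strip_markup(sentence))
```

Applied to the sentence "the romance was [[enchanting]].", extract_rationales returns the single segment "enchanting", and strip_markup returns the original sentence without the brackets.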

Source: README.txt
