Cornell Movie-Dialogs Corpus

This corpus contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts:

  • 220,579 conversational exchanges between 10,292 pairs of movie characters
  • 9,035 characters from 617 movies
  • 304,713 utterances in total
  • movie metadata included:
    • genres
    • release year
    • IMDB rating
    • number of IMDB votes
  • character metadata included:
    • gender (for 3,774 characters)
    • position on movie credits (for 3,321 characters)

Source: Cornell Movie-Dialogs Corpus
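
The counts above describe utterances grouped into conversational exchanges between character pairs. As a rough sketch of how such records could be joined, the Python below parses utterance and conversation records and rebuilds one exchange. It assumes the field layout of the corpus's public distribution (fields separated by " +++$+++ ", with each conversation listing its utterance IDs as a Python-style list); neither detail is stated on this page, so check both against the README shipped with the data. The sample records, character names, and function names are hypothetical and used only for illustration.

```python
import ast

# Assumed field separator; confirm against the corpus README.
SEP = " +++$+++ "


def parse_line(raw):
    """Split one utterance record into its five assumed fields."""
    # Only the newline is stripped so that an empty utterance text survives.
    line_id, char_id, movie_id, name, text = raw.rstrip("\n").split(SEP, 4)
    return {
        "line_id": line_id,
        "character_id": char_id,
        "movie_id": movie_id,
        "character_name": name,
        "text": text,
    }


def parse_conversation(raw):
    """Split one conversation record: two character IDs, a movie ID, utterance IDs."""
    char1, char2, movie_id, ids = raw.rstrip("\n").split(SEP, 3)
    return {
        "characters": (char1, char2),
        "movie_id": movie_id,
        # The ID list is assumed to be written as a Python list literal.
        "line_ids": ast.literal_eval(ids),
    }


def exchange_texts(conversation, lines_by_id):
    """Return the utterance texts of one conversation, in order."""
    return [lines_by_id[i]["text"] for i in conversation["line_ids"]]


# Hypothetical sample records, not taken from the corpus.
sample_lines = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there.",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Hi.",
]
sample_conversation = "u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2']"

lines_by_id = {rec["line_id"]: rec for rec in map(parse_line, sample_lines)}
conversation = parse_conversation(sample_conversation)
texts = exchange_texts(conversation, lines_by_id)
print(texts)
```

When reading the real files, open them with an explicit encoding. The distribution is commonly read as ISO-8859-1 rather than UTF-8, but that is also an assumption to verify against the data.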

License


  • Unknown