The PEDC is a corpus of transcripts from 14 episodes of the This American Life podcast, annotated for events. It contains excerpts of dialogue from these episodes (listed in Table 1). Annotation is at the token level: each token is labeled either as an event or as a nonevent. For details, download the corpus and see the annotation guide, which specifies how an event is defined, and the README, which describes how the annotations are encoded. Further information about the corpus and its use is given in the paper "Automatic extraction of personal events from dialogue".
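To make the token-level labeling concrete, here is a minimal Python sketch of how binary event/nonevent annotations might be read and queried. The tab-separated `token<TAB>label` layout, the `1`/`0` label values, the function names, and the sample sentence are all assumptions made for illustration; the actual encoding is the one documented in the corpus README, and the sample text is not taken from the corpus.

```python
from typing import List, Tuple

# Hypothetical encoding (NOT the corpus's documented format):
# one "token<TAB>label" pair per line, where 1 = event and 0 = nonevent.
# The sentence itself is invented for this example.
SAMPLE = "I\t0\nwent\t1\nto\t0\nthe\t0\nstore\t0\nand\t0\nbought\t1\nmilk\t0"


def parse_annotations(text: str) -> List[Tuple[str, bool]]:
    """Turn 'token<TAB>label' lines into (token, is_event) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, label = line.split("\t")
        pairs.append((token, label == "1"))
    return pairs


def event_tokens(pairs: List[Tuple[str, bool]]) -> List[str]:
    """Return only the tokens labeled as events."""
    return [token for token, is_event in pairs if is_event]


if __name__ == "__main__":
    annotated = parse_annotations(SAMPLE)
    print(event_tokens(annotated))
```

Because every token carries exactly one of two labels, a list of `(token, is_event)` pairs is enough to represent an excerpt under these assumptions; a real loader would follow the README's encoding instead.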