CANDOR Corpus (CANDOR = Conversation: A Naturalistic Dataset of Online Recordings)

Introduced by Reece et al. in "Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech"

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.
