CANDOR Corpus (CANDOR = Conversation: A Naturalistic Dataset of Online Recordings)

Introduced by Reece et al. in "Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech"

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. Spanning more than 7 million words and 850 hours, it totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

