MSP-Podcast (A large naturalistic speech emotional dataset)

The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus is an ongoing process. Version 1.7 of the corpus has 62,140 speaking turns (100hrs).

Key features of this corpus:

  • We download available audio recordings with common license. We only use the podcasts that have less restrictive licenses, so we can modify, sell and distribute the corpus (you can use it for commercial product!).
  • Most of the segments in a regular podcasts are neutral. We use machine learning techniques trained with available data to retrieve candidate segments. These segments are emotionally annotated with crowdsourcing. This approach allows us to spend our resources on speech segments that are likely to convey emotions.
  • We annotate categorical emotions and attribute based labels at the speaking turn label
  • This is an ongoing effort, where we currently have 62,140 speaking turns (100h). We collect approximately 10,000-13,000 new speaking turns per year. Our goal is to reach 400 hours.


Paper Code Results Date Stars

Dataset Loaders

No data loaders found. You can submit your data loader here.


Similar Datasets


  • Unknown