Common Voice is an audio dataset in which each entry consists of a unique MP3 file and a corresponding text file. The dataset contains 9,283 recorded hours, of which 7,335 hours across 60 languages have been validated. It also includes demographic metadata such as age, sex, and accent.
314 PAPERS • 266 BENCHMARKS
ADIMA is a novel, linguistically diverse, ethically sourced, expert-annotated, and well-balanced multilingual profanity detection audio dataset comprising 11,775 audio samples in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users.
2 PAPERS • NO BENCHMARKS YET
EmoSpeech contains keywords spoken with diverse emotions, along with background sounds, and was introduced to explore new challenges in audio analysis.
1 PAPER • NO BENCHMARKS YET