VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
610 PAPERS • 9 BENCHMARKS
The dataset consists of more than 210k videos covering 310 audio classes.
150 PAPERS • 3 BENCHMARKS
The Rich Annotated Mandarin Conversational (RAMC) Speech Dataset contains 180 hours of Mandarin Chinese dialogue, split into 150, 10 and 20 hours for the training, development and test sets respectively. It comprises 351 multi-turn dialogues, each a coherent and compact conversation centered on a single theme.
1 PAPER • NO BENCHMARKS YET