3 dataset results for Speaker Recognition AND Audio

VoxCeleb1

VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.

610 PAPERS • 9 BENCHMARKS

VGG-Sound

VGG-Sound consists of more than 210k videos covering 310 audio classes.

150 PAPERS • 3 BENCHMARKS

ASR-RAMC-BigCCSC: A Chinese Conversational Speech Corpus

A Rich Annotated Mandarin Conversational (RAMC) speech dataset containing 180 hours of Mandarin Chinese dialogue, split into 150, 10, and 20 hours for the training, development, and test sets, respectively. It contains 351 multi-turn dialogues, each a coherent, compact conversation centered on a single theme.
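As a quick sanity check on the RAMC split sizes, the minimal Python sketch below sums the stated per-split hours and computes each split's share of the corpus. The variable names (`splits`, `total_hours`, `shares`) are hypothetical and used only for illustration; the hour figures are taken directly from the description above.

```python
# Per-split durations in hours, as stated in the RAMC description.
# The dictionary keys are illustrative labels, not official split names.
splits = {"train": 150, "dev": 10, "test": 20}

# The three splits should add up to the stated 180 hours in total.
total_hours = sum(splits.values())

# Share of the corpus in each split, as a percentage rounded to one decimal.
shares = {name: round(100 * hours / total_hours, 1) for name, hours in splits.items()}

print(total_hours)  # 180
print(shares)       # {'train': 83.3, 'dev': 5.6, 'test': 11.1}
```

In other words, roughly five sixths of the audio is reserved for training, with the remainder divided between development and test data.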

1 PAPER • NO BENCHMARKS YET

