VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
458 PAPERS • 9 BENCHMARKS
Consists of more than 210k videos for 310 audio classes.
61 PAPERS • NO BENCHMARKS YET
CN-Celeb is a large-scale speaker recognition dataset collected 'in the wild'. It contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different real-world genres.
39 PAPERS • 1 BENCHMARK
Description: More than 2,000 Chinese native speakers participated in the recording, with a balanced gender distribution. Speakers are mainly from southern China, with some from northern provinces who have strong accents. The recording content is rich, covering mobile phone voice assistant interaction, smart home command and control, in-car command and control, numbers, and other fields, accurately matching practical application scenarios such as smart home and intelligent vehicles.
1 PAPER • NO BENCHMARKS YET
Description: The dataset contains 200 Chinese native speakers, covering the main dialect zones. It is recorded in both noisy and quiet environments, making it better suited to real-world speech recognition applications. The recordings are commonly used spoken sentences. Texts are transcribed by professional annotators. It can be used for speech recognition and machine translation.
The football keyword dataset (FKD) is a new keyword spotting dataset in Persian, collected via crowdsourcing. It contains nearly 31,000 samples in 18 classes.
1 PAPER • 2 BENCHMARKS
MAVS is an audio-visual smartphone dataset captured on five different recent smartphones. It contains 103 subjects recorded in three sessions under different real-world scenarios. Three languages are included to address the language dependency of speaker recognition systems.
Description: 2,284 native speakers of the Kunming dialect participated in the recording, with authentic accents and from multiple age groups. The recorded script covers a wide range of topics such as generic, interactive, on-board, and home. Local people in Kunming participated in quality checking and proofreading, and the text was transcribed accurately. Recording devices are mainstream Android and Apple phones.
0 PAPERS • NO BENCHMARKS YET
Description: Indian English audio data captured by mobile phones, 1,012 hours in total, recorded by 2,100 Indian native speakers. The recorded text is designed by linguistic experts, covering generic, interactive, on-board, home and other categories. The text has been proofread manually with high accuracy; this data set can be used for automatic speech recognition, machine translation, and voiceprint recognition.
Description: More than 1,000 speakers read the specified wake-up words at three speeds: slow, normal, and fast. Audio was recorded in a professional recording studio using a microphone.
The data were recorded by 700 Mandarin speakers, 65% of whom were women. There was no pre-made text; speakers made phone calls in a natural way while the calls were recorded. Annotation mainly covers the near-end speech, and the speech content is naturally colloquial.
Description: German audio data captured by mobile phone, 1,796 hours in total, recorded by 3,442 German native speakers. The recorded text is designed by linguistic experts, covering generic, interactive, on-board, home and other categories. The text has been proofread manually with high accuracy; this data can be used for automatic speech recognition, machine translation, and voiceprint recognition.
Format: 16kHz, 16bit, uncompressed wav, mono channel
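A format line like the one above (16 kHz sample rate, 16-bit samples, uncompressed WAV, mono) can be verified programmatically with Python's standard-library `wave` module before feeding files into a training pipeline. This is a minimal sketch; the helper name `check_format` is illustrative, not part of any dataset's tooling:

```python
import wave

def check_format(path: str) -> bool:
    """Return True if the WAV file matches the stated format:
    16 kHz sample rate, 16-bit samples, mono, uncompressed PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz
            and wav.getsampwidth() == 2      # 16 bit = 2 bytes per sample
            and wav.getnchannels() == 1      # mono channel
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```

Running such a check over a delivered corpus is a cheap way to catch files that were resampled, stereo-mixed, or compressed in transit.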
Description: This dataset is recorded by 402 native Australian speakers with a balanced gender distribution. It is rich in content, covering generic command and control, human-machine interaction, smart home command and control, and in-car command and control categories. The transcription corpus has been manually proofread to ensure high accuracy.
Description: 4,787 Chinese native speakers participated in the recording, with a balanced gender distribution. Speakers come from various provinces of China. The recording content is rich, covering mobile phone voice assistant interaction, smart home command and control, in-car command and control, numbers, and other fields, accurately matching practical application scenarios such as smart home and intelligent vehicles.
Description: English emotional audio data captured by microphone. 20 American native speakers participated in the recording, with 2,100 sentences per person. The recorded script covers 10 emotions such as anger, happiness, and sadness. The voice was recorded with a high-fidelity microphone and is therefore of high quality. It is used for analytical detection of emotional speech.
Description: The product contains speech data recorded by 400 native Korean speakers, with a roughly equal gender distribution. The corpus covers a wide domain with rich content across generic, human-machine interaction, in-car, and smart home categories. The corpus text was manually checked to ensure high accuracy.
Description: This dataset is recorded by 452 native Singaporean speakers with a balanced gender distribution. It is rich in content, covering generic command and control, human-machine interaction, smart home command and control, and in-car command and control categories. The transcription corpus has been manually proofread to ensure high accuracy.
Description: 532 Portuguese speakers were recorded speaking authentic English in a relatively quiet environment. The recorded script was designed by linguists and covers a wide range of topics including generic, interactive, on-board, and home. The text was manually proofread with high accuracy. Recording devices are mainstream Android and Apple phones.
Description: 497 Italian speakers were recorded speaking authentic English in a relatively quiet environment. The recorded script was designed by linguists and covers a wide range of topics including generic, interactive, on-board, and home. The text was manually proofread with high accuracy. Recording devices are mainstream Android and Apple phones.
Description: The data volume is 227 hours, recorded by Spanish native speakers from Spain, Mexico, and Venezuela in a quiet environment. The recording content covers various fields such as economy, entertainment, news, and spoken language. All texts are manually transcribed. The sentence accuracy rate is 95%.
Description: This dataset is recorded by 498 native Russian speakers with a balanced gender distribution. It is rich in content, covering generic command and control, human-machine interaction, smart home command and control, and in-car command and control categories. The transcription corpus has been manually proofread to ensure high accuracy.
Description: The data volume is 231 hours, recorded by 406 speakers from France, Canada, and Africa. The recording is in a quiet environment and rich in content, covering various fields such as economics, entertainment, news, and spoken language. All texts are manually transcribed. The sentence accuracy rate is 95%.
Description: The data is 240 hours, recorded by 401 Indian speakers. It is recorded in both quiet and noisy environments, making it more suitable for actual application scenarios. The recording content is rich, covering economics, entertainment, news, spoken language, etc. All texts are manually transcribed with high accuracy. It can be applied to speech recognition, machine translation, and voiceprint recognition.
Description: 1,006 Japanese native speakers participated in the recording, coming from the eastern, western, and Kyushu regions, with the eastern region accounting for the largest proportion. The recording content is rich, and all texts have been manually transcribed with high accuracy.
Description: Mobile-phone-captured audio data of Chinese children, with a total duration of 3,255 hours. The 9,780 speakers are children aged 6 to 12, with accents covering seven dialect areas. The recorded text contains common children's language such as essay stories, numbers, and their interactions in cars, at home, and with voice assistants, precisely matching actual application scenes. All sentences are manually transcribed with high accuracy.
Description: 300 Hours - Tibetan Colloquial Video Speech Data, collected from real websites and covering multiple fields. Various attributes such as text content and speaker identity are annotated. This dataset can be used for voiceprint recognition model training, corpus construction for machine translation, and algorithm research.
Description: This 338-hour Spanish speech dataset is recorded by 800 native Spanish speakers from Spain, Mexico, and Argentina. The recording environment is quiet. All texts are manually transcribed; the sentence accuracy rate is 95%. It can be applied to speech recognition, machine translation, voiceprint recognition, and so on.
Description: This speech data is collected from 343 Spanish native speakers from Spain, Mexico, and Argentina, with 50 sentences per speaker, totaling 9.9 hours. The recording environment is quiet. All texts are manually transcribed with high accuracy. Recording devices are mainstream Android phones and iPhones. It can be used for speech recognition, machine translation, and voiceprint recognition.
Description: Italian-language audio data captured by mobile phone, with a total duration of 347 hours. It is recorded by 800 Italian native speakers with a balanced gender distribution. The recording environment is quiet, and all texts are manually transcribed with high accuracy. This dataset can be applied to automatic speech recognition, machine translation, and voiceprint recognition.
Description: This dataset contains speech data from 349 English speakers, all of whom are native locals. The recording environment is quiet. The recorded content covers many fields such as in-car, home, and voice assistant, with about 50 sentences per person. The valid data amounts to 9.5 hours. All texts are manually transcribed with high accuracy.
Description: Audiobook audio data annotated with pinyin, with a duration of 35 hours. 5 speakers are recorded, including 3 males and 2 females. Chinese characters and pinyin are annotated, including pinyin tones. This dataset can be used for automatic speech recognition, machine translation, and voiceprint recognition.
Description: The data were collected and recorded by 351 German native speakers with authentic accents. Recording devices are mainstream Android phones and iPhones. The recorded text was designed by professional language experts and is rich in content, covering multiple categories such as general-purpose, interactive, vehicle-mounted, and household commands. The recording environment is quiet and echo-free. The texts are manually transcribed with a high accuracy rate.
Description: Italian speech data (guiding) is collected from 351 Italian native speakers and recorded in a quiet environment. The recording is rich in content, covering multiple categories such as in-car scenes, smart home, and speech assistants, with 50 sentences per speaker. The valid volume is 9.8 hours, and each sentence is repeated 2.7 times on average. All texts are manually transcribed with high accuracy.
Description: 357 hours of Korean speech data collected by cellphone, recorded by 999 Korean speakers in a quiet environment and rich in content. All texts are transcribed by professional annotators. The sentence accuracy rate is 95%. It can be used for speech recognition, machine translation, and voiceprint recognition.
Description: 891 Spanish native speakers participated in the recording with authentic accents. The recorded script was designed by linguists and covers a wide range of topics including generic, interactive, on-board, and home. The text was manually proofread with high accuracy. Recording devices are mainstream Android and Apple phones. The dataset can be applied to automatic speech recognition and machine translation scenarios.
Description: The data is recorded by 397 Indian speakers with authentic accents, with 50 sentences per speaker, totaling 8.6 hours. The recording content covers in-car scenes, smart home, and intelligent voice assistants. This data can be used for corpus construction for machine translation, as well as model training and algorithm research for voiceprint recognition.
Description: 401 speakers participated in this recording, with 50 sentences per speaker, totaling 10.9 hours. Recording texts cover in-car scenes, smart home, and smart speech assistants. Texts were manually transcribed and are accurate. Recording devices are mainstream Android phones and iPhones. It can be used for in-car, smart home, and speech assistant scenarios.
Description: Children's read English audio data, covering ages from preschool (3-5 years old) to school age (6-12 years old), with children's speech features. The content accurately matches children's actual scenarios of speaking English. It provides data support for children's smart home applications, automatic speech recognition, and oral assessment in intelligent education.
Description: Recording devices are mainstream Android phones and iPhones.
Description: Thai speech data (guiding) is collected from 490 Thai native speakers and recorded in a quiet environment. The recording is rich in content, covering multiple categories such as in-car scenes, smart home, and speech assistants, with 50 sentences per speaker. The valid volume is 15 hours. All texts are manually transcribed with high accuracy.
Description: 500 Hours - Filipino Speech Data by Mobile Phone. The data were recorded by Filipino speakers with authentic Filipino accents. The text is manually proofread with high accuracy. Recording devices are mainstream Android and Apple phones.
Description: 500 Hours - Indian English Colloquial Video Speech Data, collected from real websites and covering multiple fields. Various attributes such as text content and speaker identity are annotated. This dataset can be used for voiceprint recognition model training, corpus construction for machine translation, and algorithm research.
About 700 speakers participated in the recording, communicating face to face in a natural way. They held free discussions on a number of given topics across a wide range of fields; the speech is natural and fluent, in line with actual dialogue scenes. Texts are manually transcribed with high accuracy.
About 1,000 speakers participated in the recording, communicating face to face in a natural way. They held free discussions on a number of given topics across a wide range of fields; the speech is natural and fluent, in line with actual dialogue scenes. Texts are manually transcribed with high accuracy.
About 700 Korean speakers participated in the recording, communicating face to face in a natural way. They held free discussions on a number of given topics across a wide range of fields; the speech is natural and fluent, in line with actual dialogue scenes. Texts are manually transcribed with high accuracy.
Description: Korean audio data with a duration of 516 hours. Recorded texts include daily language, various interactive sentences, home commands, on-board commands, etc. Of the 1,077 speakers, 49% are male and 51% female. The duration per speaker is around half an hour.
Description: 1,089 French native speakers participated in the recording with authentic accents. The recorded script was designed by linguists and covers a wide range of topics including generic, interactive, on-board, and home. The text was manually proofread with high accuracy. Recording devices are mainstream Android and Apple phones. The dataset can be applied to automatic speech recognition and machine translation scenarios.