CSI is a criminal conversational dataset for speaker identification built from the CSI television show. The authors collected transcripts of 39 episodes and video/audio of 4 episodes. Each episode involves on average more than 30 speakers. Utterances last on average 3 to 4 seconds. There are around 45 to 50 distinct scenes/conversations per episode.
1 PAPER • NO BENCHMARKS YET
The EVI dataset is a challenging, multilingual spoken-dialogue dataset with 5,506 dialogues in English, Polish, and French. The dataset can be used to develop and benchmark conversational systems for user authentication tasks, i.e. speaker enrolment (E), speaker verification (V), speaker identification (I).
1 PAPER • 3 BENCHMARKS