2 dataset results for Speaker Diarization AND Audio AND Chinese

AliMeeting (Multi-Channel Multi-Party Meeting Transcription Challenge)

The AliMeeting corpus consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array and near-field data collected by headset microphones. Each meeting session has 2-4 speakers, with speaker overlap ratios varying across sessions, and the sessions were recorded in rooms of different sizes.

37 PAPERS • 1 BENCHMARK

ASR-RAMC-BIGCCSC: A CHINESE CONVERSATIONAL SPEECH CORPUS

A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset comprising 180 hours of Mandarin Chinese dialogue, split into 150, 10, and 20 hours for the training, development, and test sets respectively. It contains 351 multi-turn dialogues, each of which is a coherent and compact conversation centered around one theme.

1 PAPER • NO BENCHMARKS YET
