WSJ0-2mix is a speech recognition corpus of speech mixtures using utterances from the Wall Street Journal (WSJ0) corpus.
90 PAPERS • 1 BENCHMARK
Continuous speech separation (CSS) is an approach to handling overlapped speech in conversational audio signals. A real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones.
31 PAPERS • NO BENCHMARKS YET
LibriMix is an open-source alternative to wsj0-2mix. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!.
20 PAPERS • 1 BENCHMARK
MIR-1K (Multimedia Information Retrieval lab, 1000 song clips) is a dataset designed for singing voice separation. It contains:
17 PAPERS • NO BENCHMARKS YET
The QMUL underGround Re-IDentification (GRID) dataset contains 250 pedestrian image pairs. Each pair contains two images of the same individual seen from different camera views. All images are captured from 8 disjoint camera views installed in a busy underground station. The figures beside show a snapshot of each of the camera views of the station and sample images in the dataset. The dataset is challenging due to variations of pose, colours, lighting changes; as well as poor image quality caused by low spatial resolution.
7 PAPERS • 5 BENCHMARKS
Real-M is a crowd-sourced speech-separation corpus of real-life mixtures. The mixtures are recorded in different acoustic environments using a wide variety of recording devices such as laptops and smartphones, thus reflecting more closely potential application scenarios.
1 PAPER • NO BENCHMARKS YET