VOICES (Voices Obscured In Complex Environmental Settings)

Introduced by Richey et al. in "Voices Obscured in Complex Environmental Settings (VOICES) corpus"

The VOICES corpus is a dataset intended to promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions.

For this corpus, audio was recorded in furnished rooms, with background noise played at the same time as foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to cover all combinations of foreground speech and background noise. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone.
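As a rough sense of scale, the per-microphone figure can be multiplied out across the twelve microphones. The sketch below assumes that each microphone's 120 hours is counted as a separate channel, which would give 1,440 microphone-hours; the function and variable names are illustrative only and are not part of any VOICES tooling.

```python
# Figures taken from the corpus description above.
NUM_MICROPHONES = 12
HOURS_PER_MICROPHONE = 120


def total_recorded_hours(num_microphones: int, hours_per_microphone: int) -> int:
    """Total channel-hours, assuming every microphone is counted separately."""
    return num_microphones * hours_per_microphone


total_hours = total_recorded_hours(NUM_MICROPHONES, HOURS_PER_MICROPHONE)
print(f"{total_hours} microphone-hours in total")
```

This counts the same acoustic events once per microphone, so it describes the volume of recorded audio rather than the amount of distinct speech content.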
