AVSpeech is a large-scale audio-visual dataset comprising speech clips with no interfering background signals. The segments vary in length from 3 to 10 seconds, and in each clip the only visible face in the video and the only audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4,700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages, and face poses.
36 PAPERS • NO BENCHMARKS YET
ClovaCall is a new large-scale Korean call-based speech corpus collected from more than 11,000 people under a goal-oriented dialog scenario. The raw dataset includes approximately 112,000 pairs of a short sentence and its corresponding spoken utterance in the restaurant reservation domain.
4 PAPERS • NO BENCHMARKS YET
Kosp2e (read as 'kospi') is a corpus that allows Korean speech to be translated into English text in an end-to-end manner.
3 PAPERS • NO BENCHMARKS YET
The Jejueo Single Speaker Speech (JSS) dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker and a transcript file.
1 PAPER • NO BENCHMARKS YET
The Deeply Korean read speech corpus contains recordings of pairs of Korean speakers reading a script with 3 distinct text sentiments (negative, neutral, positive) in 3 distinct voice sentiments (negative, neutral, positive). The recordings took place in 3 different types of places with differing levels of reverberation: an anechoic chamber, a studio apartment, and a dance studio. To examine the effect of the recording device and of the microphone's distance from the source, every experiment was recorded at 3 distinct distances with 2 types of smartphone, an iPhone X and a Galaxy S7.
0 PAPERS • NO BENCHMARKS YET
Deeply Parent-Child Vocal Interaction contains recordings of interactions between 24 parent-child pairs (48 speakers in total), such as reading fairy tales, singing children’s songs, conversing, and others. The recordings took place in 3 different types of places with differing levels of reverberation: an anechoic chamber, a studio apartment, and a dance studio. To examine the effect of the recording device and of the microphone's distance from the source, every experiment was recorded at 3 distinct distances with 2 types of smartphone, an iPhone X and a Galaxy S7.