…The dataset consists of short speech segments automatically extracted from media videos available on YouTube and manually transcribed, with some pre- and post-processing.
4 PAPERS • 1 BENCHMARK
…An optical character recognition (OCR)-based method is introduced to generate audio/text segmentation candidates for the YouTube data from the corresponding video captions.
39 PAPERS • 1 BENCHMARK
…A subset of 1.9M includes diverse annotation types: 15,851,536 boxes on 600 classes; 2,785,498 instance segmentations on 350 classes; 3,284,280 relationship annotations on 1,466 relationships; 675,155…
4 PAPERS • NO BENCHMARKS YET
…Segmented transcripts are also provided. The corpus aims to support researchers in speech recognition, machine translation, voiceprint recognition, and other speech-related fields.
0 PAPERS • NO BENCHMARKS YET