8 dataset results for segmentation AND Spanish

DISRPT2019

DISRPT2019 (DISRPT2019 shared task on Discourse Unit Segmentation and Connective Detection)

The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks imply a segmentation of texts into segments, learning segmentations for and from diverse resources is a promising area for converging methods and insights. Because different corpora, languages and frameworks use different guidelines for segmentation, the shared task is meant to promote design of flexible methods for dealing with various guidelines, and help

4 PAPERS • NO BENCHMARKS YET

DISRPT2021

DISRPT2021 (DISRPT2021 shared task on Discourse Unit Segmentation, Connective Detection and Discourse Relation Classification)

The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the

3 PAPERS • NO BENCHMARKS YET

Heroes Corpus

Each episode directory contains word-level and segment-level information of the whole episode and also parallel samples extracted under segments_eng and segments_spa subdirectories.

1 PAPER • NO BENCHMARKS YET

AVSpeech

…The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.

35 PAPERS • NO BENCHMARKS YET

Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems

…Data are segmented into sentences which are further word tokenized.

2 PAPERS • 12 BENCHMARKS

LSA16 (Lengua de Señas Argentina - 16 Handshapes classes)

…To simplify the problem of hand segmentation, subjects wore fluorescent-colored gloves. These substantially simplify the problem of recognizing the position of the hand and performing its segmentation, and remove all issues associated with skin color variations, while fully retaining the difficulty

3 PAPERS • 1 BENCHMARK

MediaSpeech

…The dataset consists of short speech segments automatically extracted from media videos available on YouTube and manually transcribed, with some pre- and post-processing.

4 PAPERS • 1 BENCHMARK

GATITOS

GATITOS (Google's Additional Translations Into Tail-languages: Often Short)

…This dataset consists of 4,000 English segments (4,500 tokens) that have been translated into each of 26 low-resource languages, as well as three higher-resource pivot languages (es, fr, hi).

1 PAPER • NO BENCHMARKS YET
