…This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and to compare common segmentation, feature extraction, and classification algorithms (Semantic Segmentation, Convolutional Neural Networks, Bag of Features). If you use this dataset in your work, please consider citing: @inproceedings{ulucan2020large, title={A Large-Scale Dataset for Fish Segmentation and Classification}, author={Ulucan, Oguzhan and Karakaya, …}}
1 PAPER • NO BENCHMARKS YET
The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks imply a segmentation of texts into segments, learning segmentations for and from diverse resources is a promising area for converging methods and insights. Because different corpora, languages and frameworks use different guidelines for segmentation, the shared task is meant to promote the design of flexible methods for dealing with various guidelines, and to help …
4 PAPERS • NO BENCHMARKS YET
The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the …
3 PAPERS • NO BENCHMARKS YET
This is a dataset for segmentation and classification of epistemic activities in diagnostic reasoning texts.
gRefCOCO is the first large-scale Generalized Referring Expression Segmentation dataset that contains multi-target, no-target, and single-target expressions.
21 PAPERS • 2 BENCHMARKS
BiasCorp is a dataset for racism detection containing 139,090 comments and news segments from three sources: Fox News, Breitbart News, and YouTube.
…In FUNSD and CORD, segment layout annotations are aligned with labeled entities, so they do not reflect the reading-order issue of NER on scanned VrDs and are thus unsuitable for evaluating current … Their segment layout annotations are aligned with real-world situations, and entity mentions are labeled on words. The proposed FUNSD-r consists of 199 document samples, including the image, layout annotations of segments and words, and labeled entities of 3 categories.
3 PAPERS • 1 BENCHMARK
…In FUNSD and CORD, segment layout annotations are aligned with labeled entities, so they do not reflect the reading-order issue of NER on scanned VrDs and are thus unsuitable for evaluating current … Their segment layout annotations are aligned with real-world situations, and entity mentions are labeled on words. The proposed CORD-r consists of 999 document samples, including the image, layout annotations of segments and words, and labeled entities of 30 categories.
…To create this dataset, a large number of text segments extracted from English-language Wikipedia were perturbed and then verified with crowd-sourced annotations.
5 PAPERS • NO BENCHMARKS YET
…Each segment is annotated for the presence of 11 emotions (angry, neutral, fear, happy, sad, disappointed, bored, disgusted, excited, surprised, and other).
6 PAPERS • 1 BENCHMARK
…The dataset structure resembles the tasks of (1) segmenting sentences within a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different …
…Object segmentation masks, object poses, and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. The FewSOL dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences, and …
…Basically, "rationales" are segments of the text that support an annotator's classification. For a review labeled positive, the rationales would be segments of the text that support the claim (by an annotator) that the review is, indeed, positive. Here are some examples of positive rationales (the segments enclosed by double square brackets): [[you will enjoy the hell out of]] American Pie. fortunately, they [[managed to do it in an interesting …
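To illustrate the annotation format described above, here is a minimal sketch (assuming rationales are marked inline with double square brackets, as in the examples; the function name is hypothetical) that extracts rationale spans from an annotated review:

```python
import re

# Matches spans enclosed in double square brackets, e.g. "[[you will enjoy the hell out of]]".
RATIONALE_PATTERN = re.compile(r"\[\[(.+?)\]\]")

def extract_rationales(annotated_review: str) -> list[str]:
    """Return the rationale segments marked with [[...]] in an annotated review."""
    return RATIONALE_PATTERN.findall(annotated_review)

print(extract_rationales("[[you will enjoy the hell out of]] American Pie."))
# ['you will enjoy the hell out of']
```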
…It contains over 10M segments of multilingual open data. The data has been collected from sites that allow free use and reuse of their content, as well as from public-sector websites.
…To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. For the multiple-object video segmentation task we consider DAVIS17. As our goal is to segment objects in videos using language specifications, we augment all objects annotated with mask labels in DAVIS16 and DAVIS17 with non-ambiguous referring expressions. (We quantified that only ~15% of the collected descriptions become invalid over time, and this does not strongly affect segmentation results, as the temporal consistency step helps to disambiguate some …) We believe the collected data will be of interest to the segmentation as well as the vision-and-language communities, providing an opportunity to explore language as an alternative input for video object segmentation.
75 PAPERS • 5 BENCHMARKS
…Each image has a high-quality segmentation mask, a sketch, descriptive text, and an image with a transparent background.
27 PAPERS • 3 BENCHMARKS
…It contains human annotated ground truth labels for both bounding boxes and instance-wise segmentation masks.
…In addition to the samples themselves, some entries in the dataset are accompanied by supplementary natural language descriptions, size measurements, and segmentation masks.
…This dataset consists of 4,000 English segments (4,500 tokens) that have been translated into each of 26 low-resource languages, as well as three higher-resource pivot languages (es, fr, hi).
…Annotations include: multiple POS tags, morphological features, and lemmatization; sentence segmentation and rough speech act; and document structure in TEI XML (paragraphs, headings, figures, etc.).
8 PAPERS • 1 BENCHMARK
The SegmentedTables dataset is a collection of almost 2,000 tables extracted from 352 machine learning papers. Each table consists of rich text content, layout, and a caption.
We present YTSeg, a topically and structurally diverse benchmark for the text segmentation task based on YouTube transcriptions.
1 PAPER • 2 BENCHMARKS
…The dataset is split into 3 subsets, each corresponding to a severity level reported by the LaTeXML software responsible for the HTML5 conversion.
…The dataset covers video retrieval, moment retrieval, and two novel tasks: moment segmentation and step captioning.
…The dataset contains 6,892 segment-level summarization instances for training and evaluation.
7 PAPERS • NO BENCHMARKS YET
…The goal of this work is to segment the sections of clinical medical domain documentation.
2 PAPERS • 2 BENCHMARKS
…A subset of 1.9M images includes diverse annotation types: 15,851,536 boxes on 600 classes; 2,785,498 instance segmentations on 350 classes; 3,284,280 relationship annotations on 1,466 relationships; 675,155 …
…Segments of each song are annotated as “voice” (sung or spoken) or “no-voice”. The songs constitute a total of about 6 hours of music.
The dataset comes with a shot segmentation (around 1 million shots) for which we analyze content specifics and statistics.
…It consists of 29 time-lapse image sequences with various annotations (pixel-wise segmentation masks, object-wise bounding boxes, and tracking information), made publicly available to the scientific community.
0 PAPERS • NO BENCHMARKS YET
…The dataset also provides image segmentation masks, which label persuasion strategies in the corresponding ad images on the test split.
…The screenplay (all dialogue and description parts of the movie) segmented into scenes (selected from the Scriptbase dataset), and gold scene-level TP labels for the screenplays of the test set.
11 PAPERS • NO BENCHMARKS YET
The Actor-Action Dataset (A2D) by Xu et al. [29] serves as the largest video dataset for the general actor and action segmentation task. As we are interested in pixel-level actor and action segmentation from sentences, we augment the videos in A2D with natural language descriptions about what each actor is doing in the videos.
29 PAPERS • 1 BENCHMARK
…We segment long conversations into chunks, and use a question generator and dialogue summarizer as auxiliary tools to collect multi-hop questions.
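As a rough illustration of the chunking step (a minimal sketch: the utterance format and the chunk size of 8 turns are illustrative assumptions, not values taken from the dataset description):

```python
from typing import List

def chunk_conversation(utterances: List[str], turns_per_chunk: int = 8) -> List[List[str]]:
    """Split a long conversation (a list of utterances) into fixed-size chunks of turns.

    The chunk size here is an arbitrary illustrative choice, not the one used to build the dataset.
    """
    return [utterances[i:i + turns_per_chunk]
            for i in range(0, len(utterances), turns_per_chunk)]
```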
…This dense visual grounding takes the form of a mouse trace segment per word and is unique to our data.
55 PAPERS • 5 BENCHMARKS
…In order to ease automatic speech segmentation, we carried out the recordings in an anechoic room, with walls covered by sound-absorbing materials.
5 PAPERS • 1 BENCHMARK
…EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments).
138 PAPERS • 7 BENCHMARKS
…This (FS-02) edition of the FEARLESS STEPS Challenge includes the following 6 tasks: TASK 1: Speech Activity Detection (SAD); TASK 2: Speaker Identification (using Speaker Segments …); …; Track 2: ASR using Diarized Segments (ASR_track2).
…natural language generation challenge which consists of mapping sets of triplets to text, including referring expression generation, aggregation, lexicalization, surface realization, and sentence segmentation.
143 PAPERS • 17 BENCHMARKS
…In addition, 23 images were annotated by a team of expert radiologists to include semantic segmentation of radiographic findings.
9 PAPERS • NO BENCHMARKS YET
There exist previous works [6, 10] that constructed referring segmentation datasets for videos. Each video has pixel-level instance segmentation annotations every 5 frames (the videos are 30 fps), and their durations are around 3 to 6 seconds.
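Concretely, a 6-second clip at 30 fps contains 180 frames, so annotating every 5 frames amounts to roughly 36 annotated frames per clip.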
36 PAPERS • 3 BENCHMARKS
…Image processing techniques, segmentation, and feature extraction were applied to the obtained images of the pistachio samples, and a pistachio dataset with sixteen attributes was created.
…modern, learning-based techniques for a variety of material-related tasks including, but not limited to, material acquisition, material generation, and synthetic data generation, e.g. for retrieval or segmentation.