🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task

Filter by Language

9823 dataset results

ALFRED (Action Learning From Realistic Environments and Directives)

ALFRED (Action Learning From Realistic Environments and Directives), is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

130 PAPERS • NO BENCHMARKS YET

SID (See-in-the-Dark)

The See-in-the-Dark (SID) dataset contains 5094 raw short-exposure images, each with a corresponding long-exposure reference image. Images were captured using two cameras: Sony α7SII and Fujifilm X-T2.

130 PAPERS • 3 BENCHMARKS

STARE (Structured Analysis of the Retina)

The STARE (Structured Analysis of the Retina) dataset is a dataset for retinal vessel segmentation. It contains 20 equal-sized (700×605) color fundus images. For each image, two groups of annotations are provided..

130 PAPERS • 7 BENCHMARKS

2D-3D-S (2D-3D-Semantic)

The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. It covers over 6,000 m2 collected in 6 large-scale indoor areas that originate from 3 different buildings. It contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360° equirectangular images) as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces.

129 PAPERS • 8 BENCHMARKS

Kinetics-600

The Kinetics-600 is a large-scale action recognition dataset which consists of around 480K videos from 600 action categories. The 480K videos are divided into 390K, 30K, 60K for training, validation and test sets, respectively. Each video in the dataset is a 10-second clip of action moment annotated from raw YouTube video. It is an extensions of the Kinetics-400 dataset.

129 PAPERS • 7 BENCHMARKS

MIND (MIcrosoft News Dataset)

MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of Microsoft News website. The mission of MIND is to serve as a benchmark dataset for news recommendation and facilitate the research in news recommendation and recommender systems area.

129 PAPERS • 1 BENCHMARK

ObjectNet

ObjectNet is a test set of images collected directly using crowd-sourcing. ObjectNet is unique as the objects are captured at unusual poses in cluttered, natural scenes, which can severely degrade recognition performance. There are 50,000 images in the test set which controls for rotation, background and viewpoint. There are 313 object classes with 113 overlapping ImageNet.

129 PAPERS • 4 BENCHMARKS

LFPW (Labeled Face Parts in the Wild)

The Labeled Face Parts in-the-Wild (LFPW) consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points, shown below, are included in dataset.

127 PAPERS • NO BENCHMARKS YET

PartNet

PartNet is a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. The dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others.

127 PAPERS • 3 BENCHMARKS

SearchQA

SearchQA was built using an in-production, commercial search engine. It closely reflects the full pipeline of a (hypothetical) general question-answering system, which consists of information retrieval and answer synthesis.

127 PAPERS • 1 BENCHMARK

FC100 (Fewshot-CIFAR100)

The FC100 dataset (Fewshot-CIFAR100) is a newly split dataset based on CIFAR-100 for few-shot learning. It contains 20 high-level categories which are divided into 12, 4, 4 categories for training, validation and test. There are 60, 20, 20 low-level classes in the corresponding split containing 600 images of size 32 × 32 per class. Smaller image size makes it more challenging for few-shot learning.

126 PAPERS • 5 BENCHMARKS

Multi30K

Multi30K is a large-scale multilingual multimodal dataset for interdisciplinary machine learning research. It extends the Flickr30K dataset with German translations created by professional translators over a subset of the English descriptions, and descriptions crowdsourced independently of the original English descriptions. The dataset was introduced to stimulate multilingual multimodal research.

126 PAPERS • 9 BENCHMARKS

Europarl

Europarl (European Parliament Proceedings Parallel Corpus)

A corpus of parallel text in 21 European languages from the proceedings of the European Parliament.

125 PAPERS • NO BENCHMARKS YET

PATTERN

PATTERN is a node classification tasks generated with Stochastic Block Models, which is widely used to model communities in social networks by modulating the intra- and extra-communities connections, thereby controlling the difficulty of the task. PATTERN tests the fundamental graph task of recognizing specific predetermined subgraphs.

125 PAPERS • 1 BENCHMARK

UrbanSound8K

Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, enginge_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to www.freesound.org.

125 PAPERS • 1 BENCHMARK

e-SNLI

e-SNLI is used for various goals, such as obtaining full sentence justifications of a model's decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets.

125 PAPERS • 1 BENCHMARK

ORL (Our Database of Faces)

The ORL Database of Faces contains 400 images from 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The size of each image is 92x112 pixels, with 256 grey levels per pixel.

124 PAPERS • 1 BENCHMARK

Replay-Attack

The Replay-Attack Database for face spoofing consists of 1300 video clips of photo and video attack attempts to 50 clients, under different lighting conditions. All videos are generated by either having a (real) client trying to access a laptop through a built-in webcam or by displaying a photo or a video recording of the same client for at least 9 seconds.

124 PAPERS • 1 BENCHMARK

SumMe

The SumMe dataset is a video summarization dataset consisting of 25 videos, each annotated with at least 15 human summaries (390 in total).

124 PAPERS • 3 BENCHMARKS

VOT2018

VOT2018 is a dataset for visual object tracking. It consists of 60 challenging videos collected from real-life datasets.

124 PAPERS • 1 BENCHMARK

BLUE

BLUE (Biomedical Language Understanding Evaluation)

The BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora. These tasks cover a diverse range of text genres (biomedical literature and clinical notes), dataset sizes, and degrees of difficulty and, more importantly, highlight common biomedicine text-mining challenges.

123 PAPERS • NO BENCHMARKS YET

CIFAR100-LT

The Long-tailed Version of CIFAR100

123 PAPERS • NO BENCHMARKS YET

Make3D

The Make3D dataset is a monocular Depth Estimation dataset that contains 400 single training RGB and depth map pairs, and 134 test samples. The RGB images have high resolution, while the depth maps are provided at low resolution.

122 PAPERS • 1 BENCHMARK

CBSD68

CBSD68 (Color BSD68)

Color BSD68 dataset for image denoising benchmarks is part of The Berkeley Segmentation Dataset and Benchmark. It is used for measuring image denoising algorithms performance. It contains 68 images.

121 PAPERS • 16 BENCHMARKS

ELI5

ELI5 is a dataset for long-form question answering. It contains 270K complex, diverse questions that require explanatory multi-sentence answers. Web search results are used as evidence documents to answer each question.

121 PAPERS • 1 BENCHMARK

Meta-Dataset

The Meta-Dataset benchmark is a large few-shot learning benchmark and consists of multiple datasets of different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:

121 PAPERS • 2 BENCHMARKS

NSynth

NSynth is a dataset of one shot instrumental notes, containing 305,979 musical notes with unique pitch, timbre and envelope. The sounds were collected from 1006 instruments from commercial sample libraries and are annotated based on their source (acoustic, electronic or synthetic), instrument family and sonic qualities. The instrument families used in the annotation are bass, brass, flute, guitar, keyboard, mallet, organ, reed, string, synth lead and vocal. Four second monophonic 16kHz audio snippets were generated (notes) for the instruments.

121 PAPERS • 3 BENCHMARKS

UNSW-NB15 (UNSQ-NB15)

UNSW-NB15 is a network intrusion dataset. It contains nine different attacks, includes DoS, worms, Backdoors, and Fuzzers. The dataset contains raw network packets. The number of records in the training set is 175,341 records and the testing set is 82,332 records from the different types, attack and normal.

121 PAPERS • 2 BENCHMARKS

Yahoo! Answers

The Yahoo! Answers topic classification dataset is constructed using 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples. Therefore, the total number of training samples is 1,400,000 and testing samples 60,000 in this dataset. From all the answers and other meta-information, we only used the best answer content and the main category information. Source:github

121 PAPERS • 2 BENCHMARKS

SciERC

SciERC dataset is a collection of 500 scientific abstract annotated with scientific entities, their relations, and coreference clusters. The abstracts are taken from 12 AI conference/workshop proceedings in four AI communities, from the Semantic Scholar Corpus. SciERC extends previous datasets in scientific articles SemEval 2017 Task 10 and SemEval 2018 Task 7 by extending entity types, relation types, relation coverage, and adding cross-sentence relations using coreference links.

120 PAPERS • 7 BENCHMARKS

Virtual KITTI

Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.

120 PAPERS • 1 BENCHMARK

CityPersons

The CityPersons dataset is a subset of Cityscapes which only consists of person annotations. There are 2975 images for training, 500 and 1575 images for validation and testing. The average of the number of pedestrians in an image is 7. The visible-region and full-body annotations are provided.

119 PAPERS • 2 BENCHMARKS

FBMS (Freiburg-Berkeley Motion Segmentation)

The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is an extension of the BMS dataset with 33 additional video sequences. A total of 720 frames is annotated. It has pixel-accurate segmentation annotations of moving objects. FBMS-59 comes with a split into a training set and a test set.

119 PAPERS • 3 BENCHMARKS

JFT-300M

JFT-300M is an internal Google dataset used for training image classification models. Images are labeled using an algorithm that uses complex mixture of raw web signals, connections between web-pages and user feedback. This results in over one billion labels for the 300M images (a single image can have multiple labels). Of the billion image labels, approximately 375M are selected via an algorithm that aims to maximize label precision of selected images.

119 PAPERS • 1 BENCHMARK

MSRA-TD500 (MSRA Text Detection 500 Database)

The MSRA-TD500 dataset is a text detection dataset that contains 300 training images and 200 test images. Text regions are arbitrarily orientated and annotated at sentence level. Different from the other datasets, it contains both English and Chinese text.

119 PAPERS • 1 BENCHMARK

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:

119 PAPERS • 14 BENCHMARKS

SemEval-2010 Task-8

The dataset for the SemEval-2010 Task 8 is a dataset for multi-way classification of mutually exclusive semantic relations between pairs of nominals.

118 PAPERS • 1 BENCHMARK

VehicleID (PKU VehicleID)

The “VehicleID” dataset contains CARS captured during the daytime by multiple real-world surveillance cameras distributed in a small city in China. There are 26,267 vehicles (221,763 images in total) in the entire dataset. Each image is attached with an id label corresponding to its identity in real world. In addition, the dataset contains manually labelled 10319 vehicles (90196 images in total) of their vehicle model information(i.e.“MINI-cooper”, “Audi A6L” and “BWM 1 Series”).

118 PAPERS • 4 BENCHMARKS

FSD50K (Freesound Database 50K)

Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more.

117 PAPERS • 2 BENCHMARKS

Materials Project

The Materials Project is a collection of chemical compounds labelled with different attributes. The labelling is performed by different simulations, most of them at DFT level of theory.

117 PAPERS • 2 BENCHMARKS

TVQA

The TVQA dataset is a large-scale video dataset for video question answering. It is based on 6 popular TV shows (Friends, The Big Bang Theory, How I Met Your Mother, House M.D., Grey's Anatomy, Castle). It includes 152,545 QA pairs from 21,793 TV show clips. The QA pairs are split into the ratio of 8:1:1 for training, validation, and test sets. The TVQA dataset provides the sequence of video frames extracted at 3 FPS, the corresponding subtitles with the video clips, and the query consisting of a question and four answer candidates. Among the four answer candidates, there is only one correct answer.

117 PAPERS • 3 BENCHMARKS

DocVQA

DocVQA consists of 50,000 questions defined on 12,000+ document images.

116 PAPERS • 2 BENCHMARKS

GENIA

The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.

116 PAPERS • 6 BENCHMARKS

Microsoft Academic Graph

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study.

116 PAPERS • 1 BENCHMARK

NABirds (North America Birds)

NABirds V1 is a collection of 48,000 annotated photographs of the 400 species of birds that are commonly observed in North America. More than 100 photographs are available for each species, including separate annotations for males, females and juveniles that comprise 700 visual categories. This dataset is to be used for fine-grained visual categorization experiments.

116 PAPERS • 1 BENCHMARK

RESISC45

RESISC45 dataset is a dataset for Remote Sensing Image Scene Classification (RESISC). It contains 31,500 RGB images of size 256×256 divided into 45 scene classes, each class containing 700 images. Among its notable features, RESISC45 contains varying spatial resolution ranging from 20cm to more than 30m/px.

116 PAPERS • 1 BENCHMARK

Something-Something V1

The 20BN-SOMETHING-SOMETHING dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with everyday objects. The dataset was created by a large number of crowd workers. It allows machine learning models to develop fine-grained understanding of basic actions that occur in the physical world. It contains 108,499 videos, with 86,017 in the training set, 11,522 in the validation set and 10,960 in the test set. There are 174 labels.

116 PAPERS • 3 BENCHMARKS

mC4

mC4 is a multilingual variant of the C4 dataset called mC4. mC4 comprises natural text in 101 languages drawn from the public Common Crawl web scrape.

116 PAPERS • NO BENCHMARKS YET

Datasets

9823 dataset results