🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task

Filter by Language

9752 dataset results

The ImageCLEF-DA dataset is a benchmark dataset for ImageCLEF 2014 domain adaptation challenge, which contains three domains: Caltech-256 (C), ImageNet ILSVRC 2012 (I) and Pascal VOC 2012 (P). For each domain, there are 12 categories and 50 images in each category.

92 PAPERS • 5 BENCHMARKS

CommonGen

CommonGen is constructed through a combination of crowdsourced and existing caption corpora, consists of 79k commonsense descriptions over 35k unique concept-sets.

91 PAPERS • 1 BENCHMARK

Hopkins155

The Hopkins 155 dataset consists of 156 video sequences of two or three motions. Each video sequence motion corresponds to a low-dimensional subspace. There are 39−550 data vectors drawn from two or three motions for each video sequence.

91 PAPERS • 1 BENCHMARK

JIGSAWS (JHU-ISI Gesture and Skill Assessment Working Set)

The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) is a surgical activity dataset for human motion modeling. The data was collected through a collaboration between The Johns Hopkins University (JHU) and Intuitive Surgical, Inc. (Sunnyvale, CA. ISI) within an IRB-approved study. The release of this dataset has been approved by the Johns Hopkins University IRB. The dataset was captured using the da Vinci Surgical System from eight surgeons with different levels of skill performing five repetitions of three elementary surgical tasks on a bench-top model: suturing, knot-tying and needle-passing, which are standard components of most surgical skills training curricula. The JIGSAWS dataset consists of three components:

91 PAPERS • 3 BENCHMARKS

MUSDB18

The MUSDB18 is a dataset of 150 full lengths music tracks (~10h duration) of different genres along with their isolated drums, bass, vocals and others stems.

91 PAPERS • 2 BENCHMARKS

CBT (Children’s Book Test)

Children’s Book Test (CBT) is designed to measure directly how well language models can exploit wider linguistic context. The CBT is built from books that are freely available thanks to Project Gutenberg.

90 PAPERS • 2 BENCHMARKS

InBreast

Rationale and objectives: Computer-aided detection and diagnosis (CAD) systems have been developed in the past two decades to assist radiologists in the detection and diagnosis of lesions seen on breast imaging exams, thus providing a second opinion. Mammographic databases play an important role in the development of algorithms aiming at the detection and diagnosis of mammary lesions. However, available databases often do not take into consideration all the requirements needed for research and study purposes. This article aims to present and detail a new mammographic database.

90 PAPERS • 2 BENCHMARKS

PoseTrack

The PoseTrack dataset is a large-scale benchmark for multi-person pose estimation and tracking in videos. It requires not only pose estimation in single frames, but also temporal tracking across frames. It contains 514 videos including 66,374 frames in total, split into 300, 50 and 208 videos for training, validation and test set respectively. For training videos, 30 frames from the center are annotated. For validation and test videos, besides 30 frames from the center, every fourth frame is also annotated for evaluating long range articulated tracking. The annotations include 15 body keypoints location, a unique person id and a head bounding box for each person instance.

90 PAPERS • 5 BENCHMARKS

ConvAI2 (Conversational Intelligence Challenge 2)

The ConvAI2 NeurIPS competition aimed at finding approaches to creating high-quality dialogue agents capable of meaningful open domain conversation. The ConvAI2 dataset for training models is based on the PERSONA-CHAT dataset. The speaker pairs each have assigned profiles coming from a set of 1155 possible personas (at training time), each consisting of at least 5 profile sentences, setting aside 100 never seen before personas for validation. As the original PERSONA-CHAT test set was released, a new hidden test set consisted of 100 new personas and over 1,015 dialogs was created by crowdsourced workers.

89 PAPERS • 1 BENCHMARK

FaceWarehouse

FaceWarehouse is a 3D facial expression database that provides the facial geometry of 150 subjects, covering a wide range of ages and ethnic backgrounds.

89 PAPERS • NO BENCHMARKS YET

MAESTRO

The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The MIDI data includes key strike velocities and sustain/sostenuto/una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced to individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz 16-bit PCM stereo).

89 PAPERS • NO BENCHMARKS YET

MM-Vet

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

89 PAPERS • 1 BENCHMARK

Math23K

Math23K (Math23K for Math Word Problem Solving)

Math23K is a dataset created for math word problem solving, contains 23, 162 Chinese problems crawled from the Internet. Refer to our paper for more details: The dataset is originally introduced in the paper Deep Neural Solver for Math Word Problems. The original files are originally split into train/test split, while other research efforts (https://github.com/2003pro/Graph2Tree) perform the train/dev/test split.

89 PAPERS • 1 BENCHMARK

NTU RGB+D 120

NTU RGB+D 120 is a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities.

89 PAPERS • 8 BENCHMARKS

ReDial

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of over 10,000 conversations centered around the theme of providing movie recommendations.

89 PAPERS • 2 BENCHMARKS

SA-1B

SA-1B consists of 11M diverse, high resolution, licensed, and privacy protecting images and 1.1B high-quality segmentation masks.

89 PAPERS • NO BENCHMARKS YET

SYSU-MM01

The SYSU-MM01 is a dataset collected for the Visible-Infrared Re-identification problem. The images in the dataset were obtained from 491 different persons by recording them using 4 RGB and 2 infrared cameras. Within the dataset, the persons are divided into 3 fixed splits to create training, validation and test sets. In the training set, there are 20284 RGB and 9929 infrared images of 296 persons. The validation set contains 1974 RGB and 1980 infrared images of 99 persons. The testing set consists of the images of 96 persons where 3803 infrared images are used as query and 301 randomly selected RGB images are used as gallery.

89 PAPERS • 2 BENCHMARKS

TORCS (The Open Racing Car Simulator)

TORCS (The Open Racing Car Simulator) is a driving simulator. It is capable of simulating the essential elements of vehicular dynamics such as mass, rotational inertia, collision, mechanics of suspensions, links and differentials, friction and aerodynamics. Physics simulation is simplified and is carried out through Euler integration of differential equations at a temporal discretization level of 0.002 seconds. The rendering pipeline is lightweight and based on OpenGL that can be turned off for faster training. TORCS offers a large variety of tracks and cars as free assets. It also provides a number of programmed robot cars with different levels of performance that can be used to benchmark the performance of human players and software driving agents. TORCS was built with the goal of developing Artificial Intelligence for vehicular control and has been used extensively by the machine learning community ever since its inception.

89 PAPERS • NO BENCHMARKS YET

COVIDx

COVIDx (COVIDx CRX-2)

An open access benchmark dataset comprising of 13,975 CXR images across 13,870 patient cases, with the largest number of publicly available COVID-19 positive cases to the best of the authors' knowledge.

88 PAPERS • 1 BENCHMARK

CosmosQA

CosmosQA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people’s everyday narratives, asking questions concerning on the likely causes or effects of events that require reasoning beyond the exact text spans in the context.

88 PAPERS • NO BENCHMARKS YET

MiT (Moments in Time)

Moments in Time is a large-scale dataset for recognizing and understanding action in videos. The dataset includes a collection of one million labeled 3 second videos, involving people, animals, objects or natural phenomena, that capture the gist of a dynamic scene.

88 PAPERS • 2 BENCHMARKS

SEED (SJTU Emotion EEG Dataset)

The SEED dataset contains subjects' EEG signals when they were watching films clips. The film clips are carefully selected so as to induce different types of emotion, which are positive, negative, and neutral ones.

88 PAPERS • 4 BENCHMARKS

VGG Face

The VGG Face dataset is face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.

88 PAPERS • NO BENCHMARKS YET

CoNLL-2012

The CoNLL-2012 shared task involved predicting coreference in English, Chinese, and Arabic, using the final version, v5.0, of the OntoNotes corpus. It was a follow-on to the English-only task organized in 2011.

87 PAPERS • 4 BENCHMARKS

JFLEG (JHU FLuency-Extended GUG corpus)

JFLEG is for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding.

87 PAPERS • 5 BENCHMARKS

RLBench

RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.

87 PAPERS • 3 BENCHMARKS

Raindrop

Raindrop is a set of image pairs, where each pair contains exactly the same background scene, yet one is degraded by raindrops and the other one is free from raindrops. To obtain this, the images are captured through two pieces of exactly the same glass: one sprayed with water, and the other is left clean. The dataset consists of 1,119 pairs of images, with various background scenes and raindrops. They were captured with a Sony A6000 and a Canon EOS 60.

87 PAPERS • 1 BENCHMARK

iSUN

iSUN is a ground truth of gaze traces on images from the SUN dataset. The collection is partitioned into 6,000 images for training, 926 for validation and 2,000 for test.

87 PAPERS • NO BENCHMARKS YET

JAFFE (Japanese Female Facial Expression)

The JAFFE dataset consists of 213 images of different facial expressions from 10 different Japanese female subjects. Each subject was asked to do 7 facial expressions (6 basic facial expressions and neutral) and the images were annotated with average semantic ratings on each facial expression by 60 annotators.

86 PAPERS • 4 BENCHMARKS

KonIQ-10k (Konstanz Image Quality 10k Database)

KonIQ-10k is a large-scale IQA dataset consisting of 10,073 quality scored images. This is the first in-the-wild database aiming for ecological validity, with regard to the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models.

86 PAPERS • 1 BENCHMARK

Mapillary Vistas Dataset

Mapillary Vistas Dataset is a diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world.

86 PAPERS • 3 BENCHMARKS

ActivityNet-QA

The ActivityNet-QA dataset contains 58,000 human-annotated QA pairs on 5,800 videos derived from the popular ActivityNet dataset. The dataset provides a benchmark for testing the performance of VideoQA models on long-term spatio-temporal reasoning.

85 PAPERS • 2 BENCHMARKS

CrowdPose

The CrowdPose dataset contains about 20,000 images and a total of 80,000 human poses with 14 labeled keypoints. The test set includes 8,000 images. The crowded images containing homes are extracted from MSCOCO, MPII and AI Challenger.

85 PAPERS • 2 BENCHMARKS

HateXplain

Covers multiple aspects of the issue. Each post in the dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based.

85 PAPERS • 3 BENCHMARKS

IDD (Indian Driving Dataset)

IDD is a dataset for road scene understanding in unstructured environments used for semantic segmentation and object detection for autonomous driving. It consists of 10,004 images, finely annotated with 34 classes collected from 182 drive sequences on Indian roads.

85 PAPERS • NO BENCHMARKS YET

Kvasir (The Kvasir Dataset)

The KVASIR Dataset was released as part of the medical multimedia challenge presented by MediaEval. It is based on images obtained from the GI tract via an endoscopy procedure. The dataset is composed of images that are annotated and verified by medical doctors, and captures 8 different classes. The classes are based on three anatomical landmarks (z-line, pylorus, cecum), three pathological findings (esophagitis, polyps, ulcerative colitis) and two other classes (dyed and lifted polyps, dyed resection margins) related to the polyp removal process. Overall, the dataset contains 8,000 endoscopic images, with 1,000 image examples per class.

85 PAPERS • 3 BENCHMARKS

LEVIR-CD

LEVIR-CD is a new large-scale remote sensing building Change Detection dataset. The introduced dataset would be a new benchmark for evaluating change detection (CD) algorithms, especially those based on deep learning.

85 PAPERS • 2 BENCHMARKS

The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting Competition. The M4 dataset consists of time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data, which are divided into training and test sets. The minimum numbers of observations in the training test are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. The participants were asked to produce the following numbers of forecasts beyond the available data that they had been given: six for yearly, eight for quarterly, 18 for monthly series, 13 for weekly series and 14 and 48 forecasts respectively for the daily and hourly ones.

85 PAPERS • NO BENCHMARKS YET

PG-19

A new open-vocabulary language modelling benchmark derived from books.

85 PAPERS • 1 BENCHMARK

ASPEC (Asian Scientific Paper Excerpt Corpus)

ASPEC, Asian Scientific Paper Excerpt Corpus, is constructed by the Japan Science and Technology Agency (JST) in collaboration with the National Institute of Information and Communications Technology (NICT). It consists of a Japanese-English paper abstract corpus of 3M parallel sentences (ASPEC-JE) and a Japanese-Chinese paper excerpt corpus of 680K parallel sentences (ASPEC-JC). This corpus is one of the achievements of the Japanese-Chinese machine translation project which was run in Japan from 2006 to 2010.

84 PAPERS • NO BENCHMARKS YET

AVE

AVE (Audio-Visual Event Localization)

To investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality localization.

84 PAPERS • NO BENCHMARKS YET

MAWPS

MAWPS (MAth Word ProblemS)

MAWPS is an online repository of Math Word Problems, to provide a unified testbed to evaluate different algorithms. MAWPS allows for the automatic construction of datasets with particular characteristics, providing tools for tuning the lexical and template overlap of a dataset as well as for filtering ungrammatical problems from web-sourced corpora. The online nature of this repository facilitates easy community contribution. Amassed 3,320 problems, including the full datasets used in several previous works.

84 PAPERS • 2 BENCHMARKS

McMaster

The McMaster dataset is a dataset for color demosaicing, which contains 18 cropped images of size 500×500.

84 PAPERS • 6 BENCHMARKS

PadChest

PadChest is a labeled large-scale, high resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan Hospital (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demography. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. Of these reports, 27% were manually annotated by trained physicians and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The labels generated were then validated in an independent test set achieving a 0.93 Micro-F1 score.

84 PAPERS • NO BENCHMARKS YET

TabFact

TabFact is a large-scale dataset which consists of 117,854 manually annotated statements with regard to 16,573 Wikipedia tables, their relations are classified as ENTAILED and REFUTED. TabFact is the first dataset to evaluate language inference on structured data, which involves mixed reasoning skills in both symbolic and linguistic aspects.

84 PAPERS • 1 BENCHMARK

WikiMatrix

WikiMatrix is a dataset of parallel sentences in the textual content of Wikipedia for all possible language pairs. The mined data consists of:

84 PAPERS • NO BENCHMARKS YET

ARC (AI2 Reasoning Challenge)

The AI2’s Reasoning Challenge (ARC) dataset is a multiple-choice question-answering dataset, containing questions from science exams from grade 3 to grade 9. The dataset is split in two partitions: Easy and Challenge, where the latter partition contains the more difficult questions that require reasoning. Most of the questions have 4 answer choices, with <1% of all the questions having either 3 or 5 answer choices. ARC includes a supporting KB of 14.3M unstructured text passages.

83 PAPERS • 3 BENCHMARKS

GlaS (Gland Segmentation in Colon Histology Images Challenge)

The dataset used in this challenge consists of 165 images derived from 16 H&E stained histological sections of stage T3 or T42 colorectal adenocarcinoma. Each section belongs to a different patient, and sections were processed in the laboratory on different occasions. Thus, the dataset exhibits high inter-subject variability in both stain distribution and tissue architecture. The digitization of these histological sections into whole-slide images (WSIs) was accomplished using a Zeiss MIRAX MIDI Slide Scanner with a pixel resolution of 0.465µm.

83 PAPERS • 1 BENCHMARK

Datasets

9752 dataset results