The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., "there are cars in this image" but "there are no tigers," and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., "there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels." The ImageNet project does not own the copyright of the images; therefore, only thumbnails and URLs of images are provided.
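For concreteness, the two annotation categories can be pictured as simple records. Below is a minimal sketch in Python; the class and field names are purely illustrative, not an official ImageNet API, and the synset IDs in the comments are examples only.

```python
# A minimal sketch of the two ILSVRC annotation types described above.
# All names are hypothetical; this is not an official ImageNet schema.
from dataclasses import dataclass

@dataclass
class ImageLevelAnnotation:
    """Binary presence/absence label for an object class in an image."""
    image_id: str
    object_class: str   # a WordNet synset ID (illustrative)
    present: bool       # True: "there are cars in this image"

@dataclass
class ObjectLevelAnnotation:
    """Tight bounding box plus class label for one object instance."""
    image_id: str
    object_class: str
    x_center: int       # e.g. 20
    y_center: int       # e.g. 25
    width: int          # e.g. 50 pixels
    height: int         # e.g. 30 pixels

# The example from the text: a screwdriver centered at (20, 25), 50x30 px.
box = ObjectLevelAnnotation("img_0001", "screwdriver_synset", 20, 25, 50, 30)
```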
11,037 PAPERS • 106 BENCHMARKS
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, reflecting common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning, among other tasks in large-scale virtual assistants. In addition, the evaluation set contains unseen domains and services, to quantify performance in zero-shot and few-shot settings.
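As a rough illustration of how these annotations are typically consumed, here is a hedged sketch of walking one dialogue file. The JSON field names ("dialogue_id", "turns", "frames", "state", and so on) follow the public SGD release as best we can recall and should be verified against the repository (google-research-datasets/dstc8-schema-guided-dialogue).

```python
# A hedged sketch of iterating one SGD dialogue file; field names are
# assumptions to verify against the official release.
import json

with open("dialogues_001.json") as f:        # hypothetical local path
    dialogues = json.load(f)

for dialogue in dialogues:
    print(dialogue["dialogue_id"], dialogue["services"])
    for turn in dialogue["turns"]:
        print(f'{turn["speaker"]}: {turn["utterance"]}')
        for frame in turn["frames"]:
            # User turns carry the dialogue state used for state tracking.
            state = frame.get("state")
            if state:
                print("  intent:", state["active_intent"])
                print("  slots:", state["slot_values"])
```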
127 PAPERS • 6 BENCHMARKS
The Neuromorphic-Caltech101 (N-Caltech101) dataset is a spiking version of the original frame-based Caltech101 dataset. The original dataset contained both a "Faces" and a "Faces Easy" class, each consisting of different versions of the same images. The "Faces" class has been removed from N-Caltech101 to avoid confusion, leaving 100 object classes plus a background class. The N-Caltech101 dataset was captured by mounting the ATIS sensor on a motorized pan-tilt unit and having the sensor move while it views Caltech101 examples on an LCD monitor, as shown in the accompanying video. A full description of the dataset and how it was created can be found in the accompanying paper; please cite that paper if you make use of the dataset.
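ATIS output is a stream of events rather than frames. The sketch below decodes the 5-byte-per-event binary files used by the N-MNIST/N-Caltech101 releases; the bit layout is an assumption based on the commonly circulated reader code and should be checked against the official read function.

```python
# Hedged sketch of decoding N-Caltech101 event files: 5 bytes per event,
# assumed layout (verify against the official reader):
#   byte 0: x address, byte 1: y address,
#   byte 2 bit 7: polarity, remaining 23 bits: timestamp in microseconds.
import numpy as np

def read_events(path):
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, 5).astype(np.uint32)
    x = raw[:, 0]                                    # pixel column
    y = raw[:, 1]                                    # pixel row
    p = raw[:, 2] >> 7                               # polarity (ON=1 / OFF=0)
    t = ((raw[:, 2] & 0x7F) << 16) | (raw[:, 3] << 8) | raw[:, 4]
    return x, y, p, t

x, y, p, t = read_events("airplanes/image_0001.bin")  # hypothetical path
```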
55 PAPERS • 1 BENCHMARK
A large real-world event-based dataset for object classification.
27 PAPERS • 1 BENCHMARK
The RSSCN7 dataset contains satellite images acquired from Google Earth, originally collected for remote sensing scene classification. We conduct image synthesis on RSSCN7 to make it suitable for the image inpainting task. It has seven classes: grassland, farmland, industrial and commercial regions, river and lake, forest field, residential region, and parking lot. Each class has 400 images, for a total of 2,800 images in the RSSCN7 dataset.
13 PAPERS • 1 BENCHMARK
The purpose of this dataset is to study gender bias in occupations. Online biographies, written in English, were collected to find the names, pronouns, and occupations. The twenty-eight most frequent occupations were identified based on how often they appeared. The resulting dataset consists of 397,340 biographies spanning twenty-eight occupations. Of these, professor is the most frequent, with 118,400 biographies, while rapper is the least frequent, with 1,406 biographies. Important information about the biographies: 1. The longest biography is 194 tokens, while the shortest is eighteen; the median biography length is seventy-two tokens. 2. It should be noted that the demographics of online biographies' subjects differ from those of the overall workforce and that this dataset does not contain all biographies on the Internet.
11 PAPERS • 1 BENCHMARK
The N-ImageNet dataset is an event-camera counterpart of the ImageNet dataset. It was obtained by moving an event camera around a monitor displaying images from ImageNet. N-ImageNet contains approximately 1,300k training samples and 50k validation samples. In addition, the dataset contains variants of the validation set recorded under a wide range of lighting conditions or camera trajectories. Additional details are explained in the accompanying paper; please cite that paper if you make use of the dataset.
7 PAPERS • 3 BENCHMARKS
Data was collected for normal bearings and for single-point drive-end and fan-end defects. Drive-end bearing data was collected at both 12,000 samples/second and 48,000 samples/second; all fan-end bearing data was collected at 12,000 samples/second.
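A minimal sketch of inspecting one such recording in Python follows. The .mat key naming ("X097_DE_time" for the drive-end channel) is an assumption based on common copies of this dataset, so check the keys in your own files.

```python
# Hedged sketch: load one bearing recording and report its duration.
# File name and .mat keys are assumptions, not guaranteed by the source.
from scipy.io import loadmat

record = loadmat("97.mat")                  # hypothetical normal-baseline file
drive_end = record["X097_DE_time"].ravel()  # drive-end accelerometer signal
fs = 12_000                                 # sampling rate, samples/second
print(f"{len(drive_end)} samples = {len(drive_end) / fs:.1f} s of vibration data")
```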
5 PAPERS • 1 BENCHMARK
This dataset is a combination of the following three datasets: figshare, the SARTAJ dataset, and Br35H.
3 PAPERS • 1 BENCHMARK
Open Dataset: Mobility Scenario FIMU
3 PAPERS • NO BENCHMARKS YET
SciRepEval is a comprehensive benchmark for training and evaluating scientific document representations. It includes 25 challenging and realistic tasks, 11 of which are new, across four formats: classification, regression, ranking and search.
Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter.
3 PAPERS • 2 BENCHMARKS
Attention Deficit Hyperactivity Disorder (ADHD) affects at least 5-10% of school-age children and is associated with substantial lifelong impairment, with direct costs exceeding $36 billion per year in the US. Despite a voluminous empirical literature, the scientific community remains without a comprehensive model of the pathophysiology of ADHD. Further, the clinical community remains without objective biological tools capable of informing the diagnosis of ADHD for an individual or guiding clinicians in their decision-making regarding treatment.
2 PAPERS • NO BENCHMARKS YET
DeepPCB
2 PAPERS • 1 BENCHMARK
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification; 2) Spatial Forgery Localization, which segments the manipulated area of fake images compared to their corresponding source real images; 3) Video Forgery Classification, which re-defines video-level forgery classification with manipulated frames in random positions (this task is important because attackers in the real world are free to manipulate any target frame); and 4) Temporal Forgery Localization, which localizes the temporal segments that are manipulated. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data scale (2.9 million images, 221,247 videos).
The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across eight different categories. Two tasks will be available for participation: 1) classify dermoscopic images without meta-data, and 2) classify images with additional available meta-data.
2 PAPERS • 3 BENCHMARKS
We introduce the Oracle-MNIST dataset, comprising 28×28 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges in image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST shares the same data format as the original MNIST dataset, allowing direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) extremely serious and unique noise caused by three thousand years of burial and aging, and 2) dramatically varied writing styles of ancient Chinese, which make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
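Because the release shares the MNIST data format, a standard IDX reader works unchanged. In the sketch below, the file names follow the MNIST convention and are assumptions; check the actual release for the exact names.

```python
# Minimal IDX reader for MNIST-format files (big-endian int32 header,
# then raw uint8 data); applies to Oracle-MNIST unchanged.
import gzip
import numpy as np

def load_idx_images(path):
    with gzip.open(path, "rb") as f:
        _magic, n, rows, cols = np.frombuffer(f.read(16), dtype=">i4")
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

def load_idx_labels(path):
    with gzip.open(path, "rb") as f:
        _magic, n = np.frombuffer(f.read(8), dtype=">i4")
        return np.frombuffer(f.read(), dtype=np.uint8)

# Hypothetical file names following the MNIST convention.
train_x = load_idx_images("train-images-idx3-ubyte.gz")  # expect (27222, 28, 28)
train_y = load_idx_labels("train-labels-idx1-ubyte.gz")
```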
The resources for this dataset can be found at https://www.openml.org/d/182
A dataset of 53 complex-valued signal modulation classes.
ACL-Fig is a large-scale automatically annotated corpus consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology. The ACL-Fig-pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.
1 PAPER • NO BENCHMARKS YET
For a detailed description, we refer to Section 3 in our research article.
The ArtiFact dataset is a large-scale image dataset that aims to include a diverse collection of real and synthetic images from multiple categories, including Human/Human Faces, Animal/Animal Faces, Places, Vehicles, Art, and many other real-life objects. The dataset comprises 8 sources that were carefully chosen to ensure diversity and includes images synthesized by 25 distinct methods: 13 GAN, 7 diffusion, and 5 other miscellaneous generators. The dataset contains 2,496,738 images, comprising 964,989 real images and 1,531,749 fake images.
This is a database of backdoored neural networks intended for face recognition. The networks use the FaceNet architecture and are trained on Casia-WebFace, with and without additional samples (which are the source of the backdoor). More information regarding backdoors and the project this work fits within can be found in the public release of the source code: https://gitlab.idiap.ch/bob/bob.paper.backdoored_facenets.biosig2022.
The dataset contains 36,000 Bangla text samples based on Ekman's six basic emotions. The data was first introduced in the paper "Alternative non-BERT model choices for the textual classification in low-resource languages and environments". The dataset is balanced and evenly distributed across all six classes.
1 PAPER • 1 BENCHMARK
This brain tumor dataset contains 3,064 T1-weighted contrast-enhanced images with three kinds of brain tumors. Detailed information on the dataset can be found in the readme file.
Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the real severity (BIRADS) and pathology (post-report) classifications provided by the Director of the Radiology Department of Hospital Fernando Fonseca while diagnosing several patients (see dataset-uta4-dicom) from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of the measurements of both severity (BIRADS) and pathology classifications concerning the patient diagnostic. Work and results are published at AVI 2020, a top Human-Computer Interaction (HCI) conference. Results were analyzed and interpreted from our statistical analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients in a Single-Modality vs. Multi-Modality comparison.
Original images and images with RUSTICO filters applied
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.
The dataset includes static tension measurements under a 2 kg load at different points of the CB, as well as measurements under dynamic conditions. The latter covered linear belt speeds between nu_1 = 0.5 m/s and nu_max = 1.7 m/s. A unified sampling frequency of 400 Hz was used for the experiments, corresponding to 140 samples.
This is a set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue. All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). For tissue classification, the nine classes are: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM). The images were manually extracted from N=86 H&E-stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimens to increase variability.
Histological images of colorectal cancer, derived from the TCGA database
CVE stands for Common Vulnerabilities and Exposures. CVE is a glossary that classifies vulnerabilities. The glossary analyzes vulnerabilities and then uses the Common Vulnerability Scoring System (CVSS) to evaluate the threat level of a vulnerability. A CVE score is often used to prioritize the remediation of vulnerabilities.
CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset.
A dataset of games played in the card game "Cards Against Humanity" (CAH) by human players, derived from the online CAH labs. Each round includes the cards presented to players (a "black" prompt with a blank or question, and 10 "white" punchlines as possible responses), the punchline picked by the player that round, and the card text and metadata.
The data set covers recordings of ripening fruit with labels of destructive measurements (fruit flesh firmness, sugar content and overall ripeness). The labels are provided within three categories (firmness, sweetness and overall ripeness). Four measurement series were performed. Besides 1018 labeled recordings, the data set contains 4671 recordings without ripeness label.
DeepParliament is a legal-domain benchmark dataset that gathers bill documents and metadata and supports various bill-status classification tasks. The dataset covers a broad range of bills from 1986 to the present and contains rich information on parliamentary bill content. There are a total of 5,329 documents, of which 4,223 are in the train set and 1,106 in the test set. Bill documents in both splits contain many sentences, and document length varies greatly.
Extended Agriculture-Vision dataset comprises two parts:
We provide multiple human annotations for each test image in Fashion-MNIST. These can be used as soft (probabilistic) labels instead of the usual hard (single) labels.
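A minimal sketch of turning several human annotations per image into a soft label by normalizing vote counts; the number of annotators and the votes themselves are illustrative.

```python
# Convert multiple per-image annotations into a soft label distribution.
import numpy as np

num_classes = 10                          # Fashion-MNIST classes
annotations = np.array([0, 0, 6, 0, 0])   # 5 hypothetical annotators, one image
counts = np.bincount(annotations, minlength=num_classes)
soft_label = counts / counts.sum()        # 0.8 at class 0, 0.2 at class 6
hard_label = counts.argmax()              # the usual single label, for comparison
```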
We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in 1 of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset presents a challenging fine-grained classification problem: the best-scoring EmbraceNet model using both visual and textual features achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text.
The Gambling Address Dataset is a collection of 10,423 gambling addresses that have transactions with gambling contracts. In addition, 51,004 non-gambling addresses (such as exchange and wallet addresses) are also selected, making the dataset more complete. In the dataset, accounts are used to refer to addresses (e.g., 0xd1ce...edec95), where 1, 0, and -1 represent the gamble, non-gamble, and other types, respectively.
The Gambling Contract Dataset is a collection of 260 gambling smart contracts from decentralized gambling websites such as Dicether and Degens. To construct the negative samples required for training, 1,040 smart contracts not involved in gambling (e.g., erc20, erc721, mixer) are also selected. In the dataset, accounts are used to refer to contracts (e.g., 0x3fe2b...f8a33f), where 1, 0, and -1 represent the gamble, non-gamble, and other types, respectively. A hedged sketch of working with this label encoding follows.
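In the sketch below, the file and column names are assumptions for illustration, not part of the released data.

```python
# Hedged sketch: map the 1 / 0 / -1 label encoding used by both gambling
# datasets to readable names. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("gambling_contracts.csv")     # hypothetical file name
label_names = {1: "gamble", 0: "non-gamble", -1: "other"}
df["label_name"] = df["label"].map(label_names)
positives = df[df["label"] == 1]               # e.g. the 260 gambling contracts
```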
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset designed especially for object classification on mobile robots operating in a changing environment (like a household), where it is important to learn new, never-seen objects on the fly. The dataset can also be used for other learning use cases, like instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.
1 PAPER • 2 BENCHMARKS
Hephaestus is the first large-scale InSAR dataset. Driven by volcanic unrest detection, it provides 19,919 unique satellite frames annotated with a diverse set of labels, and each sample is accompanied by a textual description of its contents. The goal of this dataset is to boost research on the exploitation of interferometric data, enabling the application of state-of-the-art computer vision and NLP methods. Furthermore, the annotated dataset is bundled with a large archive of unlabeled frames to enable large-scale self-supervised learning. The final size of the dataset amounts to 110,573 interferograms.
The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel tasks of multimodal figurative understanding and preference.
This dataset was presented as part of the ICLR 2023 paper "A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet".
This is a real-world industrial benchmark dataset from a major medical device manufacturer for the prediction of customer escalations. The dataset contains features derived from IoT (machine log) and enterprise data, including labels for escalation from a fleet of thousands of customers of high-end medical devices.
LEPISZCZE is an open-source comprehensive benchmark for Polish NLP and a continuous-submission leaderboard, gathering existing and new public Polish datasets for specific tasks.