ATIS (Airline Travel Information Systems) is a dataset of audio recordings and corresponding manual transcripts of humans asking for flight information on automated airline travel inquiry systems. The data covers 17 unique intent categories. The original split contains 4478, 500, and 893 intent-labeled reference utterances in the train, development, and test sets, respectively.
264 PAPERS • 7 BENCHMARKS
The SNIPS Natural Language Understanding benchmark is a dataset of over 16,000 crowdsourced queries distributed among 7 user intents of varying complexity.
245 PAPERS • 6 BENCHMARKS
ImageNet-O consists of images from classes that are not found in the ImageNet-1k dataset. It is used to test the robustness of vision models to out-of-distribution samples. Results are reported using the AUPR (area under the precision-recall curve) metric.
76 PAPERS • NO BENCHMARKS YET
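As an illustration of the AUPR metric mentioned in the ImageNet-O entry, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve. It is not taken from any ImageNet-O evaluation code: the function name `average_precision`, the convention that higher scores mean "more anomalous", and the choice of out-of-distribution samples as the positive class are all assumptions made for this example, and tied scores are not given special treatment.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_ood: Sequence[bool]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    scores: anomaly scores; higher means "more likely out-of-distribution"
            (an assumed convention for this sketch).
    is_ood: True for out-of-distribution samples, treated as positives here.
    """
    if len(scores) != len(is_ood):
        raise ValueError("scores and is_ood must have the same length")
    n_pos = sum(1 for flag in is_ood if flag)
    if n_pos == 0:
        raise ValueError("at least one positive (OOD) sample is required")

    # Rank samples from most to least anomalous.
    ranked = sorted(zip(scores, is_ood), key=lambda pair: pair[0], reverse=True)

    hits = 0
    total = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            total += hits / rank  # precision at the point this positive is recalled
    return total / n_pos
```

A detector that ranks every out-of-distribution sample above every in-distribution sample scores 1.0 under this estimate; interleaved rankings score lower.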
OpenImage-O is built to overcome several shortcomings of existing OOD benchmarks. It is manually annotated, comes with a naturally diverse distribution, and has a large scale. OpenImage-O is filtered image by image from the test set of OpenImage-V3, which was collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias.
21 PAPERS • NO BENCHMARKS YET
The MUAD dataset (Multiple Uncertainties for Autonomous Driving) consists of 10,413 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, and object and instance detection. Predictive uncertainty estimation is essential for the safe deployment of deep neural networks in real-world autonomous systems, and MUAD allows a better assessment of the impact of different sources of uncertainty on model performance.
3 PAPERS • NO BENCHMARKS YET
The NINCO (No ImageNet Class Objects) dataset was introduced in the ICML 2023 paper "In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation". The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.
2 PAPERS • 1 BENCHMARK
PATIS is a Persian-language dataset for intent detection and slot filling.
2 PAPERS • 2 BENCHMARKS
The FathomNet2023 competition dataset is a subset of the broader FathomNet marine image repository. The training and test images for the competition were all collected in the Monterey Bay Area between the surface and 1300 meters depth by the Monterey Bay Aquarium Research Institute. The images contain bounding box annotations of 290 categories of bottom-dwelling animals. The training and evaluation data are split across an 800-meter depth threshold: all training data is collected from 0-800 meters, while evaluation data comes from the whole 0-1300 meter range. Since an organism's habitat range is partially a function of depth, the species distributions in the two regions are overlapping but not identical. Test images are drawn from the same region but may come from above or below the depth horizon. The competition goal is to label the animals present in a given image (i.e., multi-label classification) and determine whether the image is out-of-sample.
1 PAPER • NO BENCHMARKS YET
In this dataset, various objects are arranged on a white table. A UR5e robot picks and places a target object specified in the title of the video/image sequence. Videos under the auto- folders were collected with automatic operation of the robot. Videos under the human- folders were collected with tele-operation of the robot. Ground-truth tracking bounding boxes are generated with STARK; when the target exits the camera frame, the bounding box estimate is set to [-1, -1, -1, -1], indicating that the target is not visible.
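When working with these annotations, frames carrying the [-1, -1, -1, -1] sentinel usually need to be separated from frames with a real box. The following is a minimal sketch under the assumption that each sequence's annotations have already been loaded as a list of four-number boxes, one per frame; the names `NOT_VISIBLE` and `visible_frames` are hypothetical and not part of the dataset's tooling, and the on-disk annotation format is not described here.

```python
from typing import List, Sequence

# Sentinel described in the dataset text for "target not in the camera frame".
NOT_VISIBLE = (-1, -1, -1, -1)


def visible_frames(boxes: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of frames whose box is not the sentinel.

    boxes: one four-number box per frame (assumed already loaded into memory).
    """
    return [i for i, box in enumerate(boxes) if tuple(box) != NOT_VISIBLE]


# Example with made-up boxes: the target is missing in the second frame.
example_boxes = [[10, 20, 30, 40], [-1, -1, -1, -1], [5, 5, 8, 8]]
kept = visible_frames(example_boxes)
```

Filtering by the sentinel before computing overlap-based tracking metrics avoids scoring a tracker against a box that does not describe a real location.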