VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
680 PAPERS • 10 BENCHMARKS
BIG-Bench Hard (BBH) is a subset of the BIG-Bench, a diverse evaluation suite for language models. BBH focuses on a suite of 23 challenging tasks from BIG-Bench that were found to be beyond the capabilities of current language models. These tasks are ones where prior language model evaluations did not outperform the average human-rater.
351 PAPERS • 4 BENCHMARKS
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.
349 PAPERS • 134 BENCHMARKS
GPQA (Graduate-Level Google-Proof Q&A Benchmark) is a challenging dataset designed to evaluate the capabilities of Large Language Models (LLMs) and scalable oversight mechanisms.
264 PAPERS • 1 BENCHMARK
Dataset Summary
110 PAPERS • 1 BENCHMARK
The VGG Face dataset is a face identity recognition dataset consisting of 2,622 identities. It contains over 2.6 million images.
94 PAPERS • 1 BENCHMARK
ScanNet++ is a large scale dataset with 450+ 3D indoor scenes containing sub-millimeter resolution laser scans, registered 33-megapixel DSLR images, and commodity RGB-D streams from iPhone. The 3D reconstructions are annotated with long-tail and label-ambiguous semantics to benchmark semantic understanding methods, while the coupled DSLR and iPhone captures enable benchmarking of novel view synthesis methods in high-quality and commodity settings.
25 PAPERS • 6 BENCHMARKS
100 tasks from the LIBERO-100 suite. Note that the datasets are split under the folder names LIBERO-90 and LIBERO-10.
22 PAPERS • 1 BENCHMARK
WISE is the first benchmark specifically designed for World Knowledge-Informed Semantic Evaluation. WISE moves beyond simple word-pixel mapping by challenging models with 1,000 meticulously crafted prompts across 25 sub-domains spanning cultural common sense, spatio-temporal understanding, and natural science.
22 PAPERS • 2 BENCHMARKS
The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. CoIR includes 10 curated code datasets covering 8 retrieval tasks across 7 domains; in total, it encompasses two million documents. It also provides a simple Python framework, installable via pip, and shares the same data schema as benchmarks such as MTEB and BEIR for easy cross-benchmark evaluation.
18 PAPERS • 7 BENCHMARKS
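The shared data schema mentioned above is the one published for BEIR: a JSONL corpus, a JSONL query file, and a TSV qrels file linking the two. The sketch below builds a minimal example of that layout; the field names follow BEIR's documented format and are assumed to carry over to CoIR, and the IDs and texts are made up for illustration.

```python
import json

# Corpus records: one JSON object per line with "_id", "title", "text".
corpus = [
    {"_id": "doc1", "title": "", "text": "def add(a, b):\n    return a + b"},
]
# Query records: one JSON object per line with "_id", "text".
queries = [
    {"_id": "q1", "text": "function that sums two numbers"},
]
# Qrels: tab-separated (query-id, corpus-id, relevance score) triples.
qrels = [("q1", "doc1", 1)]

corpus_jsonl = "\n".join(json.dumps(r) for r in corpus)
queries_jsonl = "\n".join(json.dumps(r) for r in queries)
qrels_tsv = "\n".join(f"{q}\t{d}\t{s}" for q, d, s in qrels)
```

Because the schema matches BEIR/MTEB, a corpus prepared this way can in principle be evaluated across benchmarks without conversion.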
100 tasks from the LIBERO-100 suite. Note that the datasets are split under the folder names LIBERO-90 and LIBERO-10; the LIBERO-10 subset contains selected tasks that require long-horizon task completion.
12 PAPERS • 1 BENCHMARK
TUM monoVO is a dataset for evaluating the tracking accuracy of monocular Visual Odometry (VO) and SLAM methods. It contains 50 real-world sequences comprising over 100 minutes of video, recorded across different environments – ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position: this allows tracking accuracy to be evaluated via the accumulated drift from start to end, without requiring ground truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated: the dataset creators provide the exposure times for each frame as reported by the sensor, the camera response function, and the lens attenuation factors (vignetting).
11 PAPERS • NO BENCHMARKS YET
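The start-equals-end design described above admits a very simple drift metric: the gap between the first and last estimated camera positions, normalized by path length since monocular VO has no absolute scale. The helper below is a hypothetical sketch of that idea, not the dataset's official evaluation tooling (which additionally handles alignment at the loop closure).

```python
import math

def accumulated_drift(trajectory):
    """Start-to-end drift of an estimated trajectory.

    Because each sequence starts and ends at the same physical
    location, the distance between the first and last estimated
    positions measures accumulated drift without full ground truth.
    Normalizing by total path length makes the number comparable
    across sequences, since monocular VO recovers no absolute scale.

    trajectory: list of (x, y, z) estimated camera positions.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    path_length = sum(dist(p, q) for p, q in zip(trajectory, trajectory[1:]))
    end_error = dist(trajectory[0], trajectory[-1])
    return end_error / path_length

# A toy square loop whose estimate fails to close by 0.1 units:
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0.1, 0)]
```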
The QMUL underGround Re-IDentification (GRID) dataset contains 250 pedestrian image pairs. Each pair contains two images of the same individual seen from different camera views. All images are captured from 8 disjoint camera views installed in a busy underground station. The figures beside show a snapshot of each of the camera views of the station and sample images in the dataset. The dataset is challenging due to variations in pose, colour, and lighting, as well as poor image quality caused by low spatial resolution.
10 PAPERS • 5 BENCHMARKS
Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyber risk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description and starter files, and is initialized in an environment where an agent can execute commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks for each task, which break down a task into intermediary steps.
9 PAPERS • 1 BENCHMARK
ImgEdit is a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks.
A multimodal agent benchmark on professional data science and engineering:
* 494 real-world tasks, ranging from data warehousing to orchestration;
* 20 professional enterprise-level applications (e.g., BigQuery, dbt, Airbyte, etc.);
* both command-line (CLI) and graphical user interfaces (GUI);
* an interactive executable computer environment;
* a document warehouse for agent retrieval.
7 PAPERS • NO BENCHMARKS YET
Dataset Description: The interaction of 72 kinase inhibitors with 442 kinases covering >80% of the human catalytic protein kinome.
6 PAPERS • 3 BENCHMARKS
The ImplicitQA dataset was introduced in the paper ImplicitQA: Going beyond frames towards Implicit Video Reasoning.
5 PAPERS • 1 BENCHMARK
The pioneering eyeblink detection dataset is characterized by three key features: (1) samples with multiple human instances; (2) unconstrained in-the-wild scenarios; (3) untrimmed videos. These attributes make the dataset more challenging and better aligned with real-world scenarios.
3 PAPERS • 2 BENCHMARKS
At RSNA 2017 there was a contest to correctly identify the age of a child from an X-ray of their hand.
2 PAPERS • 1 BENCHMARK
This paper constructs 7-digit product Supply-Use Tables (SUTs) and symmetric Input-Output Tables (IOTs) for the Indian economy using microdata from the Annual Survey of Industries (ASI) for the period 2016-2021. We outline the methodology for generating input flows and reconciling registered and unregistered sector data via NPCMS-NIC concordance. The transition from SUTs to IOTs is explained using the Industry Technology Assumption. We apply this framework to analyse the economic impact—specifically Domestic Value Added (DVA) and employment influenced by production and exports. A case study of India's mobile phone sector reveals significant output growth, import substitution, an increase in exports, a shift in DVA/FVA shares, notable employment growth, with a leaning towards contractual labour, and increased female participation. These tables are valuable for analysing sectoral interdependencies and industrial policy effectiveness in India.
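The SUT-to-IOT transition under the Industry Technology Assumption has a standard matrix form: the product-by-product coefficient matrix is A = B·D, where B = U ĝ⁻¹ holds use coefficients per unit of industry output and D = V′ q̂⁻¹ holds market shares (this is "Model B" in the Eurostat SUT/IOT manual). The toy 2-product, 2-industry example below illustrates the mechanics; the figures are invented for illustration and are not the ASI microdata used in the paper.

```python
def matmul(X, Y):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*Y)] for row in X]

# Supply (make) matrix V: rows = products, columns = industries.
V = [[90.0, 10.0],
     [ 5.0, 95.0]]
# Use matrix U: rows = products, columns = industries.
U = [[30.0, 20.0],
     [10.0, 40.0]]

g = [sum(col) for col in zip(*V)]   # industry outputs (column sums of V)
q = [sum(row) for row in V]         # product outputs (row sums of V)

# B = U ĝ⁻¹ : inputs of each product per unit of industry output.
B = [[U[p][i] / g[i] for i in range(2)] for p in range(2)]
# D = V' q̂⁻¹ : share of each industry in each product's total output.
D = [[V[p][i] / q[p] for p in range(2)] for i in range(2)]
# Industry Technology Assumption: product-by-product coefficients.
A = matmul(B, D)
```

Each column of D sums to one (the market shares of a product across industries exhaust its output), which is what guarantees that A allocates every industry's inputs fully across the products it makes.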
An adaptation of the MVTec Anomaly Detection dataset, presented in the paper "Domain-independent detection of known anomalies".
1 PAPER • 1 BENCHMARK
High-resolution early gastric cancer (EGC) detection and analysis. Patient data: the dataset includes images from patients diagnosed with gastric cancer, distinguishing between early gastric cancer (EGC) and non-pathogenic gastric conditions (NGC). The study utilized data from 341 patients, with 124 classified as EGC and 217 as NGC. Image types: high-resolution images obtained from endoscopy. Data volume: the dataset comprises 1,120 images for EGC detection and 2,150 images for NGC.
GitBugs is a comprehensive and up-to-date dataset comprising over 150,000 bug reports from nine actively maintained open-source projects, including Firefox, Cassandra, and VS Code. GitBugs aggregates data from GitHub, Bugzilla, and Jira issue trackers, offering standardized categorical fields for classification tasks and predefined train/test splits for duplicate bug detection. In addition, it includes exploratory analysis notebooks and detailed project-level statistics, such as duplicate rates and resolution times. GitBugs supports various software engineering research tasks, including duplicate detection, retrieval-augmented generation, resolution prediction, automated triaging, and temporal analysis. The openly licensed dataset provides a valuable cross-project resource for benchmarking and advancing automated bug report analysis. Access the data and code at this https URL.
HT Docking is a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. It is used to study surrogate model accuracy for protein-ligand docking.
The MathEquiv dataset accompanies EquivPruner. It is specifically designed for mathematical statement equivalence, serving as a versatile resource applicable to a variety of mathematical tasks and scenarios. It consists of almost 100k math sentence pairs, each with an equivalence result and reasoning steps generated by GPT-4o.
We developed Web-Bench as a benchmark for evaluating the performance of LLMs on real-world web projects.