The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
28 PAPERS • 5 BENCHMARKS
The Linguistic Benchmark (JSON), consisting of 30 questions, was developed to be easy for human adults to answer but challenging for LLMs. It is designed to assess the well-documented limitations of LLMs across domains such as spatial reasoning, linguistic understanding, relational thinking, mathematical reasoning, knowledge of basic scientific concepts, and common sense. This benchmark is a useful tool to gauge the current capabilities of LLMs. The questions serve as a linguistic benchmark to examine model performance in several key domains where they have known limitations.
1 PAPER • NO BENCHMARKS YET
The MoToMQA (Multi-Order Theory of Mind Question & Answer) benchmark is a test suite introduced to examine the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner.
AUR & UMB Anticancer Dataset: This dataset contains quantitative data on the anticancer effects of the natural coumarins Auraptene (AUR) and Umbelliprenin (UMB) across 27 studies. The data were collected from published literature reporting the impacts of AUR and UMB treatment on the viability of diverse human cancer cell lines.
1 PAPER • 1 BENCHMARK
An image sequence dataset of growing snowflakes in HDF5 format. Generated by the Gravner-Griffeath LCA model for snow crystal growth. Useful for modeling crystal growth with neural networks.
MagicBathyNet is a benchmark dataset made up of image patches of Sentinel-2, SPOT-6 and aerial imagery, bathymetry in raster format, and seabed class annotations. The dataset also facilitates unsupervised learning for model pre-training in shallow coastal areas.
PatternCom is a composed image retrieval benchmark based on PatternNet. PatternNet is a large-scale high-resolution remote sensing image retrieval dataset. There are 38 classes and each class has 800 images of size 256×256 pixels. In PatternCom, we select some classes to be depicted in query images, and add a query text that defines an attribute relevant to that class. For instance, query images of “swimming pools” are combined with text queries defining “shape” as “rectangular”, “oval”, and “kidney-shaped”. In total, PatternCom includes six attributes, each comprising up to four different classes. Each attribute can be associated with two to five values per class. The number of positives ranges from 2 to 1345 and there are more than 21k queries in total.
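A composed query of this kind pairs an image class with an attribute value given as text. The toy index below is a minimal sketch of that matching logic; the records and field names are hypothetical, not PatternCom's actual schema:

```python
# Hypothetical toy gallery illustrating PatternCom-style composed retrieval:
# each query combines a class (from the query image) with an attribute
# value (from the query text), and positives must satisfy both.
gallery = [
    {"id": 1, "class": "swimming_pool", "shape": "rectangular"},
    {"id": 2, "class": "swimming_pool", "shape": "oval"},
    {"id": 3, "class": "swimming_pool", "shape": "kidney-shaped"},
    {"id": 4, "class": "airport", "shape": None},
]

def composed_query(gallery, query_class, attribute, value):
    """Return ids of gallery items matching both the query image's class
    and the attribute value requested in the query text."""
    return [g["id"] for g in gallery
            if g["class"] == query_class and g.get(attribute) == value]

print(composed_query(gallery, "swimming_pool", "shape", "oval"))  # -> [2]
```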
Functionally correct (ok) and incorrect (buggy) solutions to five Probeable Problems: http://arxiv.org/abs/2405.15123. The ok solutions correspond to attempts that successfully probed all ambiguities in the given specification; the buggy solutions represent attempts that addressed these ambiguities only partially. A more nuanced analysis (beyond ok/buggy) of these attempts may reveal greater insights.
MathBench is an All in One math dataset for language model evaluation, with:
3 PAPERS • NO BENCHMARKS YET
Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
Files composing the YADL data lake, for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)"
CinePile is a question-answering-based, long-form video understanding dataset. It has been created using advanced large language models (LLMs) with a human-in-the-loop pipeline leveraging existing human-generated raw data. It consists of approximately 300,000 training data points and 5,000 test data points.
Simulation data and pre-trained Graph Neural Network (GNN) models produced in [1].
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset
2 PAPERS • NO BENCHMARKS YET
Sakuga-42M is a large-scale hand-drawn cartoon video dataset for academic research purposes. It comprises 42 million cartoon keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, content taxonomies, etc. The dataset is intended to support researchers in their exploration of more effective and practical solutions for creating cartoons.
1 PAPER • 2 BENCHMARKS
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded.
MedConceptsQA - Open Source Medical Concepts QA Benchmark
12 PAPERS • 2 BENCHMARKS
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.
The dataset includes source code vulnerabilities in some of the most commonly used IoT frameworks. We introduce IoTvulCode, a novel framework consisting of a dataset-generating tool and ML-enabled methods for the detection of source code vulnerabilities and weaknesses, as well as an initial release of an IoT vulnerability dataset. Our framework contributes to improving existing coding practices, leading to a more secure IoT infrastructure.
10,000 instances of a three-view numerical data set with 4 clusters and 2 feature components are considered. The data points in each view are generated from a 4-component 2-variate Gaussian mixture model (GMM) with mixing proportions $\alpha_1^{(1)}=\alpha_1^{(2)}=\alpha_1^{(3)}=0.3$; $\alpha_2^{(1)}=\alpha_2^{(2)}=\alpha_2^{(3)}=0.15$; $\alpha_3^{(1)}=\alpha_3^{(2)}=\alpha_3^{(3)}=0.15$; and $\alpha_4^{(1)}=\alpha_4^{(2)}=\alpha_4^{(3)}=0.4$. The means $\mu_{ik}^{(1)}$ for the first view are $[-10~-5]$, $[-9~11]$, $[0~6]$ and $[4~0]$; the means $\mu_{ik}^{(2)}$ for the second view are $[-8~-12]$, $[-6~-3]$, $[-2~7]$ and $[2~1]$; and the means $\mu_{ik}^{(3)}$ for the third view are $[-5~-10]$, $[-8~-1]$, $[0~5]$ and $[5~-4]$. The covariance matrices for the three views are $\Sigma_1^{(1)}=\Sigma_1^{(2)}=\Sigma_1^{(3)}=\left[\begin{array}{cc} 1 & 0\\ 0 & 1\end{array}\right]$; $\Sigma_2^{(1)}=\Si
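The generation procedure above can be sketched as follows. This is a minimal sketch under the assumption that each instance's cluster assignment is shared across the three views (the usual multi-view setting; the text does not state this explicitly) and that every component has identity covariance, as the stated matrices suggest:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Mixing proportions over the 4 clusters (from the description).
alpha = np.array([0.3, 0.15, 0.15, 0.4])

# Per-view cluster means, one row per cluster (values from the description).
means = {
    1: np.array([[-10, -5], [-9, 11], [0, 6], [4, 0]], dtype=float),
    2: np.array([[-8, -12], [-6, -3], [-2, 7], [2, 1]], dtype=float),
    3: np.array([[-5, -10], [-8, -1], [0, 5], [5, -4]], dtype=float),
}

# One cluster label per instance, shared across the three views.
labels = rng.choice(4, size=n, p=alpha)

# Each view: cluster mean plus unit-covariance Gaussian noise.
views = {
    v: means[v][labels] + rng.multivariate_normal(np.zeros(2), np.eye(2), size=n)
    for v in (1, 2, 3)
}
```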
Please refer: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
The current industrial pipeline includes 315 dynamic industrial scenarios, which can be categorized into three types: QR codes, text, and products. To enhance diversity, we use films with diverse material properties, coverage areas, thicknesses, and levels of wrinkling; the film exhibits significant variability across each scenario. On the other hand, to ensure the stability of the industrial imaging pipeline, we maintained a consistent intensity level for the industrial light source and fixed the distance between the camera and the object flow. This helps to minimize the influence of errors external to the industrial system.
Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving and reasoning. Existing benchmarks evaluate tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent Of Code (AoC) challenges and Project Euler, including 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and neutral problems, with specific challenges in the Euler math-based subset with GPT-3.5-Turbo passing 50% o
BlendMimic3D is a pioneering synthetic dataset developed using Blender, designed to enhance Human Pose Estimation (HPE) research. This dataset features diverse scenarios including self-occlusions, object-based occlusions, and out-of-frame occlusions, tailored for the development and testing of advanced HPE models.
The DocRED Information Extraction (DocRED-IE) dataset extends the DocRED dataset for the Document-level Closed Information Extraction (DocIE) task. DocRED-IE is a multi-task dataset and allows for 5 subtasks: (i) Document-level Relation Extraction, (ii) Mention Detection, (iii) Entity Typing, (iv) Entity Disambiguation, (v) Coreference Resolution, as well as combinations thereof such as Named Entity Recognition (NER) or Entity Linking. The DocRED-IE dataset also allows for the end-to-end tasks of: (i) DocIE and (ii) Joint Entity and Relation Extraction. DocRED-IE comprises sentence-level and document-level facts, thereby describing short as well as long-range interactions within an entire document.
1 PAPER • 6 BENCHMARKS
This is the replication package for our systematic literature review and can be used for the reproducibility of the individual steps of our search and selection methodology.
Dataset Generation
Overview: The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.
This dataset presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-dof ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divi
The dataset concerns toy tasks that a human should teach to a robot. The number of task repetitions is limited in the dataset since the human should demonstrate the task to the robot only a few times.
UruDendro is a database of wood cross-section images of commercially grown Pinus taeda trees from northern Uruguay. It is formed by 64 RGB wood images, together with their ring delineations and pith locations.
Audio-alpaca: A preference dataset for aligning text-to-audio models. Audio-alpaca is a pairwise preference dataset containing about 15k (prompt, chosen, rejected) triplets where, given a textual prompt, chosen is the preferred generated audio and rejected is the undesirable audio.
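Such (prompt, chosen, rejected) triplets are the standard input for preference-alignment training. The record below is an illustrative sketch only; the field names, prompt, and file paths are hypothetical, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Hypothetical layout for one Audio-alpaca preference triplet.
@dataclass
class PreferencePair:
    prompt: str    # textual description of the desired audio
    chosen: str    # path to the preferred generated audio clip
    rejected: str  # path to the undesirable audio clip

pair = PreferencePair(
    prompt="a dog barking in the distance",
    chosen="audio/chosen_000.wav",
    rejected="audio/rejected_000.wav",
)
```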
MMCode is a multi-modal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts (i.e. images). It contains 3,548 questions paired with 6,620 images, derived from real-world programming challenges across 10 code competition websites, with Python solutions and tests provided. The dataset emphasizes the extreme demand for reasoning abilities, the interwoven nature of textual and visual contents, and the occurrence of questions containing multiple images.
We construct a fine-grained video-text dataset with 12K annotated high-resolution videos (~400k clips). The annotation of this dataset is inspired by the video script. If we want to make a video, we have to first write a script to organize how to shoot the scenes in the videos. To shoot a scene, we need to decide the content, shot type (medium shot, close-up, etc.), and how the camera moves (panning, tilting, etc.). Therefore, we extend video captioning to video scripting by annotating the videos in the format of video scripts. Different from previous video-text datasets, we densely annotate the entire videos without discarding any scenes, and each scene has a caption with ~145 words. Besides the vision modality, we transcribe the voice-over into text and put it along with the video title to give more background information for annotating the videos.
0 PAPERS • NO BENCHMARKS YET
OpenTrench3D is the first publicly available point cloud dataset of underground utilities from open trenches. It features 310 fully annotated point clouds comprising a total of 528 million points categorised into 5 unique classes. OpenTrench3D consists of photogrammetrically derived 3D point clouds capturing detailed scenes of open trenches, revealing underground utilities.
3 PAPERS • 1 BENCHMARK
Provided in the linked paper.
The Drag100 dataset is introduced in the paper "GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models". This dataset is a new contribution to the benchmarking of drag editing.
The landmark Cancer Genomics Program, launched in 2006, has contributed immensely to awareness of the importance of cancer genomics in our understanding of cancer over the past decade and has begun to change the way the disease is treated in the clinic. A large number of mutations contribute to cancer, and predicting the effects of mutations using in silico tools has become a frequently used approach. However, the use of next-generation sequencing-based approaches in clinical diagnosis has also led to a considerable increase in data and a vast number of variants of uncertain significance that require further analysis and validation. These data cannot be analyzed simply with the tools and techniques traditionally available. To better understand the origin and evolution of cancer, a cancer reference framework based on modeling of genome sequencing data has been proposed for the systematic identification of representative drive
ChronoMagic is a dataset of 2,265 metamorphic time-lapse videos, each accompanied by a detailed caption.
The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.
The sports industry is witnessing an increasing trend of utilizing multiple synchronized sensors for player data collection, enabling personalized training systems with multi-perspective real-time feedback. Badminton could benefit from these various sensors, but there is a scarcity of comprehensive badminton action datasets for analysis and training feedback. Addressing this gap, this paper introduces a multi-sensor badminton dataset for forehand clear and backhand drive strokes, based on interviews with coaches for optimal usability. The dataset covers various skill levels, including beginners, intermediates, and experts, providing resources for understanding biomechanics across skill levels. It encompasses 7,763 badminton swing data from 25 players, featuring sensor data on eye tracking, body tracking, muscle signals, and foot pressure. The dataset also includes video recordings, detailed annotations on stroke type, skill level, sound, ball landing, and hitting location, as well as s
NES-VMDB is a dataset containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), encompassing 5,278 music pieces from 397 NES games.
A 3D design file repository for the Stickbug Robot, a six-armed holonomic precision pollination robot.
This dataset contains both artificial and real flower images of bramble flowers. The real images were taken with an Intel RealSense D435 camera inside the West Virginia University greenhouse. All flowers are annotated in YOLO format with a bounding box and class name. The trained weights are also provided and can be used with the included Python script to detect bramble flowers. The classifier can also determine whether a flower's center is visible or hidden, which is helpful for precision pollination projects. Images are augmented to make the task robust under various environmental conditions.
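YOLO annotation files store one object per line as a class id followed by a normalized box center and size. A minimal sketch of converting such a line to pixel coordinates (the example line and image size are illustrative, not taken from this dataset):

```python
def parse_yolo_line(line, img_w, img_h):
    """Convert one YOLO annotation line ("class cx cy w h", all normalized
    to [0, 1]) into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# Hypothetical box centered in a 640x480 image, covering a quarter of each axis.
print(parse_yolo_line("0 0.5 0.5 0.25 0.25", 640, 480))
# -> (0, 240.0, 180.0, 400.0, 300.0)
```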