The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
28 PAPERS • 5 BENCHMARKS
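The MIT-BIH records are distributed through PhysioNet; as a minimal sketch, a record and its beat annotations can be read with the open-source `wfdb` Python package (the package choice and record number here are illustrative, not part of the dataset itself):

```python
import wfdb

# Fetch record 100 (two ECG channels plus header) straight from the
# PhysioNet 'mitdb' archive, then its reference beat annotations ('atr').
record = wfdb.rdrecord('100', pn_dir='mitdb')
annotation = wfdb.rdann('100', 'atr', pn_dir='mitdb')

print(record.sig_name, record.fs)  # channel names and sampling frequency (360 Hz)
print(annotation.symbol[:10])      # first ten beat labels
```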
FlareReal600 is a nighttime flare removal dataset containing 650 real-captured image pairs and 500 flare images. The training set contains 600 image pairs and 500 flare images, and the validation set contains 50 image pairs. Image pairs within the dataset are captured in various places (e.g., street, park, indoor) under both incorrect and correct exposure settings. Each flare-corrupted image contains multiple light sources. Flare images are captured in a dark room with multi-color light sources.
0 PAPER • NO BENCHMARKS YET
The AUR & UMB Anticancer Dataset contains quantitative data on the anticancer effects of the natural coumarins Auraptene (AUR) and Umbelliprenin (UMB) across 27 studies. The data were collected from published literature reporting the effects of AUR and UMB treatment on the viability of diverse human cancer cell lines.
1 PAPER • 1 BENCHMARK
AlpacaEval in Thai.
MT-Bench in Thai.
MedLFQA is constructed by reformulating four existing biomedical long-form question-answering benchmark datasets: LiveQA, MedicationQA, HealthSearchQA, and K-QA. Each MedLFQA instance consists of four components: a question (Q), a long-form answer (A), must-have statements (MH), and nice-to-have statements (NH). This structure enables automatic evaluation of model responses and gives a comprehensive picture of how a model answers a patient's question.
1 PAPER • NO BENCHMARKS YET
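As a sketch of MedLFQA's four-component structure, a single instance can be represented as below (field names are hypothetical; the released files may use different keys):

```python
# One MedLFQA instance with its four components (hypothetical key names).
instance = {
    "question": "...",        # Q: the patient's question
    "answer": "...",          # A: long-form reference answer
    "must_have": ["..."],     # MH: statements a correct answer must include
    "nice_to_have": ["..."],  # NH: statements that help but are not required
}
```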
We present the SJTU Multispectral Object Detection (SMOD) dataset. The dataset contains 8676 infrared-visible image pairs, with annotations for 8042 pedestrians, 10478 riders, 6501 bicycles, and 6422 cars. The degree of occlusion is meticulously annotated for every object. Despite its low sampling rate, the dataset has dense rider and pedestrian objects and rich illumination variation across its 3298 pairs of night-scene images.
SunspotsYoloDataset is a set of 1690+190 high-resolution RGB astronomical images captured with smart telescopes fitted with dedicated solar filters and annotated with the positions of the sunspots actually visible in each image. Two instruments were used over several months, from Luxembourg and France, between January 2023 and May 2024: a Stellina smart telescope (https://vaonis.com/stellina) and a Vespera smart telescope (https://vaonis.com/vespera).
This dataset contains named entity annotations for European Parliament recordings in Dutch, French, German, and Spanish. The entity annotation scheme follows OntoNotes v5. The original unannotated dataset is VoxPopuli.
MathBench is an all-in-one math dataset for language model evaluation.
2 PAPERS • NO BENCHMARKS YET
Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
Files composing the YADL data lake, for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)"
CinePile is a question-answering-based, long-form video understanding dataset. It was created using large language models (LLMs) in a human-in-the-loop pipeline that leverages existing human-generated raw data. It consists of approximately 300,000 training data points and 5,000 test data points.
Simulation data and pre-trained Graph Neural Network (GNN) models produced in [1].
Sakuga-42M is a large-scale hand-drawn cartoon video dataset for academic research. It comprises 42 million cartoon keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, and content taxonomies. The dataset is intended to support researchers exploring more effective and practical solutions for creating cartoons.
1 PAPER • 2 BENCHMARKS
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded.
MedConceptsQA - Open Source Medical Concepts QA Benchmark
12 PAPERS • 2 BENCHMARKS
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.
A large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations from Reddit.
The dataset covers source code vulnerabilities in some of the most commonly used IoT frameworks. We introduce IoTvulCode, a novel framework consisting of a dataset-generating tool and ML-enabled methods for detecting source code vulnerabilities and weaknesses, together with the initial release of an IoT vulnerability dataset. Our framework contributes to improving existing coding practices, leading to a more secure IoT infrastructure.
10,000 instances of a three-view numerical data set with 4 clusters and 2 feature components are considered. The data points in each view are generated from a 4-component bivariate Gaussian mixture model (GMM) whose mixing proportions are shared across the three views: $\alpha_1^{(1)}=\alpha_1^{(2)}=\alpha_1^{(3)}=0.3$, $\alpha_2^{(1)}=\alpha_2^{(2)}=\alpha_2^{(3)}=0.15$, $\alpha_3^{(1)}=\alpha_3^{(2)}=\alpha_3^{(3)}=0.15$, and $\alpha_4^{(1)}=\alpha_4^{(2)}=\alpha_4^{(3)}=0.4$. The means $\mu_k^{(1)}$ for the first view are $[-10, -5]$, $[-9, 11]$, $[0, 6]$, and $[4, 0]$; the means $\mu_k^{(2)}$ for the second view are $[-8, -12]$, $[-6, -3]$, $[-2, 7]$, and $[2, 1]$; and the means $\mu_k^{(3)}$ for the third view are $[-5, -10]$, $[-8, -1]$, $[0, 5]$, and $[5, -4]$. The covariance matrices of the first cluster are identical across the three views: $\Sigma_1^{(1)}=\Sigma_1^{(2)}=\Sigma_1^{(3)}=\left[\begin{smallmatrix}1 & 0\\ 0 & 1\end{smallmatrix}\right]$.
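A minimal NumPy sketch of this generation process follows, assuming identity covariance for all four clusters (the source only specifies the first cluster's covariance) and a cluster label shared across the three views:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
alphas = np.array([0.3, 0.15, 0.15, 0.4])  # mixing proportions, shared by all views

# means[v][k]: mean of cluster k in view v+1, as listed above.
means = [
    np.array([[-10.0, -5.0], [-9.0, 11.0], [0.0, 6.0], [4.0, 0.0]]),   # view 1
    np.array([[-8.0, -12.0], [-6.0, -3.0], [-2.0, 7.0], [2.0, 1.0]]),  # view 2
    np.array([[-5.0, -10.0], [-8.0, -1.0], [0.0, 5.0], [5.0, -4.0]]),  # view 3
]

# One cluster label per instance, shared across views.
labels = rng.choice(4, size=n, p=alphas)

# With identity covariance (an assumption here for clusters 2-4),
# sampling reduces to the cluster mean plus standard normal noise.
views = [m[labels] + rng.standard_normal((n, 2)) for m in means]
```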
Please refer to: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
Consists of 36,785 images across 92 diverse classes, a class count significantly higher than in publicly available datasets. The dataset maintains low class imbalance and a highly comprehensive data distribution for robust model training. It also provides the remote sensing community with an additional platform for validating performance across multiple benchmarks.
Vibe-Eval is a new open benchmark and framework for evaluating multimodal chat models. It was introduced by Reka Technologies and is designed to rigorously test these models' visual understanding capabilities.
The industrial pipeline includes 315 dynamic industrial scenarios, which fall into three types: QR codes, text, and products. To enhance diversity, we use films with varied material properties, coverage areas, thicknesses, and levels of wrinkling, so the film exhibits significant variability across scenarios. To keep the industrial imaging pipeline stable, we maintained a consistent light-source intensity and a fixed distance between the camera and the object flow, minimizing the influence of errors external to the industrial system.
A Vietnamese dataset for target-specific hate speech detection. The dataset contains 10,000 comments; each comment is annotated for five targets, each with one of three hatefulness levels.
OpenStreetView-5M establishes a new open benchmark for geolocation by providing a large, open, and clean dataset. As detailed below, OpenStreetView-5M improves upon several limitations of current geolocation datasets.
Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem solving, and reasoning. Existing benchmarks evaluate tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent of Code (AoC) challenges and Project Euler, including 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and neutral problems, with specific challenges in the math-based Euler subset, with GPT-3.5-Turbo passing 50% of the AoC challenges and only 8% of the Euler problems.
Given an English article, generate a short summary in the target language.
Given a sentence in the source language, generate a translation in the target language. The data contains translation pairs in two directions, English → TargetLanguage and TargetLanguage → English.
The full version of ReefSet used in Williams et al. (2024). This dataset contains strongly labeled audio clips from coral reef habitats, taken across 16 unique datasets from 11 countries. This dataset can be used to test transfer learning performance of audio embedding models.
Given a question and passage in an Indic language, generate a short answer span from the passage as the answer.
Given a question in an Indic language and a passage in English, generate a short answer span. We provide both an English and target language answer span in the annotations.
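An illustrative record for the cross-lingual QA variant above might look as follows (field names are hypothetical and may not match the released files):

```python
# Hypothetical record layout for cross-lingual extractive QA.
example = {
    "question": "...",       # question in an Indic language
    "passage": "...",        # supporting passage in English
    "answer_en": "...",      # short answer span in English
    "answer_target": "...",  # short answer span in the target language
}
```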
BlendMimic3D is a pioneering synthetic dataset developed using Blender, designed to enhance Human Pose Estimation (HPE) research. This dataset features diverse scenarios including self-occlusions, object-based occlusions, and out-of-frame occlusions, tailored for the development and testing of advanced HPE models.
This dataset consists of both fractured and non-fractured X-ray images encompassing various anatomical regions of the body, such as the lower limb, upper limb, lumbar region, hips, knees, and more. It is organized into three main folders: train, test, and validation, each containing both fractured and non-fractured radiographic images. You can freely access the dataset via the following link: https://www.kaggle.com/datasets/bmadushanirodrigo/fracture-multi-region-x-ray-data/data
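Given the train/test/validation folder layout with fractured and non-fractured subfolders, a minimal loading sketch with torchvision's `ImageFolder` might look like this (the root folder name below is an assumption about the extracted Kaggle archive):

```python
import torchvision
from torchvision import transforms

# Assumed layout: <root>/train/{fractured,non-fractured}/*.jpg
tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # X-rays are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.ImageFolder("fracture-data/train", transform=tfm)
print(train_set.classes)  # class names inferred from the subfolder names
```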
The WRV2 dataset is meticulously assembled to support the development and evaluation of video inpainting algorithms aimed specifically at wire removal, a challenging task that is critical for enhancing visual aesthetics in various scenes.
GCN inference on the NeuraChip accelerator using the Cora dataset.
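NeuraChip is a hardware accelerator, so a software reference for the workload it runs is a two-layer GCN on Cora. A minimal sketch with PyTorch Geometric follows (the library choice and hidden size are assumptions, not the NeuraChip toolchain):

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Cora citation graph: 2708 nodes, 7 classes.
dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN().eval()
with torch.no_grad():
    logits = model(data.x, data.edge_index)  # one inference pass
    pred = logits.argmax(dim=-1)             # predicted class per node
```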
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a similar granularity. Here, we fill this gap by presenting a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct this dataset, we apply natural language processing to extract from course descriptions detailed workplace activities (DWAs) used by the DOL to describe occupations. We then aggregate these DWAs to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of college-educated workers and their role in the economy.
The DocRED Information Extraction (DocRED-IE) dataset extends the DocRED dataset for the Document-level Closed Information Extraction (DocIE) task. DocRED-IE is a multi-task dataset and allows for 5 subtasks: (i) Document-level Relation Extraction, (ii) Mention Detection, (iii) Entity Typing, (iv) Entity Disambiguation, (v) Coreference Resolution, as well as combinations thereof such as Named Entity Recognition (NER) or Entity Linking. The DocRED-IE dataset also allows for the end-to-end tasks of: (i) DocIE and (ii) Joint Entity and Relation Extraction. DocRED-IE comprises sentence-level and document-level facts, thereby describing short as well as long-range interactions within an entire document.
1 PAPER • 6 BENCHMARKS
This is the replication package for our systematic literature review and can be used for the reproducibility of the individual steps of our search and selection methodology.
BLINK is a new benchmark for multimodal large language models that focuses on core visual perception abilities not found in other evaluations.