The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
28 PAPERS • 5 BENCHMARKS
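The MIT-BIH records are distributed through PhysioNet; as a minimal sketch, a record and its beat annotations can be read with the open-source `wfdb` Python package (the package choice and record number here are illustrative, not part of the dataset itself):

```python
import wfdb

# Fetch record 100 (two ECG channels plus header) straight from the
# PhysioNet 'mitdb' archive, then its reference beat annotations ('atr').
record = wfdb.rdrecord('100', pn_dir='mitdb')
annotation = wfdb.rdann('100', 'atr', pn_dir='mitdb')

print(record.sig_name, record.fs)  # channel names and sampling frequency (360 Hz)
print(annotation.symbol[:10])      # first ten beat labels
```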
FlareReal600 is a nighttime flare removal dataset containing 650 real-captured image pairs and 500 flare images. The training set contains 600 image pairs and 500 flare images, and the validation set contains 50 image pairs. Image pairs within the dataset are captured in various places (e.g., street, park, indoor) under both incorrect and correct exposure settings. Each flare-corrupted image contains multiple light sources. Flare images are captured in a dark room with multi-color light sources.
0 PAPER • NO BENCHMARKS YET
The AUR & UMB Anticancer Dataset contains quantitative data on the anticancer effects of the natural coumarins Auraptene (AUR) and Umbelliprenin (UMB) across 27 studies. The data were collected from published literature reporting the effects of AUR and UMB treatment on the viability of diverse human cancer cell lines.
1 PAPER • 1 BENCHMARK
AlpacaEval in Thai.
MT-Bench in Thai.
MedLFQA is constructed by reformulating four existing biomedical long-form question-answering benchmark datasets: LiveQA, MedicationQA, HealthSearchQA, and K-QA. Each MedLFQA instance consists of four components: a question (Q), a long-form answer (A), must-have statements (MH), and nice-to-have statements (NH). This structure enables automatic evaluation of model responses and gives a comprehensive picture of how a model answers a patient's question.
1 PAPER • NO BENCHMARKS YET
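As a sketch of MedLFQA's four-component structure, a single instance can be represented as below (field names are hypothetical; the released files may use different keys):

```python
# One MedLFQA instance with its four components (hypothetical key names).
instance = {
    "question": "...",        # Q: the patient's question
    "answer": "...",          # A: long-form reference answer
    "must_have": ["..."],     # MH: statements a correct answer must include
    "nice_to_have": ["..."],  # NH: statements that help but are not required
}
```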
We present the SJTU Multispectral Object Detection (SMOD) dataset. The dataset contains 8676 infrared-visible image pairs, with annotations for 8042 pedestrians, 10478 riders, 6501 bicycles, and 6422 cars. The degree of occlusion is meticulously annotated for every object. Despite its low sampling rate, the dataset has dense rider and pedestrian objects and rich illumination variation across its 3298 pairs of night-scene images.
SunspotsYoloDataset is a set of 1690+190 high-resolution RGB astronomical images captured with smart telescopes fitted with dedicated solar filters and annotated with the positions of the sunspots actually visible in each image. Two instruments were used over several months, from Luxembourg and France, between January 2023 and May 2024: a Stellina smart telescope (https://vaonis.com/stellina) and a Vespera smart telescope (https://vaonis.com/vespera).
This dataset contains named entity annotations for European Parliament recordings in Dutch, French, German, and Spanish. The entity annotation scheme follows OntoNotes v5. The original unannotated dataset is VoxPopuli.
MathBench is an all-in-one math dataset for language model evaluation.
2 PAPERS • NO BENCHMARKS YET
Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
Files composing the YADL data lake, for the paper "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (Experiment, Analysis & Benchmark Paper)"
CinePile is a question-answering-based, long-form video understanding dataset. It was created using large language models (LLMs) in a human-in-the-loop pipeline that leverages existing human-generated raw data. It consists of approximately 300,000 training data points and 5,000 test data points.
Simulation data and pre-trained Graph Neural Network (GNN) models produced in [1].
Sakuga-42M is a large-scale hand-drawn cartoon video dataset for academic research. It comprises 42 million cartoon keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, and content taxonomies. The dataset is intended to support researchers exploring more effective and practical solutions for creating cartoons.
1 PAPER • 2 BENCHMARKS
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded.
MedConceptsQA - Open Source Medical Concepts QA Benchmark
12 PAPERS • 2 BENCHMARKS
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.
A large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations from Reddit.
The dataset covers source code vulnerabilities in some of the most commonly used IoT frameworks. We introduce IoTvulCode, a novel framework consisting of a dataset-generating tool and ML-enabled methods for detecting source code vulnerabilities and weaknesses, together with the initial release of an IoT vulnerability dataset. Our framework contributes to improving existing coding practices, leading to a more secure IoT infrastructure.
10,000 instances of a three-view numerical data set with 4 clusters and 2 feature components are considered. The data points in each view are generated from a 4-component bivariate Gaussian mixture model (GMM) whose mixing proportions are shared across the three views: $\alpha_1^{(1)}=\alpha_1^{(2)}=\alpha_1^{(3)}=0.3$, $\alpha_2^{(1)}=\alpha_2^{(2)}=\alpha_2^{(3)}=0.15$, $\alpha_3^{(1)}=\alpha_3^{(2)}=\alpha_3^{(3)}=0.15$, and $\alpha_4^{(1)}=\alpha_4^{(2)}=\alpha_4^{(3)}=0.4$. The means $\mu_k^{(1)}$ for the first view are $[-10, -5]$, $[-9, 11]$, $[0, 6]$, and $[4, 0]$; the means $\mu_k^{(2)}$ for the second view are $[-8, -12]$, $[-6, -3]$, $[-2, 7]$, and $[2, 1]$; and the means $\mu_k^{(3)}$ for the third view are $[-5, -10]$, $[-8, -1]$, $[0, 5]$, and $[5, -4]$. The covariance matrices of the first cluster are identical across the three views: $\Sigma_1^{(1)}=\Sigma_1^{(2)}=\Sigma_1^{(3)}=\left[\begin{smallmatrix}1 & 0\\ 0 & 1\end{smallmatrix}\right]$.
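A minimal NumPy sketch of this generation process follows, assuming identity covariance for all four clusters (the source only specifies the first cluster's covariance) and a cluster label shared across the three views:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
alphas = np.array([0.3, 0.15, 0.15, 0.4])  # mixing proportions, shared by all views

# means[v][k]: mean of cluster k in view v+1, as listed above.
means = [
    np.array([[-10.0, -5.0], [-9.0, 11.0], [0.0, 6.0], [4.0, 0.0]]),   # view 1
    np.array([[-8.0, -12.0], [-6.0, -3.0], [-2.0, 7.0], [2.0, 1.0]]),  # view 2
    np.array([[-5.0, -10.0], [-8.0, -1.0], [0.0, 5.0], [5.0, -4.0]]),  # view 3
]

# One cluster label per instance, shared across views.
labels = rng.choice(4, size=n, p=alphas)

# With identity covariance (an assumption here for clusters 2-4),
# sampling reduces to the cluster mean plus standard normal noise.
views = [m[labels] + rng.standard_normal((n, 2)) for m in means]
```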
Please refer to: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
Consists of 36,785 images across 92 diverse classes, a class count significantly higher than in publicly available datasets. The dataset maintains low class imbalance and a highly comprehensive data distribution for robust model training. It also provides the remote sensing community with an additional platform for validating performance across multiple benchmarks.
Vibe-Eval is a new open benchmark and framework for evaluating multimodal chat models. It was introduced by Reka Technologies and is designed to rigorously test these models' visual understanding capabilities.
The industrial pipeline includes 315 dynamic industrial scenarios, which fall into three types: QR codes, text, and products. To enhance diversity, we use films with varied material properties, coverage areas, thicknesses, and levels of wrinkling, so the film exhibits significant variability across scenarios. To keep the industrial imaging pipeline stable, we maintained a consistent light-source intensity and a fixed distance between the camera and the object flow, minimizing the influence of errors external to the industrial system.
A Vietnamese dataset for target-specific hate speech detection. The dataset contains 10,000 comments; each comment is annotated for five targets, each with one of three hatefulness levels.
OpenStreetView-5M establishes a new open benchmark for geolocation by providing a large, open, and clean dataset. As detailed below, OpenStreetView-5M improves upon several limitations of current geolocation datasets.
Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem solving, and reasoning. Existing benchmarks evaluate tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent of Code (AoC) challenges and Project Euler, including 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and neutral problems, with specific challenges in the math-based Euler subset, with GPT-3.5-Turbo passing 50% of the AoC challenges and only 8% of the Euler problems.
Given an English article, generate a short summary in the target language.
Given a sentence in the source language, generate a translation in the target language. The data contains translation pairs in two directions, English → TargetLanguage and TargetLanguage → English.
The full version of ReefSet used in Williams et al. (2024). This dataset contains strongly labeled audio clips from coral reef habitats, taken across 16 unique datasets from 11 countries. This dataset can be used to test transfer learning performance of audio embedding models.
Given a question and passage in an Indic language, generate a short answer span from the passage as the answer.
Given a question in an Indic language and a passage in English, generate a short answer span. We provide both an English and target language answer span in the annotations.
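An illustrative record for the cross-lingual QA variant above might look as follows (field names are hypothetical and may not match the released files):

```python
# Hypothetical record layout for cross-lingual extractive QA.
example = {
    "question": "...",       # question in an Indic language
    "passage": "...",        # supporting passage in English
    "answer_en": "...",      # short answer span in English
    "answer_target": "...",  # short answer span in the target language
}
```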
BlendMimic3D is a pioneering synthetic dataset developed using Blender, designed to enhance Human Pose Estimation (HPE) research. This dataset features diverse scenarios including self-occlusions, object-based occlusions, and out-of-frame occlusions, tailored for the development and testing of advanced HPE models.
This dataset consists of both fractured and non-fractured X-ray images encompassing various anatomical regions of the body, such as the lower limb, upper limb, lumbar region, hips, knees, and more. It is organized into three main folders: train, test, and validation, each containing both fractured and non-fractured radiographic images. You can freely access the dataset via the following link: https://www.kaggle.com/datasets/bmadushanirodrigo/fracture-multi-region-x-ray-data/data
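Given the train/test/validation folder layout with fractured and non-fractured subfolders, a minimal loading sketch with torchvision's `ImageFolder` might look like this (the root folder name below is an assumption about the extracted Kaggle archive):

```python
import torchvision
from torchvision import transforms

# Assumed layout: <root>/train/{fractured,non-fractured}/*.jpg
tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # X-rays are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.ImageFolder("fracture-data/train", transform=tfm)
print(train_set.classes)  # class names inferred from the subfolder names
```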
The WRV2 dataset is meticulously assembled to support the development and evaluation of video inpainting algorithms aimed specifically at wire removal, a challenging task that is critical for enhancing visual aesthetics in various scenes.
GCN inference on the NeuraChip accelerator using the Cora dataset.
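NeuraChip is a hardware accelerator, so a software reference for the workload it runs is a two-layer GCN on Cora. A minimal sketch with PyTorch Geometric follows (the library choice and hidden size are assumptions, not the NeuraChip toolchain):

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Cora citation graph: 2708 nodes, 7 classes.
dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN().eval()
with torch.no_grad():
    logits = model(data.x, data.edge_index)  # one inference pass
    pred = logits.argmax(dim=-1)             # predicted class per node
```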
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a similar granularity. Here, we fill this gap by presenting a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct this dataset, we apply natural language processing to extract from course descriptions detailed workplace activities (DWAs) used by the DOL to describe occupations. We then aggregate these DWAs to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of college-educated workers and their role in the economy.
The DocRED Information Extraction (DocRED-IE) dataset extends the DocRED dataset for the Document-level Closed Information Extraction (DocIE) task. DocRED-IE is a multi-task dataset and allows for 5 subtasks: (i) Document-level Relation Extraction, (ii) Mention Detection, (iii) Entity Typing, (iv) Entity Disambiguation, (v) Coreference Resolution, as well as combinations thereof such as Named Entity Recognition (NER) or Entity Linking. The DocRED-IE dataset also allows for the end-to-end tasks of: (i) DocIE and (ii) Joint Entity and Relation Extraction. DocRED-IE comprises sentence-level and document-level facts, thereby describing short as well as long-range interactions within an entire document.
1 PAPER • 6 BENCHMARKS
This is the replication package for our systematic literature review and can be used for the reproducibility of the individual steps of our search and selection methodology.
BLINK is a new benchmark for multimodal large language models that focuses on core visual perception abilities not found in other evaluations.