The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, publicly available collection of medical records. Each record includes ICD-9 codes identifying the diagnoses and procedures performed. Each code is partitioned into sub-codes, which often capture specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes; each report is assigned 7.6 codes on average. The data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
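Since reports carry multiple ICD-9 codes and evaluation is usually done at the top level, a small sketch may help illustrate the multi-label structure. The codes and report IDs below are illustrative placeholders, not entries from MIMIC-III; the only assumption is the standard ICD-9 convention that the top-level category is the part before the decimal point.

```python
# Hypothetical sketch: reducing full ICD-9 codes to their top-level category,
# the granularity at which MIMIC-III labels are typically counted.
# Example codes are illustrative, not drawn from the dataset.

def top_level(icd9_code: str) -> str:
    """Return the top-level ICD-9 category (the part before the decimal)."""
    return icd9_code.split(".")[0]

# Each report maps to a set of assigned codes (a multi-label setup).
reports = {
    "report_001": ["428.0", "401.9", "V45.81"],
    "report_002": ["250.00", "428.0"],
}

top_level_labels = {rid: sorted({top_level(c) for c in codes})
                    for rid, codes in reports.items()}

# Average number of top-level codes per report, analogous to the
# dataset-wide figure of 7.6 codes per report quoted above.
avg_codes = sum(len(v) for v in top_level_labels.values()) / len(top_level_labels)
print(top_level_labels["report_001"])  # ['401', '428', 'V45']
print(avg_codes)                       # 2.5
```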
894 PAPERS • 8 BENCHMARKS
The Digital Retinal Images for Vessel Extraction (DRIVE) dataset is a dataset for retinal vessel segmentation. It consists of 40 color fundus images in JPEG format, including 7 cases with abnormal pathology. The images were obtained from a diabetic retinopathy screening program in the Netherlands and were acquired using a Canon CR5 non-mydriatic 3CCD camera with a 45-degree field of view (FOV). Each image has a resolution of 584×565 pixels with eight bits per color channel (3 channels).
275 PAPERS • 2 BENCHMARKS
MIMIC-CXR, from the Massachusetts Institute of Technology, comprises 371,920 chest X-rays associated with 227,943 imaging studies from 65,079 patients. The studies were performed at Beth Israel Deaconess Medical Center in Boston, MA.
168 PAPERS • 2 BENCHMARKS
Kvasir-SEG is an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist.
140 PAPERS • 3 BENCHMARKS
The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.
116 PAPERS • 6 BENCHMARKS
Cholec80 is an endoscopic video dataset containing 80 videos of cholecystectomy surgeries performed by 13 surgeons. The videos are captured at 25 fps and downsampled to 1 fps for processing. The whole dataset is labeled with phase and tool-presence annotations. The phases were defined by a senior surgeon at a Strasbourg hospital, France. Since the tools are sometimes barely visible in the images and thus difficult to recognize visually, a tool is defined as present in an image if at least half of the tool tip is visible.
97 PAPERS • 2 BENCHMARKS
VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.
70 PAPERS • 1 BENCHMARK
UBFC-rPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) is a database comprising two datasets focused specifically on rPPG analysis. It was created using a custom C++ application for video acquisition with a simple low-cost webcam (Logitech C920 HD Pro) at 30 fps, recording at a resolution of 640×480 in uncompressed 8-bit RGB format. A CMS50E transmissive pulse oximeter was used to obtain the ground-truth PPG data, comprising the PPG waveform as well as the PPG heart rates. During recording, the subject sits about 1 m in front of the camera with their face visible. All experiments were conducted indoors with varying amounts of sunlight and indoor illumination. The complete video dataset is available for download on request; a basic MATLAB implementation for reading the ground-truth pulse-oximeter data can also be provided.
55 PAPERS • 1 BENCHMARK
This project aims to provide the community with all the materials needed to address the problem of echocardiographic image segmentation and volume estimation from 2D ultrasound sequences (both two- and four-chamber views).
54 PAPERS • NO BENCHMARKS YET
SLAKE is an English-Chinese bilingual dataset consisting of 642 images and 14,028 question-answer pairs for training and testing Med-VQA systems.
30 PAPERS • 1 BENCHMARK
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
28 PAPERS • 5 BENCHMARKS
PMC-VQA is a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images that cover a variety of modalities and diseases. The question-answer pairs are generated from PMC-OA.
25 PAPERS • 3 BENCHMARKS
We present a large and diverse abdominal CT organ-segmentation dataset, termed AbdomenCT-1K, with more than 1,000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. Furthermore, we conduct a large-scale study of liver, kidney, spleen, and pancreas segmentation, revealing unsolved problems of the SOTA methods, such as limited generalization to distinct medical centers, phases, and unseen diseases. To advance these unsolved problems, we further build four organ-segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics. Accordingly, we develop a simple and effective method for each benchmark, which can be used out of the box as a strong baseline. We believe the AbdomenCT-1K dataset will promote future in-depth research towards clinically applicable abdominal organ-segmentation methods.
24 PAPERS • NO BENCHMARKS YET
The MMSE-HR benchmark consists of a dataset of 102 videos from 40 subjects recorded at 1040x1392 raw resolution at 25fps. During the recordings, various stimuli such as videos, sounds, and smells are introduced to induce different emotional states in the subjects. The ground truth waveform for MMSE-HR is the blood pressure signal sampled at 1000Hz. The dataset contains a diverse distribution of skin colors in the Fitzpatrick scale (II=8, III=11, IV=17, V+VI=4).
24 PAPERS • 1 BENCHMARK
MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
23 PAPERS • NO BENCHMARKS YET
Introduced by Da et al. in DigestPath: a Benchmark Dataset with Challenge Review for the Pathological Detection and Segmentation of Digestive-System
22 PAPERS • 1 BENCHMARK
The MS-CXR dataset provides 1,162 image–sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs per finding. The dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. reviewed and edited bounding boxes and phrases (1,026 bounding box/sentence pairs); and 2. manual bounding box labels created from scratch (136 bounding box/sentence pairs).
22 PAPERS • NO BENCHMARKS YET
CholecT50 is a dataset of endoscopic videos of laparoscopic cholecystectomy surgery introduced to enable research on fine-grained action recognition in laparoscopic surgery. It is annotated with triplet information in the form of <instrument, verb, target>. The dataset is a collection of 50 videos consisting of 45 videos from the Cholec80 dataset and 5 videos from an in-house dataset of the same surgical procedure.
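The <instrument, verb, target> annotation format lends itself to a short parsing sketch. This is a minimal, hypothetical illustration of the triplet structure; the label string below is an example of the format, not a verbatim dataset entry, and the real CholecT50 release ships its own loaders and label files.

```python
# Minimal sketch of parsing a CholecT50-style <instrument, verb, target>
# triplet label into a structured tuple. The example label is illustrative.
import re
from typing import NamedTuple

class Triplet(NamedTuple):
    instrument: str
    verb: str
    target: str

def parse_triplet(label: str) -> Triplet:
    """Parse an '<instrument, verb, target>' string into its three components."""
    match = re.fullmatch(
        r"<\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*>", label.strip()
    )
    if match is None:
        raise ValueError(f"not a valid triplet label: {label!r}")
    return Triplet(*match.groups())

t = parse_triplet("<grasper, retract, gallbladder>")
print(t.instrument, t.verb, t.target)  # grasper retract gallbladder
```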
20 PAPERS • 7 BENCHMARKS
CliCR is a dataset for domain-specific reading comprehension, consisting of around 100,000 cloze queries constructed from clinical case reports.
18 PAPERS • 1 BENCHMARK
The ECGs in this collection were obtained using a non-commercial PTB prototype recorder.
16 PAPERS • 4 BENCHMARKS
The Indian Diabetic Retinopathy Image Dataset (IDRiD) consists of typical diabetic retinopathy lesions and normal retinal structures annotated at the pixel level. The dataset also provides disease-severity grades for diabetic retinopathy and diabetic macular edema for each image, making it well suited for developing and evaluating image-analysis algorithms for the early detection of diabetic retinopathy.
14 PAPERS • 3 BENCHMARKS
Consists of annotated frames containing GI procedure tools such as snares, balloons, and biopsy forceps. Besides the images, the dataset includes ground-truth masks and bounding boxes, and has been verified by two expert GI endoscopists.
The Chaoyang dataset contains 1,111 normal, 842 serrated, 1,404 adenocarcinoma, and 664 adenoma samples for training, and 705 normal, 321 serrated, 840 adenocarcinoma, and 273 adenoma samples for testing. This noisy dataset was constructed in a real-world scenario.
12 PAPERS • 2 BENCHMARKS
The Endomapper dataset is the first collection of complete endoscopy sequences acquired during regular medical practice, including slow and careful screening explorations, making secondary use of medical data. Its original purpose is to facilitate the development and evaluation of VSLAM (Visual Simultaneous Localization and Mapping) methods on real endoscopy data. The first release of the dataset is composed of 50 sequences totaling more than 13 hours of video. It is also the first endoscopic dataset that includes both the computed geometric and photometric endoscope calibration and the original calibration videos. Meta-data and annotations associated with the dataset range from anatomical-landmark and procedure-description labels to tool segmentation masks, COLMAP 3D reconstructions, simulated sequences with ground truth, and meta-data for special cases such as sequences from the same patient. This information will improve research in endoscopic VSLAM.
12 PAPERS • NO BENCHMARKS YET
The SUN-SEG dataset is a high-quality, per-frame annotated VPS dataset comprising 158,690 frames from the well-known SUN dataset. It extends the labels with diverse annotation types, i.e., object mask, boundary, scribble, polygon, and visual attribute, and carries over the pathological information from the original SUN dataset, including pathological classification labels, location information, and shape information.
12 PAPERS • 1 BENCHMARK
The Respiratory Sound database was originally compiled to support the scientific challenge organized at the International Conference on Biomedical and Health Informatics (ICBHI) 2017.
11 PAPERS • 1 BENCHMARK
The MedVidQA dataset contains 3,010 manually created health-related questions, with timestamps into videos serving as visual answers. The videos come from trusted sources such as accredited medical schools with established reputations, health institutes, health educators, and medical practitioners.
11 PAPERS • NO BENCHMARKS YET
The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer. The routine evaluation of HER2 is conducted with immunohistochemical techniques (IHC), which is very expensive. Therefore, we propose a breast cancer immunohistochemical (BCI) benchmark attempting to synthesize IHC data directly with the paired hematoxylin and eosin (HE) stained images. The dataset contains 4870 registered image pairs, covering a variety of HER2 expression levels (0, 1+, 2+, 3+).
10 PAPERS • 1 BENCHMARK
The HyperKvasir dataset contains 110,079 images and 374 videos capturing anatomical landmarks as well as pathological and normal findings, for a total of around 1 million images and video frames.
10 PAPERS • 2 BENCHMARKS
The Sixth Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing Challenge for Clinical Records focused on the temporal relations in clinical narratives. The organizers provided the research community with a corpus of discharge summaries annotated with temporal information, to be used for the development and evaluation of temporal reasoning systems. 18 teams from around the world participated in the challenge. During the workshop, participating teams presented comprehensive reviews and analysis of their systems, and outlined future research directions suggested by the challenge contributions.
9 PAPERS • 2 BENCHMARKS
4D-OR includes a total of 6,734 scenes, recorded at one frame per second by six calibrated RGB-D Kinect sensors mounted to the ceiling of the OR, providing synchronized RGB and depth images. The dataset provides fused point-cloud sequences of entire scenes, automatically annotated human 6D poses, and 3D bounding boxes for OR objects. Furthermore, it provides SSG annotations for each step of the surgery, together with the clinical roles of all humans in the scenes, e.g., nurse, head surgeon, anesthesiologist.
8 PAPERS • 1 BENCHMARK
The CHAOS challenge targets the segmentation of abdominal organs (liver, kidneys, and spleen) from CT and MRI data. The on-site section of CHAOS was held at the IEEE International Symposium on Biomedical Imaging (ISBI) on April 11, 2019, in Venice, Italy. Online submissions are still welcome!
8 PAPERS • NO BENCHMARKS YET
TMED is a clinically-motivated benchmark dataset for computer vision and machine learning from limited labeled data.
Prediction of Finger Flexion (BCI Competition IV): The goal of this dataset is to predict the flexion of individual fingers from signals recorded from the surface of the brain (electrocorticography, ECoG). The dataset contains brain signals from three subjects, along with the time courses of the flexion of each of five fingers. The task in this competition is to use the provided flexion information to predict finger flexion for a provided test set. Performance is evaluated by the average correlation coefficient r between actual and predicted finger flexion.
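The stated metric, the average Pearson correlation coefficient r between actual and predicted flexion traces, can be sketched as follows. The signals here are synthetic placeholders; a real submission would substitute the ECoG-decoded predictions for each of the five fingers.

```python
# Sketch of the evaluation metric: average Pearson r between actual and
# predicted finger-flexion time courses. Example signals are synthetic.
import math

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Per-finger correlations are averaged into the final score
# (two fingers shown here instead of five, for brevity).
fingers_actual = [[0.0, 1.0, 2.0, 1.0], [1.0, 0.0, 1.0, 2.0]]
fingers_pred = [[0.1, 0.9, 2.1, 1.2], [0.8, 0.2, 1.1, 1.9]]
score = sum(pearson_r(a, p)
            for a, p in zip(fingers_actual, fingers_pred)) / len(fingers_actual)
print(round(score, 3))
```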
7 PAPERS • 1 BENCHMARK
There has been rapidly growing interest in Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the machine learning research literature, aiming to assist doctors in telemedicine services. These systems are designed to interact with patients, collect evidence about their symptoms and relevant antecedents, and possibly make predictions about the underlying diseases. Doctors would review the interactions, including the evidence and the predictions, and collect additional information from patients if necessary, before deciding on next steps. Despite recent progress in this area, an important piece of doctors' interactions with patients is missing from the design of these systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis along with the ground truth.
7 PAPERS • NO BENCHMARKS YET
The 2021 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS21) is a competition in which teams compete to develop the best system for automatic semantic segmentation of renal tumors and surrounding anatomy.
RadQA is a radiology question answering dataset with 3074 questions posed against radiology reports and annotated with their corresponding answer spans (resulting in a total of 6148 question-answer evidence pairs) by physicians. The questions are manually created using the clinical referral section of the reports that take into account the actual information needs of ordering physicians and eliminate bias from seeing the answer context (and, further, organically create unanswerable questions). The answer spans are marked within the Findings and Impressions sections of a report. The dataset aims to satisfy the complex clinical requirements by including complete (yet concise) answer phrases (which are not just entities) that can span multiple lines.
English subset of the SLAKE dataset, comprising 642 images and more than 7,000 question–answer pairs.
ARCH is a computational pathology (CP) multiple instance captioning dataset to facilitate dense supervision of CP tasks. Existing CP datasets focus on narrow tasks; ARCH on the other hand contains dense diagnostic and morphological descriptions for a range of stains, tissue types and pathologies.
6 PAPERS • NO BENCHMARKS YET
Accurate lesion segmentation is critical in stroke rehabilitation research for the quantification of lesion burden and accurate image processing. Current automated lesion-segmentation methods for T1-weighted (T1w) MRIs, commonly used in rehabilitation research, lack accuracy and reliability. Manual segmentation remains the gold standard, but it is time-consuming, subjective, and requires significant neuroanatomical expertise. Moreover, many methods developed with ATLAS v1.2 report low accuracy, are not publicly accessible, or are improperly validated, limiting their utility to the field. Here we present ATLAS v2.0 (N=1271), a larger dataset of T1w stroke MRIs and manually segmented lesion masks that includes training (public, n=655), test (masks hidden, n=300), and generalizability (completely hidden, n=316) data. Algorithm development using this larger sample should lead to more robust solutions, and the hidden test and generalizability datasets allow for unbiased performance evaluation.
6 PAPERS • 1 BENCHMARK
The MISAW dataset is composed of 27 sequences of micro-surgical anastomosis on artificial blood vessels performed by 3 surgeons and 3 engineering students. The dataset contains video, kinematic, and procedural descriptions synchronized at 30 Hz. The procedural descriptions cover the phases, steps, and activities performed by the participants.
PHM2017 is a dataset consisting of 7,192 English tweets across six diseases and conditions: Alzheimer's disease, heart attack (any severity), Parkinson's disease, cancer (any type), depression (any severity), and stroke. The Twitter search API was used to retrieve the data using the colloquial disease names as search keywords, with the expectation of retrieving a high-recall, low-precision dataset. After removing the re-tweets and replies, the tweets were manually annotated.
The REFLACX dataset contains eye-tracking data for 3,032 readings of chest x-rays by five radiologists. The dictated reports were transcribed and have timestamps synchronized with the eye-tracking data.
ARCADE (Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs): Phase 2 of the dataset consists of two folders with 300 images each, along with annotations.
5 PAPERS • 2 BENCHMARKS
The complete blood count (CBC) dataset contains 360 blood-smear images along with their annotation files, split into training, testing, and validation sets. In the original dataset, some annotation files list far fewer red blood cells (RBCs) than are actually present, and one annotation file includes no RBCs at all even though its smear image contains them. We therefore corrected the erroneous files and split the dataset into three parts: 300 images with annotations form the training set, and the remaining 60 images with annotations form the testing set. Due to the shortage of data, a 60-image subset of the training set serves as the validation set.
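The split arithmetic above (300 train, 60 test, with validation drawn from the training set) can be sketched deterministically. The filenames below are hypothetical; the actual dataset uses its own naming scheme and annotation files.

```python
# Hedged sketch of the CBC split: 300 training + 60 test images, with a
# 60-image validation subset taken from the training set. IDs are hypothetical.
image_ids = [f"BloodImage_{i:05d}" for i in range(360)]

train_ids = image_ids[:300]    # first 300 images: training set
test_ids = image_ids[300:]     # remaining 60 images: testing set
val_ids = train_ids[-60:]      # validation: a 60-image subset of training

print(len(train_ids), len(test_ids), len(val_ids))  # 300 60 60
```

Note that because validation is carved out of training rather than held out separately, validation scores here are not independent of the training data.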
5 PAPERS • NO BENCHMARKS YET
Cancer in the head and neck (HaN) region is one of the most prominent cancers, for which radiotherapy represents an important treatment modality that aims to deliver a high radiation dose to the targeted cancerous cells while sparing the nearby healthy organs-at-risk (OARs). A precise three-dimensional spatial description, i.e. segmentation, of the target volumes as well as the OARs is required for optimal radiation-dose-distribution calculation, which is primarily performed using computed tomography (CT) images. However, the HaN region contains many OARs that are poorly visible in CT but better visible in magnetic resonance (MR) images. Although attempts have been made towards segmenting OARs from MR images, so far there has been no evaluation of the impact that combined analysis of CT and MR images has on OAR segmentation in the HaN region. The Head and Neck Organ-at-Risk Multi-Modal Segmentation Challenge aims to promote the development of new methods and the application of existing ones.
The RAD-ChestCT dataset is a large medical-imaging dataset developed by Rachel Draelos, an MD/PhD at Duke, during her Computer Science PhD supervised by Lawrence Carin. The full dataset includes 35,747 chest CT scans from 19,661 adult patients. The public Zenodo repository contains an initial release of 3,630 chest CT scans, approximately 10% of the dataset. This dataset is of significant interest to the machine learning and medical-imaging research communities.