A dataset of 100K synthetic images of skin lesions, ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3] created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.
1 PAPER • NO BENCHMARKS YET
Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume to observe also static scene parts besides deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. To tackle this issue with a common benchmark, we introduce the Drunkard’s Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings lets us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality.
1 PAPER • 1 BENCHMARK
EBHI-Seg is a dataset containing 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer.
The dataset X of this work is an extension of the heartSeg dataset. Each sample x ∈ X is an RGB image capturing the heart region of Medaka (Oryzias latipes) hatchlings from a constant ventral view. Since the body of Medaka is see-through, noninvasive studies regarding the internal organs and the whole circulatory system are practicable. A Medaka’s heart contains three parts: the atrium, the ventricle, and the bulbus. The atrium receives deoxygenated blood from the circulatory system and delivers it to the ventricle, which forwards it into the bulbus. The bulbus is the heart’s exit chamber and provides the gill arches with a constant blood flow. The blood flow through these three chambers was captured in 63 short recordings (around 11 seconds with 24 frames per second each) in total, from which the single image samples x ∈ X are extracted. The dataset is split into training and test data following the heartSeg dataset with ntrain = 565 samples in the training set Xtrain and ntest = 165
Facial Skeletal Angles (Glabella and Maxilla Angle and Length and Width of Piriformis)
The Fraunhofer Portugal AICOS EDoF Dataset was produced within the TAMI project and is composed of images of microscopic fields of view (FOV) of Liquid-based Cervical Cytology (LBC) samples. A total of 15 LBC samples were supplied by the Pathology Services from Hospital Fernando Fonseca and the Portuguese Oncology Institute of Porto. For each LBC sample, a set of images were obtained using a version of µSmartScope [1,2] prototype adapted to the cervical cytology use case [3,4].
Human fibrosarcoma HT1080WT (ATCC) cells at low cell densities embedded in 3D collagen type I matrices [1]. The time-lapse videos were recorded every 2 minutes for 16.7 hours and covered a field of view of 1002 pixels × 1004 pixels with a pixel size of 0.802 μm/pixel The videos were pre-processed to correct frame-to-frame drift artifacts, resulting in a final size of 983 pixels × 985 pixels pixels.
At the Isfahan Fertility and Infertility Center, semen samples were collected from fifteen patients. The sperm samples were fixed and stained using the Diff-Quick method. Using an Olympus CX21 microscope with a ×100 objective lens and a ×10 eyepiece and a Sony color camera (Model No SSC-DC58AP), 725 images were taken. The resolution of each image was 576×720 pixels. From these images, the sperm heads were cropped and classified into five classes by three specialists. The classes are Normal, Pyriform, Tapered, Amorphous, and Others. After the classification, only the samples which there was a collective consensus about their class were kept in the dataset. Four classes of Normal, Pyriform, Tapered, and Amorphous are included in this dataset. The resulting dataset of sperm heads denoted as Human Sperm Head Morphology dataset (HuSHeM) consists of four folders, each corresponding to a specific set of sperm shapes. The folder names reflect the shape of the contained images. There are 54
This dataset includes sharp-blur pairs of Leishmania image, which is a protozoan parasite microscopy image dataset of Leishmania, obtained from the preserved slides stained with Giemsa. The paired blur-sharp images are acquired by employing a bright-field microscope (Olympus IX53) with 100× magnification oil immersion objectives.We first capture the sharp images as ground truth, then acquire its corresponding out-of-focus images. The extent and nature of defocusing are random along the optical axis, where the degree of out-of-focus is inconsistent from image-to-image. This dataset includes 764 in-focus and 764 corresponding out-of-focus images, where each image is composed of 2304 × 1728 pixels in 24-bit JPG format.
The MIMIC PERform Testing dataset contains the following physiological signals recorded from 200 critically-ill patients during routine clinical care:
1 PAPER • 2 BENCHMARKS
MODA is a large open-source dataset of high quality, human-scored sleep spindles (5342 spindles, from 180 subjects) that was produced by the Massive Online Data Annotation project. Sleep spindles were detected as a consensus of a number of human-expert scorers. With a median number of 5 experts scoring every EEG segment, MODA offers sleep spindle annotations of a quality unseen in previous datasets.
1、 Competition name:
Our primary objective in creating this dataset is to support researchers in the advancement of algorithms for keypoints detection and the pretraining of large models on retinal images using a self-supervised approach. The keypoints in the dataset have been carefully annotated by students from our lab, ensuring meticulous accuracy.
The MedVidCL dataset contains a collection of 6, 617 videos annotated into ‘medical instructional’, ‘medical non-instructional' and ‘non-medical’ classes. A two-step approach is used to construct the MedVidCL dataset. In the first step, the videos annotated by health informatics experts are used to train a machine learning model that predicts the given video to one of the three aforementioned classes. In the second step, only the high-confidence videos are used and health informatics experts assess the model’s predicted video category and update the category wherever needed.
Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the original webpage)
The NCANDA consortium is composed of an Administrative component at the University of California San Diego, a Data Analysis and Informatics component at SRI International, and five research sites (University of California San Diego, SRI International, Duke University, the University of Pittsburgh, and the Oregon Health & Science University). A sample of 831 individuals (ages 12-21) were recruited for the study across the five research sites. The enrolled participants are followed in an accelerated longitudinal design that involves structural and functional imaging of the brain along with extensive neuropsychological and clinical assessments.
Unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT), Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, segmentation maps of tumors in the CT scans, and quantitative values obtained from the PET/CT scans. Imaging data are also paired with gene mutation, RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes.
This collection contains images from 422 non-small cell lung cancer (NSCLC) patients. For these patients pretreatment CT scans, manual delineation by a radiation oncologist of the 3D volume of the gross tumor volume and clinical outcome data are available.
Onchocerciasis is causing blindness in over half a million people in the world today. Drug development for the disease is crippled as there is no way of measuring effectiveness of the drug without an invasive procedure. Drug efficacy measurement through assessment of viability of onchocerca worms requires the patients to undergo nodulectomy which is invasive, expensive, time-consuming, skill-dependent, infrastructure dependent and lengthy process.
Abstract The Norwegian Endurance Athlete ECG Database contains 12-lead ECG recordings from 28 elite athletes from various sports in Norway. All recordings are 10 seconds resting ECGs recorded with a General Electric (GE) MAC VUE 360 electrocardiograph. All ECGs are interpreted with both the GE Marquette SL12 algorithm (version 23 (v243)) and one cardiologist with training in interpretation of athlete's ECG. The data was collected at the University of Oslo in February and March 2020.
Authors of the Dataset:
The PART-OF dataset is a dataset of relations extracted from a medical ontology. The different entities in the ontology are parts of the human body. The dataset has 16,894 nodes with 19,436 edges between them.
Overview The Surgical Instruments Recognition Dataset is a groundbreaking collection of high-resolution images (1280x960 pixels) specifically designed for the recognition and categorization of surgical instruments. This dataset captures the intricate details and complexity of surgical tools, particularly when arranged in scenarios reminiscent of an operating room.
A total of 227 cross sectional images (20 x 54 mm with a resolution of 289 x 648 pixels) of hind-leg xenograft tumors from 29 mice were obtained with 1mm step-wise movement of the array mounted on a manual positioning device. The whole tumor volume was acquired using a diagnostic ultrasound system with a 10 MHz linear transducer and 50 MHz sampling.
Dataset of sperm head images with expert-classification labels. The dataset contains 1854 sperm head images obtained from six semen smears and classified by three Chilean referent domain experts according to World Health Organization (WHO) criteria, in one of the following classes: normal, tapered, pyriform, small and amorphous. This gold-standard is aimed for use in evaluating and comparing not only known techniques, but also future improvements to present approaches for classification of human sperm heads for semen analysis.
This dataset is obtained during an ICON project (2017-2018) in collaboration with KU Leuven (ESAT-STADIUS), UZ Leuven, UCB, Byteflies and Pilipili. The goal of this project was to design a system using Behind the ear (bhE) EEG electrodes for monitoring the patient in a home environment. This way, a nice balance can be found between sufficient accuracy of seizure detection algorithms (because EEG is used) and wearability (bhe EEG is relatively subtle, similar to a hear-aid device). The dataset acquired in the hospital during presurgical evaluation. During such presurgical evaluation, neurologists try to see if a specific part of the brain is causing the seizures, and if so, if that part of the brain can be removed during surgery. During the presurgical evaluation, patients are monitored using the vEEG for multiple days (typically a week). Patients are however restricted to move within their room because of the wiring and video analysis. In this dataset, following data is available per p
The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient.
SinGAN-Seg-polyps is a synthetic dataset for polyp segmentation consisting of 10,000 synthetic polyps and masks.
SkinCon is a skin disease dataset densely annotated by dermatologists. SkinCon includes 3230 images from the Fitzpatrick 17k skin disease (Fitzpatrick Skin Tone) dataset densely labelled with 48 clinical concepts, 22 of which have at least 50 images representing the concept. The concepts used were chosen by two dermatologists considering the clinical descriptor terms used to describe skin lesions. Examples include "plaque", "scale", and "erosion".
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
A public open dataset of synthetic chest X-ray images of COVID-19.
VISEM-Tracking is a dataset consisting of 20 video recordings of 30s of spermatozoa with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. It is an extension of the previously published VISEM dataset. In addition to the annotated data, unlabeled video clips are provided for easy-to-use access and analysis of the data.
The Medical Translation Task of WMT 2014 addresses the problem of domain-specific and genre-specific machine translation. The task is split into two subtasks: summary translation, focused on translation of sentences from summaries of medical articles, and query translation, focused on translation of queries entered by users into medical information search engines. Both subtasks included translation between English and Czech, German, and French, in both directions.
The Biomedical Translation Shared Task was first introduced at the First Conference of Machine Translation. The task aims to evaluate systems for the translation of biomedical titles and abstracts from scientific publications. The data includes three language pairs (English ↔ Portuguese, English ↔ Spanish, English ↔ French) and two sub-domains of biological sciences and health sciences.
An instance segmentation dataset of yeast cells in microstructures. The dataset includes 493 densely annotated microscopy images. For more information see the paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures".
deepMTJ: Muscle-Tendon Junction Tracking in Ultrasound Images deepMTJ is a machine learning approach for automatically tracking of muscle-tendon junctions (MTJ) in ultrasound images. Our method is based on a convolutional neural network trained to infer MTJ positions across various ultrasound systems from different vendors, collected in independent laboratories from diverse observers, on distinct muscles and movements. We built deepMTJ to support clinical biomechanists and locomotion researchers with an open-source tool for gait analyses.
It is a competition on kaggle with stroke Prediction, which is heavily imbalanced.
The scans are performed using a custom-built, highly flexible X-ray CT scanner, the FleX-ray scanner, developed by XRE nvand located in the FleX-ray Lab at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, Netherlands. The general purpose of the FleX-ray Lab is to conduct proof of concept experiments directly accessible to researchers in the field of mathematics and computer science. The scanner consists of a cone-beam microfocus X-ray point source that projects polychromatic X-rays onto a 1536-by-1944 pixels, 14-bit flat panel detector (Dexella 1512NDT) and a rotation stage in-between, upon which a sample is mounted. All three components are mounted on translation stages which allow them to move independently from one another.
0 PAPER • NO BENCHMARKS YET
This is a machine-learning-ready glaucoma dataset using a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS train set. This dataset is split into training, validation, and test folders which contain 2500, 270, and 500 fundus images in each class respectively. Each training set has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).
This is an improved machine-learning-ready glaucoma dataset using a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS [1] set. This dataset is split into training, validation, and test folders which contain 4000 (~84%), 385 (~8%), and 385 (~8%) fundus images in each class respectively. Each training set has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).
FAscicle Lower Leg Muscle Ultrasound Dataset is a dataset composed of 812 ultrasound images of lower leg muscles to analyze muscle weaknesses and prevent injuries. It combines the datasets provided by two articles, “Estimating Full Regional Skeletal Muscle Fibre Orientation from B-Mode Ultrasound Images Using Convolutional, Residual, and Deconvolutional Neural Networks” published by Ryan Cunningham et al. and “Automated Analysis of Musculoskeletal Ultrasound Images Using Deep Learning” published by Neil Cronin, with complementary annotations. The dataset has been introduced in this paper: Michard, H., Luvison, B., Pham, Q. C., Morales-Artacho, A. J., & Guilhem, G. (2021, August). AW-Net: automatic muscle structure analysis on B-mode ultrasound images for injury prevention. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 1-9).
FHRMA is an open-source project for Fetal Heart Rate Morphological Analysis containing Matlab source code and datasets. As a sub-project, it includes a deep learning method and dataset for automatic identification of the maternal heart rate (MHR) and, more generally, false signals (FSs) on fetal heart rate (FHR) recordings. The challenge concerns particularly the FHR signal recorded with Doppler sensors, on which MHR interference and other FSs are particularly common, but the dataset also includes FHR recorded with scalp-ECG. The training and validation dataset contained 1030 expert-annotated periods (mean duration: 36 min) from 635 recordings. Labels consist of annotating each time sample as either 1: False signal; 0: True signal, or -1: do not know or irrelevant.
The medaka (Oryzias latipes) and the zebrafish (Danio rerio) are used as a model organism for a variety of subjects in biomedical research. The presented work aims to study the potential of automated ventricular dimension estimation through heart segmentation in medaka. For more on this, it's time for a closer look on our paper and the supplementary materials.
A large paroxysmal atrial fibrillation long-term electrocardiogram monitoring database Abstract Atrial fibrillation (AF) is the most common sustained heart arrhythmia in adults. Holter monitoring, a long-term 2-lead electrocardiogram (ECG), is a key tool available to cardiologists for AF diagnosis. Machine learning (ML) and deep learning (DL) models have shown great capacity to automatically detect AF in ECG and their use as medical decision support tool is growing. Training these models rely on a few open and annotated databases. We present a new Holter monitoring database from patients with paroxysmal AF with 167 records from 152 patients, acquired from an outpatient cardiology clinic from 2006 to 2017 in Belgium. AF episodes were manually annotated and reviewed by an expert cardiologist and a specialist cardiac nurse. Records last from 19 hours up to 95 hours, divided into 24-hour files. In total, it represents 24 million seconds of annotated Holter monitoring, sampled at 200 Hz. Th
Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprised of full-fundus glaucoma images, associated image metadata like, optic disc segmentation, optic cup segmentation, blood vessel segmentation, and any provided per-instance text metadata like sex and age. This dataset is the largest public repository of fundus images with glaucoma.
Toronto NeuroFace Dataset: A New Dataset for Facial Motion Analysis in Individuals with Neurological Disorders
28,943 Humphrey Visual Field (HVF) tests from 3,871 patients and 7,428 eyes.