336 dataset results for Medical

Br35H :: Brain Tumor Detection 2020

A brain tumor is considered an aggressive disease in both children and adults. Brain tumors account for 85 to 90 percent of all primary Central Nervous System (CNS) tumors. Every year, around 11,700 people are diagnosed with a brain tumor. The 5-year survival rate for people with a cancerous brain or CNS tumor is approximately 34 percent for men and 36 percent for women. Brain tumors are classified as: Benign Tumor, Malignant Tumor, Pituitary Tumor, etc. Proper treatment, planning, and accurate diagnostics should be implemented to improve the life expectancy of the patients. The best technique to detect brain tumors is Magnetic Resonance Imaging (MRI). A huge amount of image data is generated through the scans. These images are examined by the radiologist. A manual examination can be error-prone due to the level of complexity involved in brain tumors and their properties. Application of automated classification techniques using Machine Learning (ML) and Artificia

2 PAPERS • NO BENCHMARKS YET

Breast Lesion Detection in Ultrasound Videos (CVA-Net)

The breast lesion detection in ultrasound videos dataset uses a clip-level and video-level feature aggregated network (CVA-Net) and consists of 188 ultrasound videos, of which 113 are labeled malignant and 75 benign. Together they comprise 25,272 ultrasound images, with the number of images per video ranging from 28 to 413. 150 videos were used for training and 38 for testing. The primary intended use case is computer-aided breast cancer diagnosis, supporting systems that assist radiologists.

2 PAPERS • NO BENCHMARKS YET

CENTER-TBI (Collaborative European NeuroTrauma Effectiveness Research in TBI)

The CENTER-TBI database contains prospectively collected data on more than 4,500 patients with traumatic brain injury (TBI) in Europe. The Registry and Acute Care data were collected over a 3-year period (2015-2017) in 65 centers in Europe. For all patients, outcome data were collected up to 2 years after injury.

2 PAPERS • NO BENCHMARKS YET

Chest x-ray landmark dataset

A set of landmark annotations for the JSRT, Montgomery, and Shenzhen datasets and a subset of the Padchest dataset.

2 PAPERS • NO BENCHMARKS YET

Colorectal Adenoma

Colorectal Adenoma contains 177 whole slide images (156 contain adenoma) gathered and labelled by pathologists from the Department of Pathology, The Chinese PLA General Hospital.

2 PAPERS • NO BENCHMARKS YET

DisKnE (Disease Knowledge Evaluation)

DisKnE is a benchmark for Disease Knowledge Evaluation built from MedNLI and MEDIQA-NLI. This benchmark is constructed to specifically test the medical reasoning capabilities of ML models, such as mapping symptoms to diseases.

2 PAPERS • NO BENCHMARKS YET

EPISURG

EPISURG (EPISURG: a dataset of postoperative MRI for quantitative analysis of resection neurosurgery for refractory epilepsy)

EPISURG is a clinical dataset of $T_1$-weighted magnetic resonance images (MRI) from 430 epileptic patients who underwent resective brain surgery at the National Hospital of Neurology and Neurosurgery (Queen Square, London, United Kingdom) between 1990 and 2018.

2 PAPERS • NO BENCHMARKS YET

Endotect Polyp Segmentation Challenge Dataset

A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficient classification, measured by the amount of time spent processing each image. The last task relates to automatically segmenting polyps.

2 PAPERS • 1 BENCHMARK

FetReg

Fetoscopic Placental Vessel Segmentation and Registration (FetReg) is a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.

2 PAPERS • NO BENCHMARKS YET

GUE

GUE (Genome Understanding Evaluation)

A collection of $28$ datasets across $7$ tasks constructed for genome language model evaluation. The tasks cover promoter prediction, core promoter prediction, splice site prediction, covid variant classification, epigenetic marks prediction, and transcription factor binding site prediction on human and mouse.

2 PAPERS • 7 BENCHMARKS

HiRID

HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.

2 PAPERS • 6 BENCHMARKS

IBC

IBC (Individual Brain Charting)

The Individual Brain Charting (IBC) project aims at providing a new generation of functional-brain atlases. To map cognitive mechanisms in a fine scale, task-fMRI data at high-spatial-resolution are being acquired on a fixed cohort of 12 participants, while performing many different tasks. These data—free from both inter-subject and inter-site variability—are publicly available as means to support the investigation of functional segregation and connectivity as well as individual variability with a view to establishing a better link between brain systems and behavior.

2 PAPERS • NO BENCHMARKS YET

ISIC 2017 Task 2

The ISIC 2017 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 2 challenge dataset for lesion dermoscopic feature extraction contains the original lesion image, a corresponding superpixel mask, and superpixel-mapped expert annotations of the presence and absence of the following features: (a) network, (b) negative network, (c) streaks and (d) milia-like cysts.

2 PAPERS • NO BENCHMARKS YET

Kvasir-Capsule

Kvasir-Capsule is the largest publicly released video capsule endoscopy (VCE) dataset. It contains 47,238 labeled images and 117 videos capturing anatomical landmarks as well as pathological and normal findings. Altogether, this amounts to more than 4,741,621 images and video frames.

2 PAPERS • NO BENCHMARKS YET

MCSCSet

MCSCSet is a large-scale specialist-annotated dataset, designed for the task of Medical-domain Chinese Spelling Correction that contains about 200k samples. MCSCSet involves: i) extensive real-world medical queries collected from Tencent Yidian, ii) corresponding misspelled sentences manually annotated by medical specialists.

2 PAPERS • NO BENCHMARKS YET

MIMIC-CXR-LT (long-tailed version of MIMIC-CXR)

MIMIC-CXR-LT. We construct a single-label, long-tailed version of MIMIC-CXR in a similar manner. MIMIC-CXR is a multi-label classification dataset with over 200,000 chest X-rays labeled with 13 pathologies and a “No Findings” class. The resulting MIMIC-CXR-LT dataset contains 19 classes, of which 10 are head classes, 6 are medium classes, and 3 are tail classes. MIMIC-CXR-LT contains 111,792 images labeled with one of 18 diseases, with 87,493 training images and 23,550 test set images. The validation and balanced test sets contain 15 and 30 images per class, respectively.

2 PAPERS • 1 BENCHMARK

MIMIC-SPARQL

Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine. In this light, QA on Electronic Health Records (EHR), namely EHR QA, can work as a crucial milestone toward developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA.

2 PAPERS • NO BENCHMARKS YET

NIH-CXR-LT (Long-tailed (LT) NIH ChestXRay14)

NIH-CXR-LT. NIH ChestXRay14 contains over 100,000 chest X-rays labeled with 14 pathologies, plus a “No Findings” class. We construct a single-label, long-tailed version of the NIH ChestXRay14 dataset by introducing five new disease findings described above. The resulting NIH-CXR-LT dataset has 20 classes, including 7 head classes, 10 medium classes, and 3 tail classes. NIH-CXR-LT contains 88,637 images labeled with one of 19 thorax diseases, with 68,058 training and 20,279 test images. The validation and balanced test sets contain 15 and 30 images per class, respectively.

2 PAPERS • 1 BENCHMARK
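
As a quick sanity check on the NIH-CXR-LT figures quoted above, the following Python sketch adds up the class groups and the split sizes. It assumes (this is not stated in the description) that the images counted in neither the training nor the test set make up the validation set; the variable names are illustrative only.

```python
# Figures quoted in the NIH-CXR-LT description.
total_images = 88_637
train_images = 68_058
test_images = 20_279
head, medium, tail = 7, 10, 3
val_per_class = 15

# Head, medium, and tail classes should add up to the stated 20 classes.
num_classes = head + medium + tail

# Assumption: images outside the train and test sets form the validation set.
remaining = total_images - train_images - test_images
expected_val = val_per_class * num_classes

print(num_classes, remaining, expected_val)  # prints: 20 300 300
```

Under that assumption the numbers are consistent: the 300 images left over match 15 validation images for each of the 20 classes.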

OADAT

OADAT (OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing)

Experimental and synthetic (simulated) optoacoustic (OA) raw-signal and reconstructed image-domain datasets, rendered with different experimental parameters and tomographic acquisition geometries.

2 PAPERS • NO BENCHMARKS YET

PAX-Ray++ (Projected Anatomy in X-Ray Dataset ++)

The PAX-Ray++ dataset uses pseudo-labeled thorax CTs to enable the segmentation of anatomy in chest X-rays. By projecting the CTs to a 2D plane, we gather fine-grained annotated images resembling radiographs. It contains 7,377 frontal and lateral view images each, with 157 anatomy classes and over 2 million annotated instances.

2 PAPERS • NO BENCHMARKS YET

PhysioNet Challenge 2016

The 2016 PhysioNet/CinC Challenge aims to encourage the development of algorithms to classify heart sound recordings collected from a variety of clinical or nonclinical (such as in-home visits) environments. The aim is to identify, from a single short recording (10-60s) from a single precordial location, whether the subject of the recording should be referred on for an expert diagnosis.

2 PAPERS • NO BENCHMARKS YET

Placenta

Placenta is a benchmark dataset for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. Cell graphs are large (>1 million nodes per image), node features are varied (64-dimensions of 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular communities cluster into heterogeneously distributed tissues of widely varying sizes (from 11 nodes to 44,671 nodes for a single structure).

2 PAPERS • 1 BENCHMARK

PulseImpute

PulseImpute is a benchmark for Pulsative Physiological Signal Imputation which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. It contains 440,953 100 Hz 5-minute ECG waveforms from 32,930 patients.

2 PAPERS • NO BENCHMARKS YET

RETOUCH (RETOUCH - The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge)

The goal of the challenge is to compare automated algorithms that are able to detect and segment various types of fluids on a common dataset of optical coherence tomography (OCT) volumes representing different retinal diseases, acquired with devices from different manufacturers. We made available a dataset of OCT volumes containing a wide variety of retinal fluid lesions with accompanying reference annotations. We invite the medical imaging community to participate by developing and testing existing and novel automated retinal OCT segmentation methods.

2 PAPERS • NO BENCHMARKS YET

RSDD-Time

RSDD-Time is a dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Additionally, the dataset includes exact temporal spans that relate to the date of diagnosis.

2 PAPERS • NO BENCHMARKS YET

SemClinBr

SemClinBr (A multi‑institutional and multi‑specialty semantically annotated corpus for Portuguese clinical NLP tasks)

Background: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. Methods: In this study, a semantically annotated corpus was developed using clinical text from multiple medical specialties, document types, and institutions. In addition, we present, (1) a survey listing common aspects, differences, and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated to guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic ev

2 PAPERS • 1 BENCHMARK

THYME-2016

2 PAPERS • 1 BENCHMARK

Tc1 Mouse cerebellum atlas (Tc1 Mouse cerebellum atlas with Purkinje layer segmentation)

This mouse cerebellar atlas can be used for mouse cerebellar morphometry.

2 PAPERS • NO BENCHMARKS YET

Ward2ICU

Ward2ICU is a vital signs dataset of inpatients from the general ward. It contains vital signs with class labels indicating patient transitions from the ward to intensive care units.

2 PAPERS • NO BENCHMARKS YET

ABCD Study

ABCD Study (Adolescent Brain Cognitive Development)

The ABCD Study is a prospective longitudinal study starting at the ages of 9-10 and following participants for 10 years. The study includes a diverse sample of nearly 12,000 youth enrolled at 21 research sites across the United States. It measures brain development (via structural, task functional, and resting state functional imaging), social, emotional, and cognitive development, mental health, substance use and attitudes, gender identity and sexual health, and bio-specimens, as well as a variety of physical health and environmental factors.

1 PAPER • NO BENCHMARKS YET

ACCT Data Repository (ACCT is a fast and accessible automatic cell counting tool using machine learning for 2D image segmentation)

This dataset is a collection of fluorescent images from mice, assembled to test an automatic cell counting tool that we developed. It comprises 62 images, each taken from 2 or 3 different fields of view. In brief, the dataset was derived from brain sections of a model for HIV-induced brain injury (HIVgp120tg), which expresses soluble gp120 envelope protein in astrocytes under the control of a modified GFAP promoter. The mice were in a mixed C57BL/6.129/SJL genetic background, and two genotypes of 9-month-old male mice were selected: wild type controls (Resting, n = 3) and transgenic littermates (HIVgp120tg, Activated, n = 3). No randomization was performed. Among other hallmarks of human HIV neuropathology, HIVgp120tg mice show an increase in microglia numbers, which indicates activation of the cells compared to non-transgenic littermate controls.

1 PAPER • NO BENCHMARKS YET

AI-ready multiplex IHC-IF dataset

AI-ready multiplex IHC-IF dataset (AI-ready restained and co-registered multiplex dataset for head-and-neck squamous cell carcinoma)

We introduce a new AI-ready computational pathology dataset containing restained and co-registered digitized images from eight head-and-neck squamous cell carcinoma patients. Specifically, the same tumor sections were stained with the expensive multiplex immunofluorescence (mIF) assay first and then restained with cheaper multiplex immunohistochemistry (mIHC). This is a first public dataset that demonstrates the equivalence of these two staining methods which in turn allows several use cases; due to the equivalence, our cheaper mIHC staining protocol can offset the need for expensive mIF staining/scanning which requires highly skilled lab technicians. As opposed to subjective and error-prone immune cell annotations from individual pathologists (disagreement > 50%) to drive SOTA deep learning approaches, this dataset provides objective immune and tumor cell annotations via mIF/mIHC restaining for more reproducible and accurate characterization of tumor immune microenvironment (e.g. for

1 PAPER • NO BENCHMARKS YET

AIROGS (Rotterdam EyePACS AIROGS)

The Rotterdam EyePACS AIROGS dataset (in full, so including train and test) contains 113,893 color fundus images from 60,357 subjects and approximately 500 different sites with a heterogeneous ethnicity.

1 PAPER • NO BENCHMARKS YET

BIDS CHB-MIT Scalp EEG Database

This dataset is a BIDS-compatible version of the CHB-MIT Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification.

1 PAPER • NO BENCHMARKS YET

BIDS Siena Scalp EEG Database

This dataset is a BIDS-compatible version of the Siena Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification.

1 PAPER • NO BENCHMARKS YET

BioVid (BioVid Heat Pain Database)

To advance methods for pain assessment, in particular automatic assessment methods, the BioVid Heat Pain Database was collected in a collaboration of the Neuro-Information Technology group of the University of Magdeburg and the Medical Psychology group of the University of Ulm. In our study, 90 participants were subjected to experimentally induced heat pain in four intensities. To compensate for varying heat pain sensitivities, the stimulation temperatures were adjusted based on the subject-specific pain threshold and pain tolerance. Each of the four pain levels was stimulated 20 times in randomized order. For each stimulus, the maximum temperature was held for 4 seconds. The pauses between the stimuli were randomized between 8 and 12 seconds. The pain stimulation experiment was conducted twice: once with un-occluded face and once with facial EMG sensors.

1 PAPER • NO BENCHMARKS YET

Blood Cell Detection Dataset

This is a dataset of blood cell photos.

1 PAPER • NO BENCHMARKS YET

BreastClassifications4 ([MIMBCD-UI] UTA4: Severity & Pathology Classifications Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the actual severity (BIRADS) and pathology (post-report) classifications provided by the Radiologist Director from the Radiology Department of Hospital Fernando Fonseca while diagnosing several patients (see dataset-uta4-dicom) from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset for the measurements of both severity (BIRADS) and pathology classifications concerning the patient diagnostic. Work and results are published at a top Human-Computer Interaction (HCI) conference, AVI 2020. Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were made in clinical institutions, where clinicians diagnose several patients for a Single-Modality vs Multi-Modality comparison. For example, in these t

1 PAPER • NO BENCHMARKS YET

BreastDICOM4 ([MIMBCD-UI] UTA4: Medical Imaging DICOM Files Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the medical imaging DICOM files of patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of the medical images used during the UTA4 tasks. This repository and its dataset should be paired with the dataset-uta4-rates repository dataset. Work and results are published at a top Human-Computer Interaction (HCI) conference, AVI 2020. Results were analyzed and interpreted on our Statistical Analysis charts. The user tests were made in clinical institutions, where clinicians diagnose several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both prototype-single-modality and prototype-multi-modality repositories for the comparison. On the same hand, the hereby dataset repres

1 PAPER • 1 BENCHMARK

BreastRates4 ([MIMBCD-UI] UTA4: Rates Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the severity rates (BIRADS) given by clinicians while diagnosing several patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset for the measurements of severity rates (BIRADS) concerning the patient diagnostic. Work and results are published at a top Human-Computer Interaction (HCI) conference, AVI 2020. Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were made in clinical institutions, where clinicians diagnose several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both prototype-single-modality and prototype-multi-modality repositories for the comparison. On the same hand, the hereby dataset represents the pieces of information of bot

1 PAPER • NO BENCHMARKS YET

CLOUD (CLOUD Dataset)

The CLOUD dataset is a set of Anterior Segment Optical Coherence Tomography (AS-OCT) images used for the automatic identification and representation of the cornea-contact lens relationship. The dataset includes 112 AS-OCT images that were captured from 16 different patients. In particular, the images were obtained with a Cirrus 500 OCT scanner from Carl Zeiss Meditec with an anterior segment module, from users of scleral contact lenses (SCL).

1 PAPER • NO BENCHMARKS YET

CMMD

CMMD (The Chinese Mammography Database)

Breast carcinoma is the second largest cancer in the world among women. Early detection of breast cancer has been shown to increase the survival rate, thereby significantly increasing patients' lifespans. Mammography, a noninvasive imaging tool with low cost, is widely used to diagnose breast disease at an early stage due to its high sensitivity. The recent popularization of artificial intelligence in computer-aided diagnosis creates opportunities for advances in areas such as (1) computer-aided detection for locating suspect lesions such as masses and microcalcifications, leaving the classification to the radiologist; (2) computer-aided diagnosis for characterizing the suspicious region of a lesion and/or estimating its probability of onset; and (3) finding predictive image-based biomarkers by applying computational methods to mine the potential relationships between image representation and molecular subtype, including luminal A, luminal B, HER2 positive, and triple-negative.

1 PAPER • 1 BENCHMARK

CMeIE

CMeIE (Chinese Medical Information Extraction Dataset)

Chinese Medical Information Extraction, a dataset also released in CHIP2020, is used for the CMeIE task. The task aims to identify both entities and relations in a sentence following the schema constraints. There are 53 relations defined in the dataset, including 10 synonymous sub-relationships and 43 other sub-relationships.

1 PAPER • 1 BENCHMARK

COVIDx CXR-3

COVIDx CXR-3 is an open access benchmark dataset that we generated, comprising 30,882 CXR images across 17,026 patient cases. Images may be added over time to improve the dataset.

1 PAPER • 1 BENCHMARK

CPCXR

CPCXR (COVID-19 Posteroanterior Chest X-Ray fused)

The COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset is generated by the fusion of three publicly available datasets: COVID-19 cxr image, Radiological Society of North America (RSNA), and the U.S. National Library of Medicine (USNLM) collected Montgomery County set, NLM(MC). The dataset consists of samples labeled as COVID-19, Tuberculosis, Other pneumonia (SARS, MERS, etc.), and Normal. The dataset can be used to train and evaluate deep learning and machine learning models on binary and multi-class classification problems.

1 PAPER • NO BENCHMARKS YET

CPSC2019

CPSC2019 (The 2nd China Physiological Signal Challenge (CPSC 2019))

The China Physiological Signal Challenge 2019 (CPSC 2019) aims to encourage the development of algorithms for challenging QRS detection and heart rate (HR) estimation from short-term single-lead ECG recordings, usually with low signal quality and/or abnormal rhythm waveforms.

1 PAPER • NO BENCHMARKS YET

CPSC2020

CPSC2020 (The 3rd China Physiological Signal Challenge 2020)

Abnormality of the cardiac conduction system can induce arrhythmia. Abnormal heart rhythm can lead to other cardiac diseases and complications, and can be life-threatening. There are various types of arrhythmias, and each type is associated with a pattern by which it can be identified. Arrhythmias can be classified into two major categories. The first category consists of arrhythmias formed by a single irregular heartbeat in the electrocardiogram (ECG), herein called morphological arrhythmias, while the other category consists of arrhythmias formed by a set of irregular heartbeats in the ECG, herein called rhythmic arrhythmias. Dynamic electrocardiogram (DCG), like ECG Holter, provides an important way to monitor the incidence of arrhythmias in daily life, helping doctors to check the total number and distribution of arrhythmias over a long time and thus to provide the required therapy to prevent further problems. The 3rd China Physiological Signal Challenge 2020

1 PAPER • NO BENCHMARKS YET

CPSC2021

CPSC2021 (The 4th China Physiological Signal Challenge 2021)

The 4th China Physiological Signal Challenge 2021 (CPSC 2021) aims to encourage the development of algorithms for searching for paroxysmal atrial fibrillation (PAF) events in dynamic ECG recordings.

1 PAPER • NO BENCHMARKS YET
