TMED is a clinically motivated benchmark dataset for computer vision and machine learning from limited labeled data.

Two overall goals inspired this dataset:

1) We wish to improve timely diagnosis and treatment of aortic stenosis (AS), a common degenerative cardiac valve condition for which automation holds substantial promise. Automated screening for AS can increase referral and treatment rates for patients with this life-threatening condition.

2) We wish to provide an authentic assessment of semi-supervised learning (SSL) methods to the computer vision and ML research community. Especially in medical contexts, labels are often difficult and expensive to acquire. SSL is a promising way to combine a small labeled set (images plus expert annotations) with a large, easy-to-acquire unlabeled set (images only). However, most existing benchmark datasets do not represent the challenges of truly uncurated unlabeled sets in a medical context. We hope our data release catalyzes work on methods for effective multi-task SSL.

The dataset is available for academic use to any researcher who applies for access on our website and agrees to a standard data use agreement (do not share the data with non-approved users, no commercial use, no attempt to reidentify patients, etc.).

Dataset contents

The TMED dataset contains imagery from 2773 patients and supervised labels for two classification tasks on a small subset of 260 patients (because labels are difficult to acquire). All data is de-identified and approved for release by our IRB. Imagery comes from transthoracic echocardiograms acquired in the course of routine care, consistent with American Society of Echocardiography (ASE) guidelines, between 2015 and 2020 at Tufts Medical Center.
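
To make the semi-supervised setup concrete, the snippet below sketches how the labeled and unlabeled patient pools implied by these counts could be formed. This is an illustration only: the file names are placeholders, not files shipped with the release.

    # Hypothetical sketch of the SSL pools implied above: a small labeled
    # pool (260 patients) and a large unlabeled pool (the remainder).
    # The id files here are placeholders, not the released split files.

    def read_ids(path):
        """Read one patient id per line from a plain-text file."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    all_ids = read_ids("all_patients.txt")          # 2773 patient studies
    labeled_ids = read_ids("labeled_patients.txt")  # the 260 labeled studies
    unlabeled_ids = all_ids - labeled_ids           # images only, no labels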

When gathering echocardiogram imagery for each patient, a sonographer manipulates a handheld transducer over the patient’s chest, manually choosing different acquisition angles in order to fully assess the heart’s complex anatomy. This imaging process results in multiple cineloop video clips of the heart depicting various anatomical views. We extract one still image from each available video clip, so each patient study is represented in our dataset as multiple images (typically ~100). Each image is preprocessed to a grayscale 64x64 PNG.
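
As a concrete illustration of this last step, the snippet below converts one extracted still frame to the released format (a 64x64 grayscale PNG). The release already contains preprocessed images; this is only a sketch of the transformation, and the file names are hypothetical.

    # Illustrative sketch (not the official pipeline): convert one extracted
    # still frame to the released format, a 64x64 grayscale PNG.
    from PIL import Image

    def preprocess_frame(frame_path, out_path, size=64):
        """Convert a still frame to a size x size grayscale PNG."""
        img = Image.open(frame_path).convert("L")       # "L" = 8-bit grayscale
        img = img.resize((size, size), Image.BILINEAR)  # downsample to 64x64
        img.save(out_path, format="PNG")

    # Hypothetical usage on one frame pulled from a cineloop clip:
    # preprocess_frame("study123_clip07_frame0.jpg", "study123_clip07.png")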

Two kinds of labels are available for the labeled subset of patients (a short code sketch of this structure follows the list):

  • View labels (PLAX/PSAX/Other), indicating which standard anatomical view is shown in the image. Every image in our fully-labeled set has a view label.
  • Diagnostic labels (no AS, mild/moderate AS, severe AS), indicating the severity of AS. Every patient in our fully-labeled set has a diagnostic label.
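
The sketch below makes this two-level structure concrete: view labels attach to individual images, while a diagnostic label attaches to the whole patient study. The column names and values are assumptions for illustration, not the schema of the released files.

    # Hypothetical sketch of the two label granularities; column names are
    # illustrative only, not the schema of the released files.
    import pandas as pd

    labels = pd.DataFrame({
        "patient_id": ["p001", "p001", "p002"],
        "image_file": ["p001_img0.png", "p001_img1.png", "p002_img0.png"],
        "view_label": ["PLAX", "Other", "PSAX"],            # one per image
        "diagnosis":  ["severe_AS", "severe_AS", "no_AS"],  # one per patient
    })

    # View classification: every labeled image is a training example.
    view_examples = labels[["image_file", "view_label"]]

    # Diagnosis: collapse to a single severity label per patient study.
    diagnosis_per_patient = labels.groupby("patient_id")["diagnosis"].first()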

For more information, see our website and our published paper at MLHC '21.
