In their initial interaction with patients, doctors do not focus only on identifying the pathology a patient is suffering from; they also generate a differential diagnosis (a short list of plausible diseases), because the medical evidence collected from patients is often insufficient to establish a final diagnosis.
In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis, along with the ground truth pathology, symptoms and antecedents for each patient.
One of the biggest challenges that prohibits the use of many current NLP methods in clinical settings is the limited availability of public datasets.
Ranked #1 on Mortality Prediction on MIMIC-III (Accuracy metric)