HiRID-ICU-Benchmark -- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data
The recent success of machine learning methods applied to time series collected from Intensive Care Units (ICU) exposes the lack of standardized machine learning benchmarks for developing and comparing such methods. While raw datasets, such as MIMIC-IV or eICU, can be freely accessed on PhysioNet, tasks and pre-processing are often chosen ad hoc for each publication, limiting comparability across publications. In this work, we aim to improve this situation by providing a benchmark covering a large spectrum of ICU-related tasks. Using the HiRID dataset, we define multiple clinically relevant tasks in collaboration with clinicians. In addition, we provide a reproducible end-to-end pipeline to construct both data and labels. Finally, we provide an in-depth analysis of current state-of-the-art sequence modeling methods, highlighting some limitations of deep learning approaches for this type of data. With this benchmark, we hope to enable the research community to compare their work fairly.
NeurIPS Datasets 2021
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
---|---|---|---|---|---|---|
Remaining Length of Stay | HiRID | LGBM | MAE | 56.9±0.4 | # 1 | |
Respiratory Failure | HiRID | LSTM | AUPRC | 0.569±0.003 | # 7 | |
Kidney Function | HiRID | LGBM | MAE | 0.45±0.00 | # 1 | |
Kidney Function | HiRID | TCN | MAE | 0.50±0.01 | # 5 | |
Kidney Function | HiRID | LSTM | MAE | 0.50±0.01 | # 5 | |
Kidney Function | HiRID | GRU | MAE | 0.49±0.02 | # 4 | |
Kidney Function | HiRID | Transformer | MAE | 0.48±0.02 | # 3 | |
Kidney Function | HiRID | LGBM ( + hand crafted features) | MAE | 0.45±0.00 | # 1 | |
Circulatory Failure | HiRID | Logistic Regression | AUPRC | 0.305±0.000 | # 6 | |
Circulatory Failure | HiRID | LSTM | AUPRC | 0.322±0.008 | # 7 | |
Circulatory Failure | HiRID | TCN | AUPRC | 0.358±0.006 | # 7 | |
Circulatory Failure | HiRID | GRU | AUPRC | 0.368±0.005 | # 4 | |
Circulatory Failure | HiRID | Transformer | AUPRC | 0.352±0.006 | # 5 | |
Circulatory Failure | HiRID | LGBM | AUPRC | 0.389±0.003 | # 2 | |
Circulatory Failure | HiRID | LGBM ( + hand crafted features) | AUPRC | 0.388±0.002 | # 3 | |
Respiratory Failure | HiRID | Logistic Regression | AUPRC | 0.530±0.000 | # 8 | |
Respiratory Failure | HiRID | GRU | AUPRC | 0.592±0.003 | # 4 | |
Respiratory Failure | HiRID | TCN | AUPRC | 0.589±0.003 | # 5 | |
Respiratory Failure | HiRID | Transformer | AUPRC | 0.594±0.003 | # 3 | |
Respiratory Failure | HiRID | LGBM | AUPRC | 0.585±0.001 | # 6 | |
Respiratory Failure | HiRID | LGBM ( + hand crafted features) | AUPRC | 0.604±0.002 | # 1 | |
Patient Phenotyping | HiRID | Logistic Regression | Balanced Accuracy | 39.1±0.0 | # 7 | |
Patient Phenotyping | HiRID | GRU | Balanced Accuracy | 39.2±2.1 | # 6 | |
Patient Phenotyping | HiRID | LSTM | Balanced Accuracy | 39.5±1.2 | # 5 | |
Patient Phenotyping | HiRID | LGBM | Balanced Accuracy | 40.4±0.8 | # 4 | |
Patient Phenotyping | HiRID | TCN | Balanced Accuracy | 41.6±2.3 | # 3 | |
Patient Phenotyping | HiRID | Transformer | Balanced Accuracy | 42.7±1.4 | # 2 | |
Patient Phenotyping | HiRID | LGBM ( + hand crafted features) | Balanced Accuracy | 45.8±2.0 | # 1 | |
ICU Mortality | HiRID | LGBM | AUPRC | 0.546±0.008 | # 7 | |
ICU Mortality | HiRID | Logistic Regression | AUPRC | 0.581±0.000 | # 6 | |
ICU Mortality | HiRID | LSTM | AUPRC | 0.600±0.009 | # 5 | |
ICU Mortality | HiRID | TCN | AUPRC | 0.602±0.011 | # 4 | |
ICU Mortality | HiRID | GRU | AUPRC | 0.603±0.016 | # 3 | |
ICU Mortality | HiRID | Transformer | AUPRC | 0.610±0.008 | # 2 | |
ICU Mortality | HiRID | LGBM ( + hand crafted features) | AUPRC | 0.626±0.000 | # 1 | |
Remaining Length of Stay | HiRID | LSTM | MAE | 60.7±1.6 | # 6 | |
Remaining Length of Stay | HiRID | GRU | MAE | 60.6±0.9 | # 5 | |
Remaining Length of Stay | HiRID | TCN | MAE | 59.8±2.8 | # 4 | |
Remaining Length of Stay | HiRID | Transformer | MAE | 59.5±2.8 | # 3 | |
Remaining Length of Stay | HiRID | LGBM ( + hand crafted features) | MAE | 57.0±0.3 | # 2 |
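
The table reports three kinds of metric (MAE, AUPRC, Balanced Accuracy), each as a value followed by a ± term. As a rough illustration of what these numbers measure, the sketch below computes each metric in plain Python and formats several runs as `mean±spread`. It is not the benchmark's own evaluation code. It assumes the ± term is the sample standard deviation over repeated runs, which this page does not state, and it uses the step-wise average-precision estimate of AUPRC with no special handling of tied scores. All function names are hypothetical.

```python
from statistics import mean, stdev


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds_for_cls) / len(preds_for_cls))
    return mean(recalls)


def average_precision(y_true, scores):
    """Step-wise AUPRC estimate for binary labels (0/1); ties are not handled."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def format_result(run_values, digits=3):
    """Format repeated-run scores as 'mean±std' (assumed meaning of ±)."""
    return f"{mean(run_values):.{digits}f}±{stdev(run_values):.{digits}f}"


if __name__ == "__main__":
    # Toy values only; these are not HiRID results.
    print(mean_absolute_error([1, 2, 3], [2, 2, 5]))
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
    print(format_result([0.60, 0.61, 0.59]))
```

Lower is better for MAE, while higher is better for AUPRC and Balanced Accuracy, which is why the ranking direction differs between tasks in the table.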