CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI

Clinical Artificial lntelligence (AI) applications are rapidly expanding worldwide, and have the potential to impact to all areas of medical practice. Medical imaging applications constitute a vast majority of approved clinical AI applications. Though healthcare systems are eager to adopt AI solutions a fundamental question remains: \textit{what happens after the AI model goes into production?} We use the CheXpert and PadChest public datasets to build and test a medical imaging AI drift monitoring workflow to track data and model drift without contemporaneous ground truth. We simulate drift in multiple experiments to compare model performance with our novel multi-modal drift metric, which uses DICOM metadata, image appearance representation from a variational autoencoder (VAE), and model output probabilities as input. Through experimentation, we demonstrate a strong proxy for ground truth performance using unsupervised distributional shifts in relevant metadata, predicted probabilities, and VAE latent representation. Our key contributions include (1) proof-of-concept for medical imaging drift detection that includes the use of VAE and domain specific statistical methods, (2) a multi-modal methodology to measure and unify drift metrics, (3) new insights into the challenges and solutions to observe deployed medical imaging AI, and (4) creation of open-source tools that enable others to easily run their own workflows and scenarios. This work has important implications. It addresses the concerning translation gap found in continuous medical imaging AI model monitoring common in dynamic healthcare environments.

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.