Social determinants of health (SDoH) play a pivotal role in determining patient outcomes, yet their documentation in electronic health records (EHRs) remains incomplete. This dataset comes from a study of how well large language models extract SDoH from the free-text sections of EHRs. The study also examined whether synthetic clinical text can improve the extraction of these sparsely documented but clinically important data.
To probe potential biases in high-performing models and in those pre-trained on general text, GPT-4 was used to insert demographic descriptors into our synthetic data.
For instance:

- Original Sentence: "Widower admits fears surrounding potential judgment…"
- Modified Sentence: "Hispanic widower admits fears surrounding potential judgment..."
These demographic-infused sentences were manually validated. Of these:

- 419 had mentions of SDoH
- 253 had mentions of adverse SDoH
- The remainder were tagged as NO_SDoH
For details on the "adverse" labeling, see https://arxiv.org/pdf/2308.06354.pdf. In this dataset, the 'adverse' column indicates whether a label corresponds to an "adverse" or "non-adverse" SDoH.
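The tallies above can be reproduced with a few lines of Python once the data is loaded. The following is only a minimal sketch on toy rows: the field names `text` and `label`, the placeholder categories `CATEGORY_A` and `CATEGORY_B`, and the choice to count adverse mentions as a subset of SDoH mentions are all assumptions made for illustration, not the dataset's documented schema.

```python
from collections import Counter

# Toy rows for illustration only; none of these are taken from the dataset.
# "text" and "label" are assumed field names; "adverse" mirrors the column
# described above, with values "adverse" / "non-adverse".
rows = [
    {"text": "example sentence 1", "label": "CATEGORY_A", "adverse": "adverse"},
    {"text": "example sentence 2", "label": "CATEGORY_A", "adverse": "non-adverse"},
    {"text": "example sentence 3", "label": "CATEGORY_B", "adverse": "adverse"},
    {"text": "example sentence 4", "label": "NO_SDoH", "adverse": None},
]


def summarize(rows):
    """Count sentences with any SDoH mention, adverse mentions, and NO_SDoH."""
    counts = Counter()
    for row in rows:
        if row["label"] == "NO_SDoH":
            counts["no_sdoh"] += 1
        else:
            counts["any_sdoh"] += 1
            # Assumption: adverse mentions are a subset of SDoH mentions.
            if row["adverse"] == "adverse":
                counts["adverse_sdoh"] += 1
    return dict(counts)


summary = summarize(rows)
print(summary)
```

Run against the real file, the same function should yield counts comparable to those listed above, provided the column names are adjusted to match the actual schema.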
Adverse SDoH: 84% Macro-F1
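As a reminder of what the macro-F1 figure above measures, macro-F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below computes it for a single-label setting on toy predictions. The class names and the single-label framing are assumptions for illustration; the study's exact evaluation setup is described in the paper, not here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy gold labels and predictions (hypothetical, not model output).
y_true = ["adverse", "adverse", "non-adverse", "NO_SDoH"]
y_pred = ["adverse", "non-adverse", "non-adverse", "NO_SDoH"]

score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```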
Robustness Rate:
How to Cite:
@misc{guevara2023large,
title={Large Language Models to Identify Social Determinants of Health in Electronic Health Records},
author={Marco Guevara and Shan Chen and Spencer Thomas and Tafadzwa L. Chaunzwa and Idalid Franco and Benjamin Kann and Shalini Moningi and Jack Qian and Madeleine Goldstein and Susan Harper and Hugo JWL Aerts and Guergana K. Savova and Raymond H. Mak and Danielle S. Bitterman},
year={2023},
eprint={2308.06354},
archivePrefix={arXiv},
primaryClass={cs.CL}
}