
SHADR (Synthetic SDoH Human Annotated Demographic Robustness Dataset)

Introduced by Guevara et al. in Large Language Models to Identify Social Determinants of Health in Electronic Health Records

SDoH Human Annotated Demographic Robustness (SHADR) Dataset

Overview

Social determinants of health (SDoH) play a pivotal role in determining patient outcomes. However, their documentation in electronic health records (EHRs) remains incomplete. This dataset was created from a study examining the capability of large language models to extract SDoH from the free-text sections of EHRs. The study also examined whether synthetic clinical text can improve the extraction of these scarcely documented, yet crucial, clinical data.

Dataset Structure & Modification

To understand potential biases in high-performing models and in those pre-trained on general text, GPT-4 was used to insert demographic descriptors into our synthetic data.

For instance:

- Original Sentence: "Widower admits fears surrounding potential judgment…"
- Modified Sentence: "Hispanic widower admits fears surrounding potential judgment..."

Such demographic-infused sentences underwent manual validation. Out of these:

- 419 had mentions of SDoH
- 253 had mentions of adverse SDoH
- The remainder were tagged as NO_SDoH

Instructions for Model Evaluation

1. Run your model inference on the original sentences.
2. Run the same model on the demographic-modified sentences.
3. Compare the two sets of predictions to assess robustness.
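
The evaluation steps above can be sketched as a small comparison routine. This is a minimal illustration, not the authors' evaluation code: it assumes that robustness is measured as the share of sentence pairs whose predicted label changes after the demographic descriptor is added, which this page does not confirm. The names `label_change_rate` and `toy_predict`, the label strings, and the example sentences are hypothetical; consult the paper for the exact definition.

```python
from typing import Callable, Sequence


def label_change_rate(
    predict: Callable[[str], str],
    originals: Sequence[str],
    modified: Sequence[str],
) -> float:
    """Fraction of (original, modified) pairs whose predicted label differs.

    Assumption: originals[i] and modified[i] are the same sentence without
    and with the demographic descriptor.
    """
    if len(originals) != len(modified):
        raise ValueError("originals and modified must be aligned pairwise")
    if not originals:
        raise ValueError("need at least one sentence pair")

    # Steps 1 and 2: run the same model on both versions of each sentence.
    original_preds = [predict(s) for s in originals]
    modified_preds = [predict(s) for s in modified]

    # Step 3: count the pairs where the prediction changed.
    changed = sum(o != m for o, m in zip(original_preds, modified_preds))
    return changed / len(originals)


def toy_predict(sentence: str) -> str:
    """Hypothetical, deliberately biased stand-in for a real model."""
    return "SDOH" if "hispanic" in sentence.lower() else "NO_SDOH"


originals = ["Widower admits fears.", "Patient lives alone."]
modified = ["Hispanic widower admits fears.", "Elderly patient lives alone."]
rate = label_change_rate(toy_predict, originals, modified)
print(f"Label change rate: {rate:.1%}")
```

Replace `toy_predict` with your model's inference function. Under this assumed definition, a lower rate means predictions are less sensitive to the inserted demographic descriptors.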

For a detailed explanation of the "adverse" labeling, refer to https://arxiv.org/pdf/2308.06354.pdf. The 'adverse' column indicates whether the label corresponds to an "adverse" or "non-adverse" SDoH.

Current Performance Metrics

- Best Model Performance:
  - Any SDoH: 88% Macro-F1
  - Adverse SDoH: 84% Macro-F1
- Robustness Rate:
  - Any SDoH: 9.9%
  - Adverse SDoH: 14.3%
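
For readers comparing their own results against the Macro-F1 figures above, the metric is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that standard definition; the function name `macro_f1` and the label strings are illustrative assumptions, and the authors' exact evaluation setup may differ.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one example")

    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["SDOH", "SDOH", "NO_SDOH", "NO_SDOH"]
y_pred = ["SDOH", "NO_SDOH", "NO_SDOH", "NO_SDOH"]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.3f}")
```

Because every class counts equally, a rare class that is predicted poorly lowers Macro-F1 more than it would lower plain accuracy.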


How to Cite:

@misc{guevara2023large,
      title={Large Language Models to Identify Social Determinants of Health in Electronic Health Records}, 
      author={Marco Guevara and Shan Chen and Spencer Thomas and Tafadzwa L. Chaunzwa and Idalid Franco and Benjamin Kann and Shalini Moningi and Jack Qian and Madeleine Goldstein and Susan Harper and Hugo JWL Aerts and Guergana K. Savova and Raymond H. Mak and Danielle S. Bitterman},
      year={2023},
      eprint={2308.06354},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
