Search Results for author: Matthew B. A. McDermott

Found 19 papers, 12 papers with code

meds_reader: A fast and efficient EHR processing library

1 code implementation12 Sep 2024 Ethan Steinberg, Michael Wornow, Suhana Bedi, Jason Alan Fries, Matthew B. A. McDermott, Nigam H. Shah

The growing demand for machine learning in healthcare requires processing increasingly large electronic health record (EHR) datasets, but existing pipelines are not computationally efficient or scalable.

ACES: Automatic Cohort Extraction System for Event-Stream Datasets

1 code implementation28 Jun 2024 Justin Xu, Jack Gallifant, Alistair E. W. Johnson, Matthew B. A. McDermott

This library is designed to simplify the development of task/cohort definitions for ML in healthcare and to enable the reproduction of those cohorts, both at an exact level for single datasets and at a conceptual level across datasets.

A Closer Look at AUROC and AUPRC under Class Imbalance

2 code implementations11 Jan 2024 Matthew B. A. McDermott, Lasse Hyldig Hansen, Haoran Zhang, Giovanni Angelotti, Jack Gallifant

In machine learning (ML), a widespread adage is that the area under the precision-recall curve (AUPRC) is a superior metric for model comparison to the area under the receiver operating characteristic (AUROC) for binary classification tasks with class imbalance.

Binary Classification
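
A minimal, hedged illustration of the comparison discussed above (not code from the paper): scikit-learn's roc_auc_score and average_precision_score (a standard approximation of AUPRC) computed for a simple model on a synthetic, heavily imbalanced binary classification task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem with roughly a 99:1 class imbalance.
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

# AUROC is insensitive to the class prior, while AUPRC (here approximated by
# average precision) depends strongly on it, so the two metrics can rank
# models differently under imbalance.
print("AUROC:", roc_auc_score(y_te, scores))
print("AUPRC (average precision):", average_precision_score(y_te, scores))
```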

Structure Inducing Pre-Training

1 code implementation18 Mar 2021 Matthew B. A. McDermott, Brendan Yap, Peter Szolovits, Marinka Zitnik

Based on this review, we introduce a descriptive framework for pre-training that allows for a granular, comprehensive understanding of how relational structure can be induced.

Descriptive, Inductive Bias, +3

Adversarial Contrastive Pre-training for Protein Sequences

no code implementations31 Jan 2021 Matthew B. A. McDermott, Brendan Yap, Harry Hsu, Di Jin, Peter Szolovits

Recent developments in Natural Language Processing (NLP) demonstrate that large-scale, self-supervised pre-training can be extremely beneficial for downstream tasks.

Language Modelling

ML4H Abstract Track 2020

no code implementations19 Nov 2020 Emily Alsentzer, Matthew B. A. McDermott, Fabian Falck, Suproteem K. Sarkar, Subhrajit Roy, Stephanie L. Hyland

A collection of the accepted abstracts for the Machine Learning for Health (ML4H) workshop at NeurIPS 2020.

BIG-bench Machine Learning

CheXpert++: Approximating the CheXpert Labeler for Speed, Differentiability, and Probabilistic Output

1 code implementation26 Jun 2020 Matthew B. A. McDermott, Tzu Ming Harry Hsu, Wei-Hung Weng, Marzyeh Ghassemi, Peter Szolovits

CheXpert is very useful, but it is relatively computationally slow (especially when integrated with end-to-end neural pipelines), it is non-differentiable (so it cannot be used in applications that require gradients to flow through the labeler), and it does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning.

Active Learning
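
The general idea can be sketched as distilling a slow, rule-based labeler into a small differentiable model trained on its "silver" labels. Everything below (the toy rule, features, and architecture) is illustrative only and is not the paper's pipeline or the actual CheXpert labeler.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a slow, rule-based labeler over radiology reports;
# the real CheXpert labeler is far more complex.
def rule_based_label(report: str) -> float:
    return 1.0 if "edema" in report.lower() else 0.0

reports = ["Mild pulmonary edema.", "No acute findings.",
           "Edema is present.", "Lungs are clear."]

# Tiny bag-of-words featurizer, purely for illustration.
vocab = ["edema", "clear", "acute", "mild"]
def featurize(report: str) -> torch.Tensor:
    words = report.lower().split()
    return torch.tensor([float(any(v in w for w in words)) for v in vocab])

X = torch.stack([featurize(r) for r in reports])
y = torch.tensor([rule_based_label(r) for r in reports]).unsqueeze(1)

# A small differentiable model trained on the rule-based "silver" labels:
# fast at inference, usable inside end-to-end pipelines, and probabilistic.
model = nn.Sequential(nn.Linear(len(vocab), 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

probs = torch.sigmoid(model(X))  # probabilistic output, unlike the rule labeler
print(probs.detach().squeeze().tolist())
```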

Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation

no code implementations4 Dec 2019 Aparna Balagopalan, Jekaterina Novikova, Matthew B. A. McDermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi

We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1).

Domain Adaptation
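
A minimal sketch of optimal-transport domain adaptation using the POT library's SinkhornTransport. The toy features, mapping direction, and downstream classifier are assumptions for illustration and do not reproduce the paper's setup.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy stand-ins: labeled source-domain linguistic features and unlabeled,
# shifted target-domain features (e.g. another language).
Xs = rng.normal(size=(200, 10))
ys = (Xs[:, 0] > 0).astype(int)
Xt = Xs + rng.normal(loc=1.0, scale=0.5, size=Xs.shape)

# Learn an OT coupling between the domains, transport the labeled source
# samples into the target space, and train a classifier there.
transport = ot.da.SinkhornTransport(reg_e=1.0)
transport.fit(Xs=Xs, Xt=Xt)
Xs_mapped = transport.transform(Xs=Xs)

clf = LogisticRegression(max_iter=1_000).fit(Xs_mapped, ys)
# The classifier can now be applied directly to target-domain features.
print(clf.predict(Xt)[:10])
```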

Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks

1 code implementation2 Aug 2019 Bret Nestor, Matthew B. A. McDermott, Willie Boag, Gabriela Berner, Tristan Naumann, Michael C. Hughes, Anna Goldenberg, Marzyeh Ghassemi

When training clinical prediction models from electronic health records (EHRs), a key concern should be a model's ability to sustain performance over time when deployed, even as care practices, database systems, and population demographics evolve.

De-identification, Length-of-Stay prediction, +1
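
A hedged sketch of the kind of temporal evaluation this concern motivates: train on earlier admission years and measure discrimination on each later year. The data, drift mechanism, and model below are synthetic placeholders, not the paper's experiments.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Toy stand-in for an EHR-derived feature table with an admission-year column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(5_000, 5)),
                  columns=[f"feat_{i}" for i in range(5)])
df["year"] = rng.integers(2008, 2013, size=len(df))
# Simulate temporal drift: the label's dependence on feat_0 weakens over time.
drift = (2012 - df["year"]) / 4.0
df["label"] = (df["feat_0"] * drift
               + rng.normal(scale=0.5, size=len(df)) > 0).astype(int)

# Train on earlier years, then evaluate on each held-out later year to see
# whether performance is sustained as the (simulated) data-generating process changes.
train = df[df["year"] <= 2010]
model = GradientBoostingClassifier().fit(train.filter(like="feat_"), train["label"])
for year, group in df[df["year"] > 2010].groupby("year"):
    scores = model.predict_proba(group.filter(like="feat_"))[:, 1]
    print(year, round(roc_auc_score(group["label"], scores), 3))
```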

MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

2 code implementations19 Jul 2019 Shirly Wang, Matthew B. A. McDermott, Geeticka Chauhan, Michael C. Hughes, Tristan Naumann, Marzyeh Ghassemi

Robust machine learning relies on access to data that can be used with standardized frameworks in important tasks and the ability to develop models whose performance can be reasonably reproduced.

BIG-bench Machine Learning, Length-of-Stay prediction, +3
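
A toy sketch of the kind of representation such a pipeline standardizes (hourly aggregation of raw chart events per ICU stay). This is not MIMIC-Extract's code, and real use requires credentialed access to MIMIC-III.

```python
import numpy as np
import pandas as pd

# Synthetic chart events standing in for raw MIMIC-III records.
rng = np.random.default_rng(0)
events = pd.DataFrame({
    "icustay_id": rng.integers(1, 4, size=300),
    "charttime": pd.Timestamp("2019-07-19")
                 + pd.to_timedelta(rng.integers(0, 48 * 60, size=300), unit="m"),
    "heart_rate": rng.normal(80, 10, size=300),
})

# Aggregate raw events into hourly means per stay, a standardized time-series
# representation that downstream models (and other researchers) can reuse.
hourly = (
    events.set_index("charttime")
    .groupby("icustay_id")
    .resample("1h")["heart_rate"]
    .mean()
    .unstack(level=0)  # rows: hours, columns: ICU stays
)
print(hourly.head())
```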

Reproducibility in Machine Learning for Health

no code implementations2 Jul 2019 Matthew B. A. McDermott, Shirly Wang, Nikki Marinsek, Rajesh Ranganath, Marzyeh Ghassemi, Luca Foschini

Machine learning algorithms designed to characterize, monitor, and intervene on human health (ML4H) are expected to perform safely and reliably when operating at scale, potentially outside strict human supervision.

BIG-bench Machine Learning

REflex: Flexible Framework for Relation Extraction in Multiple Domains

1 code implementation WS 2019 Geeticka Chauhan, Matthew B. A. McDermott, Peter Szolovits

Systematic comparison of methods for relation extraction (RE) is difficult because many experiments in the field are not described precisely enough to be completely reproducible and many papers fail to report ablation studies that would highlight the relative contributions of their various combined techniques.

Relation, Relation Extraction

Publicly Available Clinical BERT Embeddings

3 code implementations WS 2019 Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months.

De-identification
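
A minimal usage sketch with the Hugging Face transformers API; the model identifier below is the one commonly associated with this paper's released weights and should be treated as an assumption to verify on the Hub.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model ID for the released clinical BERT weights (verify on the Hub).
MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

note = "Patient admitted with acute shortness of breath and bilateral edema."
inputs = tokenizer(note, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings for the clinical note: (1, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```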

Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation

no code implementations30 Nov 2018 Bret Nestor, Matthew B. A. McDermott, Geeticka Chauhan, Tristan Naumann, Michael C. Hughes, Anna Goldenberg, Marzyeh Ghassemi

Machine learning for healthcare often trains models on de-identified datasets with randomly-shifted calendar dates, ignoring the fact that data were generated under hospital operation practices that change over time.

BIG-bench Machine Learning, Mortality Prediction
