Search Results for author: Tristan Naumann

Found 34 papers, 12 papers with code

Training Small Multimodal Models to Bridge Biomedical Competency Gap: A Case Study in Radiology Imaging

no code implementations • 12 Mar 2024 • Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, ZiYi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang, Hoifung Poon

Frontier models such as GPT-4V still have major competency gaps in multimodal capabilities for biomedical applications.

Cross-Modal Retrieval

Paper
Add Code

Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium

no code implementations • 3 Mar 2024 • Hyewon Jeong, Sarah Jabbour, Yuzhe Yang, Rahul Thapta, Hussein Mozannar, William Jongwon Han, Nikita Mehandru, Michael Wornow, Vladislav Lialin, Xin Liu, Alejandro Lozano, Jiacheng Zhu, Rafal Dariusz Kocielnik, Keith Harrigian, Haoran Zhang, Edward Lee, Milos Vukadinovic, Aparna Balagopalan, Vincent Jeanselme, Katherine Matton, Ilker Demirel, Jason Fries, Parisa Rashidi, Brett Beaulieu-Jones, Xuhai Orson Xu, Matthew McDermott, Tristan Naumann, Monica Agrawal, Marinka Zitnik, Berk Ustun, Edward Choi, Kristen Yeom, Gamze Gursoy, Marzyeh Ghassemi, Emma Pierson, George Chen, Sanjat Kanjilal, Michael Oberst, Linying Zhang, Harvineet Singh, Tom Hartvigsen, Helen Zhou, Chinasa T. Okolo

The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables.

Paper
Add Code

Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries

1 code implementation • 1 Mar 2024 • Zelalem Gero, Chandan Singh, Yiqing Xie, Sheng Zhang, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Summarizing clinical text is crucial in health decision-support and clinical research.

Attribute Text Summarization

Paper
Code

DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation

1 code implementation • 16 Nov 2023 • Yiqing Xie, Sheng Zhang, Hao Cheng, PengFei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose

Medical text generation aims to assist with administrative work and highlight salient information to support decision-making.

Decision Making Instruction Following +1

Paper
Code

TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models

no code implementations • 2 Nov 2023 • Javier González, Cliff Wong, Zelalem Gero, Jass Bagga, Risa Ueno, Isabel Chien, Eduard Oravkin, Emre Kiciman, Aditya Nori, Roshanthi Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery.

Causal Inference Denoising +1

Paper
Add Code

Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

no code implementations • 4 Aug 2023 • Cliff Wong, Sheng Zhang, Yu Gu, Christine Moung, Jacob Abel, Naoto Usuyama, Roshanthi Weerasinghe, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

Clinical trial matching is a key process in health delivery and discovery.

Paper
Add Code

Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

no code implementations • 12 Jul 2023 • Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon

We find that while LLMs already possess decent competency in structuring biomedical text, by distillation into a task-specific student model through self-supervised learning, substantial gains can be attained over out-of-box LLMs, with additional advantages such as cost, efficiency, and white-box model access.

Self-Supervised Learning

Paper
Add Code

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

no code implementations • NeurIPS 2023 • Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.

Instruction Following Language Modelling +2

Paper
Add Code

Self-Verification Improves Few-Shot Clinical Information Extraction

1 code implementation • 30 May 2023 • Zelalem Gero, Chandan Singh, Hao Cheng, Tristan Naumann, Michel Galley, Jianfeng Gao, Hoifung Poon

Extracting patient information from unstructured text is a critical task in health decision-support and clinical research.

In-Context Learning

Paper
Code

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making

1 code implementation • 27 May 2023 • Aliyah R. Hsu, Yeshwanth Cherapanamjeri, Briton Park, Tristan Naumann, Anobel Y. Odisho, Bin Yu

These findings showcase the utility of SUFO in enhancing trust and safety when using transformers in medicine, and we believe SUFO can aid practitioners in evaluating fine-tuned language models for other applications in medicine and in more critical domains.

Decision Making

Paper
Code

What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization

1 code implementation • 12 May 2023 • Griffin Adams, Bichlien H Nguyen, Jake Smith, Yingce Xia, Shufang Xie, Anna Ostropolets, Budhaditya Deb, Yuan-Jyue Chen, Tristan Naumann, Noémie Elhadad

Summarization models often generate text that is poorly calibrated to quality metrics because they are trained to maximize the likelihood of a single reference (MLE).

Paper
Code

Compositional Zero-Shot Domain Transfer with Text-to-Text Models

no code implementations • 23 Mar 2023 • Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Sheng Zhang, Tristan Naumann, Aditya Nori, Hoifung Poon, Javier Alvarez-Valle, Ozan Oktay, Stephanie L. Hyland

We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning.

Data Augmentation Multi-Task Learning

Paper
Add Code

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

3 code implementations • 2 Mar 2023 • Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon

Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs.

Ranked #3 on Medical Visual Question Answering on SLAKE-English

Medical Visual Question Answering Pneumonia Detection +3

Paper
Code

Continual Contrastive Finetuning Improves Low-Resource Relation Extraction

no code implementations • 21 Dec 2022 • Wenxuan Zhou, Sheng Zhang, Tristan Naumann, Muhao Chen, Hoifung Poon

In this paper, we aim at bridging the gap and propose to pretrain and finetune the RE model using consistent objectives of contrastive learning.

Contrastive Learning Relation +3

Paper
Add Code

Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing

1 code implementation • 21 Apr 2022 • Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay

We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision--language processing.

Contrastive Learning Language Modelling +4

233

Paper
Code

A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022

no code implementations • 28 Mar 2022 • Gerardo Flores, George H. Chen, Tom Pollard, Joyce C. Ho, Tristan Naumann

A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022.

Paper
Add Code

Towards Structuring Real-World Data at Scale: Deep Learning for Extracting Key Oncology Information from Clinical Text with Patient-Level Supervision

no code implementations • 20 Mar 2022 • Sam Preston, Mu Wei, Rajesh Rao, Robert Tinn, Naoto Usuyama, Michael Lucas, Roshanthi Weerasinghe, Soohee Lee, Brian Piening, Paul Tittel, Naveen Valluri, Tristan Naumann, Carlo Bifulco, Hoifung Poon

Results: We conduct an extensive study on 135, 107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states.

Sentence

Paper
Add Code

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

no code implementations • 15 Dec 2021 • Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Overall, domainspecific vocabulary and pretraining facilitate more robust models for fine-tuning.

text similarity Transfer Learning

Paper
Add Code

Knowledge-Rich Self-Supervision for Biomedical Entity Linking

no code implementations • 15 Dec 2021 • Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia.

Contrastive Learning Entity Linking

Paper
Add Code

Modular Self-Supervision for Document-Level Relation Extraction

no code implementations • EMNLP 2021 • Sheng Zhang, Cliff Wong, Naoto Usuyama, Sarthak Jain, Tristan Naumann, Hoifung Poon

Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications.

Document-level Relation Extraction Reading Comprehension +1

Paper
Add Code

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

no code implementations • 25 Jun 2021 • Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon

A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months.

Distributed Computing Self-Supervised Learning

Paper
Add Code

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

1 code implementation • 31 Jul 2020 • Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Ranked #2 on Participant Intervention Comparison Outcome Extraction on EBM-NLP (using extra training data)

Continual Pretraining +11

Paper
Code

ML4H Abstract Track 2019

no code implementations • 5 Feb 2020 • Matthew B. A. McDermott, Emily Alsentzer, Sam Finlayson, Michael Oberst, Fabian Falck, Tristan Naumann, Brett K. Beaulieu-Jones, Adrian V. Dalca

A collection of the accepted abstracts for the Machine Learning for Health (ML4H) workshop at NeurIPS 2019.

BIG-bench Machine Learning

Paper
Add Code

Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation

no code implementations • 4 Dec 2019 • Aparna Balagopalan, Jekaterina Novikova, Matthew B. A. McDermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi

We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1).

Domain Adaptation

Paper
Add Code

Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks

1 code implementation • 2 Aug 2019 • Bret Nestor, Matthew B. A. McDermott, Willie Boag, Gabriela Berner, Tristan Naumann, Michael C. Hughes, Anna Goldenberg, Marzyeh Ghassemi

When training clinical prediction models from electronic health records (EHRs), a key concern should be a model's ability to sustain performance over time when deployed, even as care practices, database systems, and population demographics evolve.

De-identification Length-of-Stay prediction +1

Paper
Code

MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

2 code implementations • 19 Jul 2019 • Shirly Wang, Matthew B. A. McDermott, Geeticka Chauhan, Michael C. Hughes, Tristan Naumann, Marzyeh Ghassemi

Robust machine learning relies on access to data that can be used with standardized frameworks in important tasks and the ability to develop models whose performance can be reasonably reproduced.

Ranked #3 on Length-of-Stay prediction on MIMIC-III

BIG-bench Machine Learning Length-of-Stay prediction +3

377

Paper
Code

Publicly Available Clinical BERT Embeddings

2 code implementations • WS 2019 • Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months.

De-identification

628

Paper
Code

Generalizability of predictive models for intensive care unit patients

1 code implementation • 6 Dec 2018 • Alistair E. W. Johnson, Tom J. Pollard, Tristan Naumann

A large volume of research has considered the creation of predictive models for clinical data; however, much existing literature reports results using only a single source of data.

Paper
Code

Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation

no code implementations • 30 Nov 2018 • Bret Nestor, Matthew B. A. McDermott, Geeticka Chauhan, Tristan Naumann, Michael C. Hughes, Anna Goldenberg, Marzyeh Ghassemi

Machine learning for healthcare often trains models on de-identified datasets with randomly-shifted calendar dates, ignoring the fact that data were generated under hospital operation practices that change over time.

BIG-bench Machine Learning Mortality Prediction

Paper
Add Code

Machine Learning for Health (ML4H) Workshop at NeurIPS 2018

no code implementations • 17 Nov 2018 • Natalia Antropova, Andrew L. Beam, Brett K. Beaulieu-Jones, Irene Chen, Corey Chivers, Adrian Dalca, Sam Finlayson, Madalina Fiterau, Jason Alan Fries, Marzyeh Ghassemi, Mike Hughes, Bruno Jedynak, Jasvinder S. Kandola, Matthew McDermott, Tristan Naumann, Peter Schulam, Farah Shamout, Alexandre Yahi

This volume represents the accepted submissions from the Machine Learning for Health (ML4H) workshop at the conference on Neural Information Processing Systems (NeurIPS) 2018, held on December 8, 2018 in Montreal, Canada.

BIG-bench Machine Learning

Paper
Add Code

Natural Language Processing for EHR-Based Computational Phenotyping

no code implementations • 13 Jun 2018 • Zexian Zeng, Yu Deng, Xiaoyu Li, Tristan Naumann, Yuan Luo

This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping.

Computational Phenotyping

Paper
Add Code

A Review of Challenges and Opportunities in Machine Learning for Health

no code implementations • 1 Jun 2018 • Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew L. Beam, Irene Y. Chen, Rajesh Ranganath

Modern electronic health records (EHRs) provide data to answer clinically meaningful questions.

BIG-bench Machine Learning

Paper
Add Code

Towards the Creation of a Large Corpus of Synthetically-Identified Clinical Notes

no code implementations • 7 Mar 2018 • Willie Boag, Tristan Naumann, Peter Szolovits

Clinical notes often describe the most important aspects of a patient's physiology and are therefore critical to medical research.

De-identification

Paper
Add Code

CliNER 2.0: Accessible and Accurate Clinical Concept Extraction

no code implementations • 6 Mar 2018 • Willie Boag, Elena Sergeeva, Saurabh Kulshreshtha, Peter Szolovits, Anna Rumshisky, Tristan Naumann

Clinical notes often describe important aspects of a patient's stay and are therefore critical to medical research.

Clinical Concept Extraction Decision Making +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.