TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Multiple Choice Question Answering (MCQA)	MedMCQA	Meditron-70B (CoT + SC)	Dev Set (Acc-%)	66.0	# 1
Question Answering	MedQA	LLAMA-2 (70B SC CoT)	Accuracy	61.5	# 9
Question Answering	MedQA	LLAMA-2 (70B)	Accuracy	59.2	# 11
Question Answering	MedQA	Meditron-70B (CoT + SC)	Accuracy	70.2	# 7
Question Answering	PubMedQA	Meditron-70B (CoT + SC)	Accuracy	81.6	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/meditron-70b-scaling-medical-pretraining-for/multiple-choice-question-answering-mcqa-on-21)](https://paperswithcode.com/sota/multiple-choice-question-answering-mcqa-on-21?p=meditron-70b-scaling-medical-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/meditron-70b-scaling-medical-pretraining-for/question-answering-on-pubmedqa)](https://paperswithcode.com/sota/question-answering-on-pubmedqa?p=meditron-70b-scaling-medical-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/meditron-70b-scaling-medical-pretraining-for/question-answering-on-medqa-usmle)](https://paperswithcode.com/sota/question-answering-on-medqa-usmle?p=meditron-70b-scaling-medical-pretraining-for)`

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

27 Nov 2023 · Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.
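The abstract reports its gains as absolute percentage points rather than relative improvements. As a rough illustration of the difference, the sketch below compares the MedQA accuracy rows listed in the results table on this page. The helper names `absolute_gain` and `relative_gain` are hypothetical, and the example does not reproduce the aggregate 6% and 3% figures quoted in the abstract, which are not computed from these rows.

```python
# Accuracy values copied from the MedQA rows of the results table on this page.
MEDQA_ACCURACY = {
    "Meditron-70B (CoT + SC)": 70.2,
    "LLAMA-2 (70B SC CoT)": 61.5,
    "LLAMA-2 (70B)": 59.2,
}


def absolute_gain(new: float, base: float) -> float:
    """Difference in percentage points (hypothetical helper)."""
    return round(new - base, 1)


def relative_gain(new: float, base: float) -> float:
    """Improvement as a percentage of the baseline score (hypothetical helper)."""
    return round(100.0 * (new - base) / base, 1)


if __name__ == "__main__":
    target = MEDQA_ACCURACY["Meditron-70B (CoT + SC)"]
    for name, score in MEDQA_ACCURACY.items():
        if name == "Meditron-70B (CoT + SC)":
            continue
        print(
            f"vs {name}: {absolute_gain(target, score)} points absolute, "
            f"{relative_gain(target, score)}% relative"
        )
```

An absolute gain is the quantity a reader can check directly against a leaderboard, which is presumably why the abstract states its gains that way.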

Code

epfllm/meditron (official) · 1,575 stars

Tasks

Conditional Text Generation

Multiple Choice Question Answering (MCQA)

Question Answering

Datasets

MMLU

TruthfulQA

PubMedQA

MedQA

MedMCQA

Results from the Paper

Ranked #1 on Multiple Choice Question Answering (MCQA) on MedMCQA (Dev Set (Acc-%) metric)
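The "CoT + SC" label on the Meditron-70B entries refers to chain-of-thought prompting combined with self-consistency, where several reasoning paths are sampled and the most frequent final answer is kept. The following is a minimal sketch of the voting step only, assuming the final option letter has already been extracted from each sampled generation. The function names are hypothetical and this is not the evaluation code from the epfllm/meditron repository.

```python
from collections import Counter
from typing import Sequence


def self_consistency_vote(sampled_answers: Sequence[str]) -> str:
    """Pick the most frequent answer among sampled chain-of-thought outputs.

    Ties are broken in favour of the answer that was sampled first,
    which is how Counter.most_common orders equal counts.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of questions whose voted answer matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels differ in length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Illustrative inputs only: five sampled answers for each of two questions.
    samples_per_question = [
        ["B", "A", "B", "C", "B"],
        ["D", "D", "A", "A", "D"],
    ]
    gold_labels = ["B", "A"]
    voted = [self_consistency_vote(s) for s in samples_per_question]
    print(voted, accuracy(voted, gold_labels))
```

Voting over sampled answers trades extra inference cost for robustness to a single faulty reasoning chain. The number of samples and the answer-extraction step are left out here because this page does not specify them.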

Methods

BPE • Cosine Annealing • Dense Connections • Dropout • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • LLaMA • Multi-Head Attention • PaLM • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
