We introduce the Falcon series: 7B, 40B, and 180B parameter causal decoder-only models trained on diverse, high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text, the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurrently developed models such as LLaMA 2 or Inflection-1. It nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it, to our knowledge, one of the three best language models in the world along with GPT-4 and PaLM-2-Large. We report detailed evaluations, as well as a deep dive into the methods and custom tooling employed to pretrain Falcon. Notably, we report on our custom distributed training codebase, which allowed us to efficiently pretrain these models on up to 4,096 A100s on AWS cloud infrastructure with limited interconnect. We release a 600B-token extract of our web dataset, as well as the Falcon-7/40/180B models, under a permissive license to foster open science and accelerate the development of an open ecosystem of large language models.
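The released checkpoints are distributed through the Hugging Face Hub. The snippet below is a minimal sketch of loading the smallest model for generation; it assumes the `tiiuae/falcon-7b` repository ID and a recent `transformers`/`torch`/`accelerate` installation, and is an illustration rather than the authors' own tooling.

```python
# Minimal sketch: loading a released Falcon checkpoint for text generation.
# Assumes the checkpoint is hosted on the Hugging Face Hub as "tiiuae/falcon-7b".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # assumed Hub repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load in bfloat16 to roughly halve memory use
    device_map="auto",           # requires `accelerate`; places weights on available devices
)

inputs = tokenizer("The Falcon series of language models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```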


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Sentence Completion | HellaSwag | Falcon-7B (0-shot) | Accuracy | 76.3 | # 46 |
| Sentence Completion | HellaSwag | Falcon-40B (0-shot) | Accuracy | 82.7 | # 32 |
| Sentence Completion | HellaSwag | Falcon-180B (0-shot) | Accuracy | 85.9 | # 17 |
| Multi-task Language Understanding | MMLU | Falcon-180B (5-shot) | Average (%) | 70.6 | # 28 |
| Multi-task Language Understanding | MMLU | Falcon-40B | Average (%) | 57.0 | # 54 |
| Multi-task Language Understanding | MMLU | Falcon-7B (5-shot) | Average (%) | 28.0 | # 93 |
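The 0-shot and 5-shot accuracies above are typically obtained by ranking answer options by the model's log-likelihood rather than by free-form generation. The sketch below illustrates that general recipe for a single multiple-choice item; it is a simplified illustration, not the exact evaluation harness behind these numbers, and reuses the `model` and `tokenizer` objects from the previous snippet.

```python
# Hedged sketch of multiple-choice scoring (e.g. 0-shot HellaSwag): each
# candidate continuation is scored by the total log-probability the model
# assigns to it given the context, and the highest-scoring option wins.
import torch
import torch.nn.functional as F

def score_continuation(model, tokenizer, context: str, continuation: str) -> float:
    """Sum of log-probabilities of `continuation` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids.to(model.device)).logits
    # Log-probs at position i predict token i+1.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:].to(model.device)
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Count only continuation tokens (assumes the context tokenization is a
    # prefix of the joint tokenization, which is a simplification).
    n_ctx = ctx_ids.shape[1]
    return token_log_probs[0, n_ctx - 1:].sum().item()

def predict(model, tokenizer, context: str, options: list[str]) -> int:
    """Index of the highest-scoring option; some benchmarks also length-normalize."""
    scores = [score_continuation(model, tokenizer, context, opt) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])
```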
