RAFT: A Real-World Few-Shot Text Classification Benchmark

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org .
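As a concrete illustration of the deployment-style evaluation described above, the sketch below assembles a few-shot prompt from a handful of labeled examples and asks a language model to label an unseen instance. This is a minimal sketch, not the RAFT evaluation harness: the `complete()` function is a hypothetical stand-in for whatever completion API or local model is used, and the instruction, example texts, and labels are invented for illustration rather than drawn from the RAFT data release.

```python
# Minimal few-shot classification sketch: build a prompt from labeled
# examples, then ask a language model to label an unseen instance.
# `complete` is a hypothetical stand-in for a real completion API.

from typing import Callable, List, Tuple


def build_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Concatenate labeled examples followed by the unlabeled query."""
    lines = ["Classify each sentence as 'relevant' or 'not relevant'."]
    for text, label in examples:
        lines.append(f"Sentence: {text}\nLabel: {label}")
    lines.append(f"Sentence: {query}\nLabel:")
    return "\n\n".join(lines)


def classify(complete: Callable[[str], str],
             examples: List[Tuple[str, str]],
             query: str) -> str:
    """Return the model's predicted label for `query`, given few-shot `examples`."""
    return complete(build_prompt(examples, query)).strip()


if __name__ == "__main__":
    # Invented examples, purely for illustration.
    shots = [
        ("The patient developed a rash after starting the medication.", "relevant"),
        ("The trial enrolled 120 participants across three sites.", "not relevant"),
    ]
    # Stub model: always answers with one label (replace with a real model call).
    stub = lambda prompt: "relevant"
    print(classify(stub, shots, "Severe nausea was reported following the infusion."))
```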

Leaderboard: Few-Shot Text Classification on RAFT (metric: F1; global rank in parentheses)

| Model | Avg | ADE | B77 | NIS | OSE | Over | SOT | SRI | TAI | ToS | TEH | TC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human (crowdsourced) | 0.735 (#2) | 0.830 (#1) | 0.607 (#2) | 0.857 (#1) | 0.646 (#2) | 0.917 (#3) | 0.908 (#2) | 0.468 (#7) | 0.609 (#4) | 0.627 (#2) | 0.722 (#1) | 0.897 (#1) |
| GPT-3 zero-shot | 0.292 (#9) | 0.163 (#9) | 0.000 (#8) | 0.572 (#6) | 0.323 (#7) | 0.378 (#8) | 0.628 (#5) | 0.027 (#8) | 0.362 (#8) | 0.164 (#8) | 0.303 (#9) | 0.290 (#9) |
| Plurality-class | 0.331 (#8) | 0.446 (#7) | 0.000 (#8) | 0.353 (#9) | 0.164 (#9) | 0.337 (#9) | 0.271 (#9) | 0.493 (#4) | 0.344 (#9) | 0.471 (#7) | 0.366 (#7) | 0.391 (#8) |
| BART MNLI zero-shot | 0.382 (#7) | 0.234 (#8) | 0.332 (#3) | 0.615 (#5) | 0.360 (#5) | 0.462 (#7) | 0.644 (#4) | 0.026 (#9) | 0.469 (#7) | 0.122 (#9) | 0.543 (#4) | 0.400 (#7) |
| GPT-2 | 0.458 (#6) | 0.600 (#4) | 0.121 (#6) | 0.561 (#7) | 0.245 (#8) | 0.498 (#6) | 0.380 (#8) | 0.492 (#6) | 0.612 (#3) | 0.498 (#6) | 0.311 (#8) | 0.723 (#4) |
| GPT-Neo | 0.481 (#5) | 0.452 (#6) | 0.149 (#5) | 0.408 (#8) | 0.343 (#6) | 0.681 (#5) | 0.406 (#7) | 0.493 (#4) | 0.605 (#5) | 0.565 (#4) | 0.554 (#3) | 0.636 (#5) |
| AdaBoost | 0.514 (#4) | 0.543 (#5) | 0.023 (#7) | 0.626 (#4) | 0.475 (#3) | 0.838 (#4) | 0.455 (#6) | 0.506 (#3) | 0.556 (#6) | 0.560 (#5) | 0.443 (#6) | 0.625 (#6) |
| GPT-3 | 0.627 (#3) | 0.686 (#3) | 0.299 (#4) | 0.679 (#3) | 0.431 (#4) | 0.937 (#2) | 0.769 (#3) | 0.516 (#1) | 0.656 (#2) | 0.574 (#3) | 0.526 (#5) | 0.821 (#3) |
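
As a quick sanity check of the 0.11 average gap quoted in the abstract, the short script below recomputes the averages from the per-task F1 scores transcribed from the leaderboard table above (values rounded to three decimals as displayed).

```python
# Recompute the leaderboard averages from per-task F1 scores (transcribed
# from the table above) and the human-vs-GPT-3 gap quoted in the abstract.

TASKS = ["ADE", "B77", "NIS", "OSE", "Over", "SOT", "SRI", "TAI", "ToS", "TEH", "TC"]

human = [0.830, 0.607, 0.857, 0.646, 0.917, 0.908, 0.468, 0.609, 0.627, 0.722, 0.897]
gpt3 = [0.686, 0.299, 0.679, 0.431, 0.937, 0.769, 0.516, 0.656, 0.574, 0.526, 0.821]

human_avg = sum(human) / len(human)  # ~0.735
gpt3_avg = sum(gpt3) / len(gpt3)     # ~0.627

print(f"Human (crowdsourced) avg F1: {human_avg:.3f}")
print(f"GPT-3 (few-shot) avg F1:     {gpt3_avg:.3f}")
print(f"Gap:                         {human_avg - gpt3_avg:.3f}")  # ~0.11
```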
