RAFT (Real-world Annotated Few-shot Tasks)

Introduced by Alex et al. in RAFT: A Real-World Few-Shot Text Classification Benchmark

The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment.

RAFT is a few-shot classification benchmark that tests language models:

across multiple domains (literature reviews, medical data, tweets, customer interactions, etc.)
on economically valuable classification tasks (someone inherently cares about the task)
with evaluation that mirrors deployment (50 labeled examples per task, information retrieval allowed, hidden test set)
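
To make the few-shot setup above concrete, here is a minimal sketch of how the labeled examples for one task might be packed into a single prompt for a language model. Everything in it is an illustrative assumption rather than part of RAFT itself: the `Example` class, the `build_prompt` function, the "Text:/Label:" layout, and the toy examples are hypothetical, and only the cap of 50 labeled examples comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One labeled training example (hypothetical container, not a RAFT type)."""
    text: str
    label: str


def build_prompt(instructions, train, query, max_examples=50):
    """Build a few-shot classification prompt from labeled examples.

    max_examples defaults to 50, the per-task labeled budget described above.
    The "Text:/Label:" layout is an assumed format, not one prescribed by RAFT.
    """
    if len(train) > max_examples:
        raise ValueError(f"expected at most {max_examples} labeled examples")

    parts = [instructions.strip(), ""]
    for ex in train:
        parts.append(f"Text: {ex.text}")
        parts.append(f"Label: {ex.label}")
        parts.append("")
    # The unlabeled item comes last; the model is asked to complete the label.
    parts.append(f"Text: {query}")
    parts.append("Label:")
    return "\n".join(parts)


# Made-up toy data, for illustration only.
train = [
    Example("Refund still not received after 3 weeks.", "complaint"),
    Example("Thanks for the quick reply!", "no complaint"),
]
prompt = build_prompt(
    "Classify each customer message as 'complaint' or 'no complaint'.",
    train,
    "My order arrived broken.",
)
print(prompt)
```

Because the test set is hidden, a prompt like this would be applied to each unlabeled test item and the predicted labels submitted for scoring, rather than scored locally.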
