Multi-task Language Understanding
32 papers with code • 4 benchmarks • 5 datasets
The standard benchmark here is MMLU (Hendrycks et al., 2020), a test covering 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
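MMLU-style benchmarks are four-way multiple-choice tests scored by accuracy, usually reported per subject and averaged. Below is a minimal, hypothetical sketch of that scoring loop; the question format and the `predict_choice` stub are illustrative assumptions, not the benchmark's official harness.

```python
# Hypothetical sketch of MMLU-style multiple-choice scoring.

QUESTIONS = [
    {
        "question": "What is 7 * 8?",
        "choices": {"A": "54", "B": "56", "C": "64", "D": "48"},
        "answer": "B",
    },
    # ... one dict per question, grouped into subject-level tasks
]

def predict_choice(question: str, choices: dict) -> str:
    """Stub: replace with a real model call that returns 'A'..'D'."""
    return "A"

def accuracy(questions) -> float:
    correct = sum(
        predict_choice(q["question"], q["choices"]) == q["answer"]
        for q in questions
    )
    return correct / len(questions)

print(f"accuracy: {accuracy(QUESTIONS):.3f}")
```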
Most implemented papers
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Retrieval-augmented models are known to excel at knowledge-intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.
Galactica: A Large Language Model for Science
We believe these results demonstrate the potential for language models as a new interface for science.
REPLUG: Retrieval-Augmented Black-Box Language Models
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
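The core mechanism is simple: each retrieved document is prepended to the input, the frozen LM is queried once per document, and the resulting next-token distributions are ensembled with weights derived from the retrieval scores. A minimal sketch of that ensembling step follows; `lm_next_token_probs` is a hypothetical stand-in for any black-box LM API.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def lm_next_token_probs(prompt: str) -> dict:
    """Stub for a black-box LM returning {token: probability}."""
    return {"the": 0.5, "a": 0.5}

def replug_next_token_probs(query: str, docs_with_scores):
    """Weight each document's next-token distribution by its
    softmaxed retrieval score and sum the distributions."""
    weights = softmax([score for _, score in docs_with_scores])
    ensembled = {}
    for (doc, _), w in zip(docs_with_scores, weights):
        probs = lm_next_token_probs(doc + "\n\n" + query)
        for tok, p in probs.items():
            ensembled[tok] = ensembled.get(tok, 0.0) + w * p
    return ensembled

print(replug_next_token_probs(
    "Paris is the capital of",
    [("France is in Europe.", 1.2), ("Paris hosts the Louvre.", 0.8)],
))
```

Because only the retriever is tuned, this scheme works with LMs that expose nothing beyond output probabilities.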
PaLM 2 Technical Report
Through extensive evaluations on English and multilingual language tasks and on reasoning tasks, we demonstrate that PaLM 2 significantly improves quality on downstream tasks across different model sizes, while exhibiting faster and more efficient inference than PaLM.
Textbooks Are All You Need II: phi-1.5 technical report
We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art.
Are Human-generated Demonstrations Necessary for In-context Learning?
In this paper, we raise the fundamental question of whether human-generated demonstrations are necessary for ICL.
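The alternative the paper studies is letting the model write its own demonstrations before answering. A hedged sketch of that two-step prompting pattern is below; `call_llm` is a hypothetical stand-in for any text-generation API, not the paper's code.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a text-generation API call."""
    return "..."

def self_generated_icl(task_description: str, query: str, k: int = 3) -> str:
    # Step 1: ask the model to write its own k demonstrations.
    demos = call_llm(
        f"{task_description}\n"
        f"Write {k} example input/output pairs for this task."
    )
    # Step 2: answer the real query with the self-generated
    # demonstrations placed in context, as in ordinary ICL.
    return call_llm(f"{task_description}\n{demos}\nInput: {query}\nOutput:")
```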
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia.
MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.
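The idea is to reweight each token's training loss by a learning-difficulty signal so that easy tokens do not dominate. The sketch below assumes one common formulation -- scaling per-token cross-entropy by the entropy of the predicted distribution raised to a power gamma -- and the paper's exact weighting may differ.

```python
import torch
import torch.nn.functional as F

def entropy_weighted_ce(logits, targets, gamma=1.0):
    """Per-token cross-entropy scaled by predicted-distribution entropy.
    Assumed formulation; not necessarily the paper's exact loss.
    logits: (batch, seq, vocab), targets: (batch, seq)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Standard per-token cross-entropy.
    ce = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Entropy of the model's predicted distribution as a difficulty proxy.
    entropy = -(probs * log_probs).sum(dim=-1)
    weights = entropy.detach() ** gamma  # don't backprop through weights
    return (weights * ce).mean()

logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
print(entropy_weighted_ce(logits, targets, gamma=1.0))
```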
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across a wide range of tasks.
Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration
In this paper, we propose an architecture to harness the collective knowledge of multiple trained LLMs to create a new state-of-the-art.
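The orchestrator pattern routes each query to whichever underlying expert model is expected to handle it best, then returns that expert's answer. A minimal, hypothetical sketch follows: the keyword router and stubbed experts here are illustrative only, standing in for Leeroo's learned router and real LLM backends.

```python
# Hypothetical sketch of an LLM orchestrator: a router picks one of
# several expert models per query. Experts are stubbed as functions.

EXPERTS = {
    "code": lambda q: f"[code-expert answer to: {q}]",
    "math": lambda q: f"[math-expert answer to: {q}]",
    "general": lambda q: f"[general-expert answer to: {q}]",
}

def route(query: str) -> str:
    """Toy keyword router; a learned router would score experts instead."""
    q = query.lower()
    if any(w in q for w in ("python", "function", "bug")):
        return "code"
    if any(w in q for w in ("integral", "solve", "equation")):
        return "math"
    return "general"

def orchestrate(query: str) -> str:
    return EXPERTS[route(query)](query)

print(orchestrate("Solve the equation x^2 = 9"))
```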