Multi-task Language Understanding
32 papers with code • 4 benchmarks • 5 datasets
The standard benchmark for this task is MMLU (Massive Multitask Language Understanding; Hendrycks et al., 2020). Its test set covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
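A minimal sketch of working with the benchmark, assuming the Hugging Face `datasets` library and the community `cais/mmlu` mirror (both are assumptions, not part of this page): it loads one of the 57 subjects and renders an item as a four-way multiple-choice prompt.

```python
# Minimal sketch: load one MMLU subject and format a question as a
# 4-way multiple-choice prompt. Assumes the Hugging Face `datasets`
# library and the community `cais/mmlu` mirror of the benchmark.
from datasets import load_dataset

# MMLU is split into 57 subjects; "abstract_algebra" is one of them.
mmlu = load_dataset("cais/mmlu", "abstract_algebra", split="test")

def format_prompt(example):
    """Render one MMLU item as a plain multiple-choice prompt."""
    letters = ["A", "B", "C", "D"]
    choices = "\n".join(
        f"{letter}. {choice}"
        for letter, choice in zip(letters, example["choices"])
    )
    return f"{example['question']}\n{choices}\nAnswer:"

example = mmlu[0]
print(format_prompt(example))
print("Gold answer:", "ABCD"[example["answer"]])  # `answer` is an int index
```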
Most implemented papers
PaLM: Scaling Language Modeling with Pathways
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).
Scaling Instruction-Finetuned Language Models
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
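The three prompting setups named above differ only in how the input is framed. A small illustrative sketch follows; the question and exemplars are made up for illustration, not taken from the paper.

```python
# Sketch of the three prompting setups: zero-shot, few-shot, and
# chain-of-thought (CoT). The question and exemplars are invented.
QUESTION = "If a train travels 60 miles in 1.5 hours, what is its average speed?"

# Zero-shot: the bare question, no examples.
zero_shot = f"Q: {QUESTION}\nA:"

# Few-shot: prepend one or more solved exemplars.
exemplar = "Q: What is 12 * 4?\nA: 48"
few_shot = f"{exemplar}\n\n{zero_shot}"

# Chain-of-thought: the exemplar shows intermediate reasoning,
# cueing the model to reason step by step before answering.
cot_exemplar = (
    "Q: What is 12 * 4?\n"
    "A: 12 * 4 = 12 * 2 * 2 = 24 * 2 = 48. The answer is 48."
)
chain_of_thought = f"{cot_exemplar}\n\n{zero_shot}"

for name, prompt in [("zero-shot", zero_shot),
                     ("few-shot", few_shot),
                     ("CoT", chain_of_thought)]:
    print(f"--- {name} ---\n{prompt}\n")
```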
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Mistral 7B
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.
Mixtral of Experts
In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.
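Mixtral is a sparse mixture-of-experts model: each token is routed to 2 of 8 expert feed-forward blocks, so only a fraction of the parameters are active per token. The sketch below shows that general top-2 routing pattern in PyTorch; the dimensions and expert shapes are toy values, not Mixtral's actual configuration.

```python
# Simplified top-2 mixture-of-experts routing, the general pattern
# behind Mixtral (8 experts, 2 active per token). Dimensions and the
# expert architecture here are toy values, not Mixtral's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):            # x: (tokens, dim)
        logits = self.gate(x)        # router scores per expert
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the top-k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so per-token
        # compute is ~top_k/num_experts of an equally sized dense layer.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += (weights[token_ids, slot].unsqueeze(-1)
                                   * expert(x[token_ids]))
        return out

moe = Top2MoE()
print(moe(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```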
UnifiedQA: Crossing Format Boundaries With a Single QA System
As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
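UnifiedQA's core move is to flatten every QA format (extractive, abstractive, multiple-choice, yes/no) into a single text-to-text encoding so one model serves them all. A sketch of that idea follows; the separators, option labels, and lowercasing below approximate the paper's preprocessing but are not an exact reproduction of it.

```python
# Sketch of UnifiedQA's core idea: flatten different QA formats into
# one text-to-text encoding. Separators and option labels here are
# illustrative, not the paper's exact preprocessing.
def encode(question, context=None, choices=None):
    parts = [question]
    if choices:  # multiple-choice: inline the options in the input
        parts.append(" ".join(f"({chr(97 + i)}) {c}"
                              for i, c in enumerate(choices)))
    if context:  # extractive / abstractive: append the passage
        parts.append(context)
    return " \\n ".join(parts).lower()

# Different formats, one encoding; the target is always plain answer text.
print(encode("What is the capital of France?",
             context="Paris is the capital and largest city of France."))
print(encode("Which planet is largest?",
             choices=["Mars", "Jupiter", "Venus"]))
```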
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
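The paper's headline finding (the "Chinchilla" result) is that model size and token count should scale roughly equally with compute, landing near 20 training tokens per parameter. The sketch below solves for that split using the common C ≈ 6·N·D approximation for training FLOPs; the exact ratio depends on the fitted scaling law, so treat the numbers as rules of thumb.

```python
import math

# Compute-optimal split of a training budget, per the Chinchilla
# finding of roughly 20 training tokens per parameter. Uses the
# standard C ~= 6 * N * D approximation for training FLOPs
# (N = parameters, D = training tokens).
TOKENS_PER_PARAM = 20  # approximate ratio from the paper's fits

def compute_optimal(flop_budget):
    # Solve C = 6 * N * D with D = 20 * N  =>  N = sqrt(C / 120).
    n_params = math.sqrt(flop_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# The paper's own budget (~5.76e23 FLOPs, Gopher's training compute)
# recovers roughly 70B parameters trained on ~1.4T tokens.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```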
UL2: Unifying Language Learning Paradigms
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
Solving Quantitative Reasoning Problems with Language Models
Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding.