Arithmetic Reasoning

70 papers with code • 2 benchmarks • 3 datasets

Benchmarks

These leaderboards track progress in Arithmetic Reasoning.

Dataset	Best Model
GSM8K	GPT-4 DUP
MultiArith	Text-davinci-002 (175B)(zero-shot-cot)

Libraries

Use these libraries to find Arithmetic Reasoning models and implementations.

epfllm/megatron-llm • 3 papers • 467

huggingface/transformers • 2 papers • 125,334

squeezeailab/squeezellm • 2 papers • 569

skytliang/multi-agents-debate • 2 papers • 176

Datasets

Most implemented papers

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

shizhediao/automate-cot • 24 Feb 2023

However, most CoT studies rely on carefully designed human-annotated rationale chains to prompt LLMs, posing challenges for real-world applications where labeled data is available without rationale chains.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

microsoft/guidance • 22 Mar 2023

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

agi-edgerunners/llm-adapters • 4 Apr 2023

The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca).

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

holarissun/prompt-oirl • 13 Sep 2023

We identify a previously overlooked objective of query dependency in such optimization and elucidate two ensuing challenges that impede the successful and economical design of prompt optimization techniques.

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft • 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Learning to Reason for Text Generation from Scientific Tables

UKPLab/SciGen • 16 Apr 2021

In this paper, we introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation consisting of tables from scientific articles and their corresponding descriptions.

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

lupantech/InterGPS • ACL 2021

We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

lupantech/iconqa • 25 Oct 2021

Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.

Self-Consistency Improves Chain of Thought Reasoning in Language Models

lastmile-ai/aiconfig • 21 Mar 2022

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
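
The self-consistency idea named in the title above is usually described as sampling several reasoning paths for the same question and keeping the most frequent final answer. The block below is a minimal sketch of that voting step only, not the paper's implementation. The helper names (`extract_final_number`, `self_consistent_answer`), the rule of taking the last number in each path as its answer, and the hard-coded `sampled_paths` are all illustrative assumptions; in practice the paths would come from a language model sampled at nonzero temperature.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_number(reasoning: str) -> Optional[str]:
    """Return the last number in a reasoning path, or None if there is none.

    Assumption: the final answer is the last number mentioned in the text.
    """
    # Drop thousands separators so "1,000" is read as a single number.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reasoning.replace(",", ""))
    return numbers[-1] if numbers else None


def self_consistent_answer(paths: List[str]) -> Optional[str]:
    """Majority vote over the final answers parsed from sampled paths."""
    answers = [a for a in map(extract_final_number, paths) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled reasoning paths for one question (hard-coded here).
sampled_paths = [
    "3 boxes of 4 apples is 12 apples. Eating 2 leaves 10. The answer is 10",
    "3 * 4 = 12, and 12 - 2 = 10. The answer is 10",
    "3 + 4 = 7, and 7 - 2 = 5. The answer is 5",
]
print(self_consistent_answer(sampled_paths))  # prints 10
```

Two of the three paths agree on 10, so the vote discards the path that made an arithmetic slip.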

UL2: Unifying Language Learning Paradigms

google-research/google-research • 10 May 2022

Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
