Arithmetic Reasoning

74 papers with code • 2 benchmarks • 3 datasets

Benchmarks

These leaderboards track progress in Arithmetic Reasoning.

Dataset	Best Model
GSM8K	GPT-4 DUP
MultiArith	text-davinci-002 (175B) (zero-shot-CoT)

Libraries

Use these libraries to find Arithmetic Reasoning models and implementations.

huggingface/transformers: 2 papers, 127,207 GitHub stars

squeezeailab/squeezellm: 2 papers, 581 GitHub stars

RUCAIBox/LLMBox: 2 papers, 382 GitHub stars

skytliang/multi-agents-debate: 2 papers, 192 GitHub stars

Datasets

Most implemented papers

Reasoning with Language Model Prompting: A Survey

zjunlp/Prompt4ReasoningPapers • 19 Dec 2022

Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc.

Batch Prompting: Efficient Inference with Large Language Model APIs

xlang-ai/batch-prompting • 19 Jan 2023

We extensively validate the effectiveness of batch prompting on ten datasets across commonsense QA, arithmetic reasoning, and NLI/NLU: batch prompting significantly reduces the inference token and time costs of the LLM (Codex), by up to 5x with six samples per batch, while achieving better or comparable performance.
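
Batch prompting, as summarized above, places several test samples in a single request, so the shared few-shot exemplars are presumably paid for once per batch rather than once per sample. The minimal sketch below shows one way to assemble such a prompt and split the single completion back into per-question answers. The `Q[i]`/`A[i]` indexing, the prompt layout, and the helper names `build_batch_prompt` and `parse_batch_answers` are assumptions for illustration, not the xlang-ai/batch-prompting code, and no LLM API is called.

```python
import re
from typing import Dict, List, Tuple


# Assumed layout (hypothetical): questions are tagged "Q[i]:" and the model
# is expected to reply with matching "A[i]:" lines, one per question.
def build_batch_prompt(
    exemplars: List[List[Tuple[str, str]]], questions: List[str]
) -> str:
    """Build one prompt: few-shot exemplar batches, then a batch of test questions."""
    parts: List[str] = []
    for batch in exemplars:
        # Each exemplar batch lists all of its questions first, then all of its answers.
        parts.extend(f"Q[{i}]: {q}" for i, (q, _) in enumerate(batch, start=1))
        parts.extend(f"A[{i}]: {a}" for i, (_, a) in enumerate(batch, start=1))
        parts.append("")  # blank line separating exemplar batches from what follows
    # The test questions share the exemplars above, so their cost is amortized.
    parts.extend(f"Q[{i}]: {q}" for i, q in enumerate(questions, start=1))
    return "\n".join(parts)


def parse_batch_answers(completion: str, n: int) -> Dict[int, str]:
    """Split a single completion into answers keyed by question index 1..n."""
    answers: Dict[int, str] = {}
    for match in re.finditer(r"^A\[(\d+)\]:\s*(.*)$", completion, flags=re.MULTILINE):
        idx = int(match.group(1))
        if 1 <= idx <= n:  # ignore indices outside the current batch
            answers[idx] = match.group(2).strip()
    return answers


if __name__ == "__main__":
    prompt = build_batch_prompt(
        exemplars=[[("2 + 3?", "5"), ("4 * 2?", "8")]],
        questions=["7 - 4?", "6 / 2?"],
    )
    print(prompt)
    # A made-up completion standing in for a real model response.
    print(parse_batch_answers("A[1]: 3\nA[2]: 3", n=2))  # {1: '3', 2: '3'}
```

Keeping the answers indexed lets one response serve the whole batch. The trade-off is that a malformed or missing `A[i]` line affects only that question's entry, which callers would need to detect and, for example, re-query individually.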

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

shizhediao/automate-cot • 24 Feb 2023

However, most CoT studies rely on carefully designed, human-annotated rationale chains to prompt LLMs, which poses challenges for real-world applications where labeled data is available but rationale chains are not.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

microsoft/guidance • 22 Mar 2023

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM, for example) that exhibit more general intelligence than previous AI models.

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

agi-edgerunners/llm-adapters • 4 Apr 2023

The success of large language models (LLMs) such as GPT-4 and ChatGPT has led to numerous cost-effective and accessible alternatives, created by fine-tuning open-access LLMs on task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca).

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

salesforce/codet5 • 13 May 2023

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

holarissun/prompt-oirl • 13 Sep 2023

We identify a previously overlooked objective of query dependency in such optimization and elucidate two ensuing challenges that impede the successful and economical design of prompt optimization techniques.

Learning to Reason for Text Generation from Scientific Tables

UKPLab/SciGen • 16 Apr 2021

In this paper, we introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation consisting of tables from scientific articles and their corresponding descriptions.

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

lupantech/InterGPS • ACL 2021

We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

lupantech/iconqa • 25 Oct 2021

We also develop a strong IconQA baseline, Patch-TRM, which applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
