Arithmetic Reasoning
62 papers with code • 2 benchmarks • 3 datasets
Libraries
Use these libraries to find Arithmetic Reasoning models and implementationsMost implemented papers
LLaMA: Open and Efficient Foundation Language Models
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Llama 2: Open Foundation and Fine-Tuned Chat Models
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
Mistral 7B
We introduce Mistral 7B v0. 1, a 7-billion-parameter language model engineered for superior performance and efficiency.
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
Our work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars.
PAL: Program-aided Language Models
Much of this success can be attributed to prompting methods such as "chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.
Reasoning with Language Model Prompting: A Survey
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc.
Batch Prompting: Efficient Inference with Large Language Model APIs
We extensively validate the effectiveness of batch prompting on ten datasets across commonsense QA, arithmetic reasoning, and NLI/NLU: batch prompting significantly~(up to 5x with six samples in batch) reduces the LLM (Codex) inference token and time costs while achieving better or comparable performance.
Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
However, most CoT studies rely on carefully designed human-annotated rational chains to prompt LLMs, posing challenges for real-world applications where labeled data is available without rational chains.