StrategyQA
20 papers with code • 0 benchmarks • 0 datasets
StrategyQA measures the ability of models to answer yes/no questions whose required reasoning steps are implicit in the question and must be inferred, rather than being stated explicitly.
Source: BIG-bench
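For reference, each StrategyQA example pairs a short yes/no question with the implicit reasoning steps made explicit as a decomposition, plus supporting facts. The sketch below is illustrative: the field names follow the released JSON format as commonly described, and the values paraphrase the dataset's flagship question.

```python
# Illustrative StrategyQA example (field names assumed from the released
# JSON format; values paraphrase the dataset's flagship question).
example = {
    "qid": "example-1",                 # hypothetical identifier
    "question": "Did Aristotle use a laptop?",
    "answer": False,                    # all answers are yes/no booleans
    "decomposition": [                  # the implicit strategy, made explicit
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is #2 before #1?",
    ],
    "facts": [
        "Aristotle died in 322 BC.",
        "The first laptop was invented in 1980.",
    ],
}
```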
Most implemented papers
PaLM: Scaling Language Modeling with Pathways
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
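The core recipe is easy to sketch: sample several chain-of-thought completions at nonzero temperature, extract each final answer, and return the most common one. In the sketch below, `generate(prompt, temperature)` and `extract_answer(text)` are assumed interfaces standing in for whatever model and parser you use, not a real API.

```python
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=10):
    """Sample diverse chain-of-thought paths and majority-vote the answers.

    `generate(prompt, temperature)` is assumed to return one sampled
    completion; `extract_answer(text)` is assumed to pull the final
    answer (e.g. "yes"/"no") out of it. Both are hypothetical interfaces.
    """
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.7)  # diverse reasoning paths
        answers.append(extract_answer(completion))
    # Marginalize over reasoning paths: the most common final answer wins.
    return Counter(answers).most_common(1)[0][0]
```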
Improving Planning with Large Language Models: A Modular Agentic Architecture
To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of specialized modules, each implemented using an LLM.
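A minimal sketch of such a recurrent module loop is below. The module roles (actor, predictor, evaluator) and prompts are illustrative assumptions, not the paper's exact module set; `llm(prompt)` is an assumed text-in/text-out interface.

```python
def map_plan(goal, state, llm, max_steps=10):
    """Illustrative modular planning loop in the spirit of MAP.

    `llm(prompt)` is a hypothetical text-in/text-out interface; the
    module names and prompts below are assumptions for illustration.
    """
    plan = []
    for _ in range(max_steps):
        # Actor module: propose a candidate next action toward the goal.
        action = llm(f"Goal: {goal}\nState: {state}\nPropose the next action.")
        # Predictor module: simulate the state that action would produce.
        predicted = llm(f"State: {state}\nAction: {action}\nPredict the next state.")
        # Evaluator module: judge whether the predicted state reaches the goal.
        verdict = llm(f"Goal: {goal}\nState: {predicted}\nIs the goal reached? yes/no")
        plan.append(action)
        state = predicted
        if verdict.strip().lower().startswith("yes"):
            break
    return plan
```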
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
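The paper's headline result (the "Chinchilla" scaling law) is that parameters and training tokens should grow in roughly equal proportion with compute, which under the common C ≈ 6ND FLOPs approximation works out to about 20 training tokens per parameter. A back-of-the-envelope helper, with the constant treated as a rule of thumb rather than an exact prescription:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Back-of-the-envelope compute-optimal model sizing.

    Combines the common approximation C = 6 * N * D (training FLOPs for
    N parameters and D tokens) with the roughly-20-tokens-per-parameter
    rule of thumb; the constant is approximate.
    """
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) recovers ~70B params, ~1.4T tokens.
params, tokens = chinchilla_optimal(5.76e23)
print(f"{params / 1e9:.0f}B params, {tokens / 1e12:.2f}T tokens")
```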
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
This paper introduces rStar, a self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or help from stronger models.
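The mutual-verification idea can be sketched without the paper's full MCTS machinery: one SLM proposes reasoning trajectories, a second SLM completes masked versions of them, and only trajectories where the two agree on the answer are kept. `generator` and `discriminator` below are hypothetical interfaces for the two models.

```python
from collections import Counter

def mutual_reasoning(question, generator, discriminator, n_candidates=8):
    """Simplified sketch of rStar-style mutual verification (no MCTS).

    `generator(question)` is assumed to return (reasoning_steps, answer);
    `discriminator(question, partial_steps)` is assumed to return the
    answer it reaches by completing the partial trace. Both interfaces
    are hypothetical stand-ins for the two SLMs.
    """
    verified = []
    for _ in range(n_candidates):
        steps, answer = generator(question)             # SLM #1 proposes a trajectory
        partial = steps[: len(steps) // 2]              # hide the back half of the trace
        peer_answer = discriminator(question, partial)  # SLM #2 completes it
        if peer_answer == answer:                       # mutual-consistency check
            verified.append(answer)
    # Return the most common verified answer, or None if none survived.
    return Counter(verified).most_common(1)[0][0] if verified else None
```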
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
A key limitation of current datasets for multi-hop reasoning is that the steps required to answer a question are stated explicitly in the question itself.
Distilling Reasoning Capabilities into Smaller Language Models
In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
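The decomposition-guided loop is straightforward to sketch: one model generates subquestions, another answers them in sequence, and each sub-answer is fed back as context for the next step. `questioner` and `answerer` below are hypothetical stand-ins for the paper's two distilled student models.

```python
def socratic_cot(question, questioner, answerer):
    """Sketch of decomposition-guided reasoning in the spirit of Socratic CoT.

    `questioner(question)` is assumed to return a list of subquestions;
    `answerer(context, subquestion)` is assumed to return a short answer.
    Both are hypothetical interfaces.
    """
    context = question
    answer = None
    for sub_q in questioner(question):
        answer = answerer(context, sub_q)
        # Append each solved subproblem so later steps can condition on it.
        context += f"\nQ: {sub_q}\nA: {answer}"
    # The answer to the final subproblem is taken as the overall answer.
    return answer
```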
Visconde: Multi-document QA with GPT-3 and Neural Reranking
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
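The pipeline shape is retrieve, rerank, then read: cast a wide net over the document collection, keep only the passages a reranker scores highest, and let the LLM answer from that evidence. In the sketch below, `retrieve`, `rerank`, and `llm` are assumed interfaces rather than a real library API; Visconde uses a neural reranker for the middle step.

```python
def multi_doc_qa(question, retrieve, rerank, llm, k=5):
    """Sketch of a retrieve-rerank-read pipeline like Visconde's.

    Assumed interfaces: `retrieve(q)` returns candidate passages,
    `rerank(q, passages)` orders them by relevance, and `llm(prompt)`
    generates the final answer. All three are hypothetical.
    """
    passages = retrieve(question)             # broad first-stage retrieval
    top = rerank(question, passages)[:k]      # keep only the best evidence
    context = "\n\n".join(top)
    prompt = (
        "Answer using the evidence below. Reason step by step.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```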
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.