StrategyQA

StrategyQA is a question-answering benchmark that measures a model's ability to answer yes/no questions requiring implicit, multi-step reasoning.

Source: BIG-bench
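
To make the task format concrete, the sketch below shows an illustrative record in the dataset's style, built around the paper's running example; the field names follow the released dataset, but the exact values here are paraphrased.

```python
# Illustrative StrategyQA record (field names follow the released dataset;
# the values paraphrase the paper's running example).
example = {
    "qid": "example-0",  # hypothetical id
    "question": "Did Aristotle use a laptop?",
    "answer": False,  # every StrategyQA answer is a yes/no boolean
    "decomposition": [  # the implicit reasoning steps, made explicit
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is #2 before #1?",
    ],
    "facts": [
        "Aristotle died in 322 BC.",
        "The first laptop was released in 1981.",
    ],
}

# A model is shown only `question`; it must infer the strategy itself.
print(example["question"], "->", "yes" if example["answer"] else "no")
```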

Most implemented papers

PaLM: Scaling Language Modeling with Pathways

lucidrains/CoCa-pytorch Google Research 2022

To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma 8 Dec 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Training Compute-Optimal Large Language Models

karpathy/llama2.c 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
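
The headline trade-off can be sketched with the commonly cited approximations of the paper's result: C ≈ 6·N·D training FLOPs and roughly 20 training tokens per parameter. Both constants below are approximations, not exact values from the paper.

```python
import math

def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal model/data split.

    Assumes C ~= 6 * N * D training FLOPs and D ~= tokens_per_param * N,
    both common approximations of the Chinchilla result. Solving
    C = 6 * N * (k * N) for N gives N = sqrt(C / (6 * k)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: the ~5.76e23 FLOP budget often quoted for the 70B Chinchilla run.
n, d = compute_optimal_split(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")  # ~69B, ~1.4T
```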

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

eladsegal/strategyqa 6 Jan 2021

A key limitation of current multi-hop reasoning datasets is that the steps required to answer a question are stated explicitly in the question itself.

Self-Consistency Improves Chain of Thought Reasoning in Language Models

lastmile-ai/aiconfig 21 Mar 2022

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
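
The method swaps greedy decoding for sampling several reasoning paths and majority-voting their final answers. A minimal sketch, assuming a hypothetical `generate(prompt, temperature)` function that returns a chain-of-thought completion, and a completion format ending in "the answer is yes/no" (our convention, not the paper's):

```python
import re
from collections import Counter

def extract_answer(completion: str):
    """Parse the final yes/no out of a chain-of-thought completion."""
    match = re.search(r"answer is (yes|no)", completion.lower())
    return match.group(1) if match else None

def self_consistency(generate, prompt: str, n_samples: int = 10) -> str:
    """Sample several reasoning paths, then majority-vote the answers."""
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.7)  # sampled, not greedy
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    if not answers:
        return "no"  # arbitrary fallback when nothing parses
    # Marginalize over reasoning paths: the most common final answer wins.
    return Counter(answers).most_common(1)[0][0]
```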

Distilling Reasoning Capabilities into Smaller Language Models

kumar-shridhar/distiiling-lm 1 Dec 2022

In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
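
A minimal sketch of that scheme, with hypothetical `decompose` and `answer_subquestion` helpers standing in for the two distilled student models (the names are ours):

```python
def socratic_cot(question, decompose, answer_subquestion):
    """Answer a question by solving a generated sequence of subproblems.

    `decompose` emits subquestions; `answer_subquestion` answers one
    subquestion given the (subquestion, answer) pairs solved so far.
    """
    context = []
    for sub_q in decompose(question):
        sub_a = answer_subquestion(sub_q, context)  # conditioned on prior steps
        context.append((sub_q, sub_a))
    # The answer to the final subproblem is taken as the final answer.
    return context[-1][1]
```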

Visconde: Multi-document QA with GPT-3 and Neural Reranking

neuralmind-ai/visconde 19 Dec 2022

This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
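
The pipeline is retrieve, rerank, then read: fetch a broad candidate set, score it with a neural reranker, and put the top passages into the reader's prompt. A minimal sketch with hypothetical `search`, `rerank_score`, and `llm` callables standing in for the first-stage retriever, the neural reranker, and a GPT-3-style model:

```python
def multi_doc_qa(question, search, rerank_score, llm, k=50, top_n=4):
    """Multi-document QA: broad retrieval, neural reranking, LLM reading."""
    candidates = search(question, k=k)  # cheap first-stage retrieval
    ranked = sorted(candidates,
                    key=lambda passage: rerank_score(question, passage),
                    reverse=True)[:top_n]  # keep the best-scoring passages
    context = "\n\n".join(ranked)
    prompt = (f"Answer the question using the documents below.\n\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")
    return llm(prompt)
```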

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

nardien/kard NeurIPS 2023

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.
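
The distillation recipe pairs each question with retrieved knowledge and a teacher-written rationale, then fine-tunes the small LM on those pairs. A rough sketch, where `retrieve` and `teacher_rationale` are stand-ins for the knowledge-base retriever and the LLM teacher, and the prompt format is our assumption:

```python
def build_distillation_set(questions, retrieve, teacher_rationale):
    """Build (input, target) pairs for knowledge-augmented distillation."""
    examples = []
    for question in questions:
        passages = retrieve(question)  # external knowledge for the question
        rationale = teacher_rationale(question, passages)  # LLM-written reasoning
        examples.append({
            "input": f"Knowledge: {' '.join(passages)}\nQuestion: {question}",
            "target": rationale,  # the small LM is fine-tuned to produce this
        })
    return examples
```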

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

timhartill/unseen_questions 2 Aug 2023

We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training.

Tailoring Self-Rationalizers with Multi-Reward Distillation

ink-usc/rationalemultirewarddistillation 6 Nov 2023

Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that MaRio not only improves task accuracy but also improves the self-rationalization quality of small LMs across multiple rationale-quality axes, outperforming a supervised fine-tuning (SFT) baseline.