11 papers with code • 0 benchmarks • 0 datasets
StrategyQA aims to measure the ability of models to answer questions that require implicit multi-step reasoning, where the necessary reasoning steps must be inferred rather than being stated in the question (e.g., "Did Aristotle use a laptop?").
These leaderboards are used to track progress on StrategyQA.
A key limitation of current multi-hop reasoning datasets is that the steps required to answer a question are mentioned explicitly in the question itself.
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
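As a rough illustration of decomposition-guided prompting of this kind, the sketch below assembles a prompt that walks a model through sub-questions before the final answer. The function name, prompt format, and the sample sub-questions are all hypothetical, not the paper's actual implementation:

```python
def build_socratic_prompt(question, subquestions):
    """Assemble a prompt that guides a model through a sequence of
    sub-questions before answering the original question (illustrative only)."""
    lines = [f"Question: {question}", "Answer step by step:"]
    for i, sq in enumerate(subquestions, 1):
        lines.append(f"  Step {i}: {sq}")
    lines.append("Final answer:")
    return "\n".join(lines)

# Hypothetical decomposition of a StrategyQA-style implicit question.
prompt = build_socratic_prompt(
    "Did Aristotle use a laptop?",
    [
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Does the second date come after the first?",
    ],
)
print(prompt)
```

In practice, the decomposition would come from a learned sub-question generator and each step would be answered by the model in turn; this sketch only shows how sub-problems can scaffold the intermediate reasoning.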
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.
We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training.
Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that MaRio not only improves task accuracy but also improves the self-rationalization quality of small LMs, along the quality axes described above, better than a supervised fine-tuning (SFT) baseline.