no code implementations • *SEM (NAACL) 2022 • Alessandro Stolfo, Chris Tanner, Vikram Gupta, Mrinmaya Sachan
Labeled data for the task of Coreference Resolution is a scarce resource, as producing it requires significant human effort.
no code implementations • 10 Apr 2024 • Alessandro Stolfo
We present an empirical study of groundedness in long-form question answering (LFQA) by retrieval-augmented large language models (LLMs).
no code implementations • 31 Jan 2024 • Andreas Opedal, Alessandro Stolfo, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan
We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not during the final step, which relies on the problem's arithmetic expressions (solution execution).
1 code implementation • 23 Oct 2023 • Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan
We show that MechanisticProbe is able to recover the reasoning tree from the model's attention patterns for most examples, suggesting that the LM is indeed going through a process of multi-step reasoning within its architecture in many cases.
1 code implementation • 24 May 2023 • Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan
Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture.
1 code implementation • 1 Dec 2022 • Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan
In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
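The decomposition idea can be illustrated with a minimal sketch: instead of one free-form chain of thought, the problem is split into an explicit sequence of subquestions whose answers feed the next reasoning step. The solver and helpers below are hypothetical stand-ins for illustration, not the paper's actual models or prompts.

```python
# A minimal, hypothetical sketch of subquestion-guided reasoning:
# each subquestion is paired with one intermediate step, and the
# running result is threaded from step to step.

def solve_with_subquestions(state, subquestions):
    """Answer each subquestion in order, threading intermediate results."""
    trace = []
    for question, step in subquestions:
        state = step(state)              # one intermediate reasoning step
        trace.append((question, state))  # keep the guided reasoning trace
    return state, trace

# Toy problem: "A shop sells 3 boxes of 12 apples each, then 5 apples
# are eaten. How many apples remain?"
subquestions = [
    ("How many apples are there in total?", lambda s: 3 * 12),
    ("How many apples remain after 5 are eaten?", lambda s: s - 5),
]

answer, trace = solve_with_subquestions(None, subquestions)
print(answer)  # 31
```

In the actual scheme a smaller model would generate both the subquestions and the intermediate answers; the point here is only the control flow of guiding each step with an explicit subproblem.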
1 code implementation • 21 Oct 2022 • Alessandro Stolfo, Zhijing Jin, Kumar Shridhar, Bernhard Schölkopf, Mrinmaya Sachan
By grounding the behavioral analysis in a causal graph describing an intuitive reasoning process, we study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
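The intervention logic can be sketched as follows: change one input factor at a time and check whether the model's answer changes. A robust model should be insensitive to surface rewording, and a sensitive model should track changes to the operands. The "model" below is a toy deterministic oracle standing in for an actual LM; all names are illustrative assumptions.

```python
# Hypothetical sketch of direct interventions in the input space.
# toy_model is a stand-in oracle for an LM answering a math word problem.

def toy_model(template, a, b):
    # A real study would query a language model with the filled template;
    # here we return the ground-truth sum so the example is runnable.
    return a + b

base = toy_model("Tom has {a} apples and gets {b} more.", 2, 3)

# Intervention on the surface form only (the answer should NOT change):
reworded = toy_model("{a} apples plus another {b} apples.", 2, 3)
robust = (reworded == base)

# Intervention on an operand (the answer SHOULD change accordingly):
perturbed = toy_model("Tom has {a} apples and gets {b} more.", 4, 3)
sensitive = (perturbed == 4 + 3)

print(robust, sensitive)  # True True
```

Comparing the model's behavior under these two kinds of interventions is what lets the causal graph separate robustness (invariance to irrelevant changes) from sensitivity (correct response to relevant ones).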
1 code implementation • 7 Oct 2022 • Kumar Shridhar, Nicholas Monath, Raghuveer Thirukovalluru, Alessandro Stolfo, Manzil Zaheer, Andrew McCallum, Mrinmaya Sachan
OntoNotes has served as the most important benchmark for coreference resolution.