no code implementations • EMNLP (BlackboxNLP) 2021 • Michael Hanna, David Mareček
The high performance of large pretrained language models (PLMs) such as BERT on NLP tasks has prompted questions about BERT's linguistic capabilities, and how they differ from humans'.
no code implementations • WMT (EMNLP) 2021 • Michael Hanna, Ondřej Bojar
BERTScore, a recently proposed automatic metric for machine translation quality, uses BERT, a large pre-trained language model, to evaluate candidate translations with respect to a gold translation.
1 code implementation • COLING 2022 • Michael Hanna, Federico Pedeni, Alessandro Suglia, Alberto Testoni, Raffaella Bernardi
This paves the way toward systematically evaluating embodied AI agents that understand grounded actions.
no code implementations • 26 Mar 2024 • Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.
1 code implementation • 19 Feb 2024 • Frank Wildenburg, Michael Hanna, Sandro Pezzelle
In this work, we propose a novel Dataset of semantically Underspecified Sentences grouped by Type (DUST) and use it to study whether pre-trained language models (LMs) correctly identify and interpret underspecified sentences.
1 code implementation • 23 Oct 2023 • Michael Hanna, Yonatan Belinkov, Sandro Pezzelle
However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans.
1 code implementation • 19 Oct 2023 • Abhijith Chintam, Rahel Beloch, Willem Zuidema, Michael Hanna, Oskar van der Wal
Language models (LMs) exhibit and amplify many types of undesirable biases learned from the training data, including gender bias.
no code implementations • 17 Oct 2023 • Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, Oskar van der Wal
We present the submission of the ILLC at the University of Amsterdam to the BabyLM challenge (Warstadt et al., 2023), in the strict-small track.
1 code implementation • NeurIPS 2023 • Michael Hanna, Ollie Liu, Alexandre Variengien
Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small.