1 code implementation • ACL 2022 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to improve the tokenization of pretrained language models (PLMs).
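The core idea of FLOTA is to segment a word using only the few longest vocabulary tokens it contains, rather than exhaustively covering every character. A minimal sketch of that idea, assuming a plain Python set as the tokenizer vocabulary (the names `flota_tokenize` and `vocab` are illustrative, not the paper's implementation):

```python
def flota_tokenize(word, vocab, k=3):
    """Hedged sketch of few-longest-token segmentation:
    keep at most k of the longest vocabulary substrings of `word`,
    recursing on the remaining left and right fragments."""

    def longest_match(w):
        # Scan substrings from longest to shortest; return the first hit.
        for length in range(len(w), 0, -1):
            for start in range(len(w) - length + 1):
                sub = w[start:start + length]
                if sub in vocab:
                    return start, sub
        return None

    def helper(w, remaining):
        if not w or remaining <= 0:
            return []
        m = longest_match(w)
        if m is None:
            # Characters with no vocabulary match are simply dropped.
            return []
        start, sub = m
        left = helper(w[:start], remaining - 1)
        right = helper(w[start + len(sub):], remaining - 1 - len(left))
        return left + [sub] + right

    return helper(word, k)
```

For example, with a toy vocabulary containing "token" and "ization", the word "tokenization" is split into those two pieces in order, while a word with no vocabulary matches yields an empty segmentation.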
no code implementations • 12 Nov 2024 • Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert
As expected, rule-based and analogical models explain the predictions of GPT-J equally well for adjectives with regular nominalization patterns.
no code implementations • 25 May 2023 • Isabelle Lorge, Janet Pierrehumbert
In such models, words that are similar in their topical associations but differ in their logical force tend to emerge as semantically close, creating well-known challenges for NLP applications that involve logical reasoning.
no code implementations • ACL 2020 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.
no code implementations • ACL 2020 • Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as "trump", "antitrumpism", and "detrumpify", in social media.
no code implementations • WS 2018 • Janet Pierrehumbert, Ramon Granell
Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics.
no code implementations • LREC 2014 • Peter Baumann, Janet Pierrehumbert
In a case study of two such languages, Tagalog and Zulu, we show that an easily obtainable English wordlist can be deployed to seed a morphological analysis algorithm from a small training set of conversational transcripts.