1 code implementation • ACL 2022 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to improve the tokenization of pretrained language models (PLMs).
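To make the idea behind the name concrete, below is a minimal Python sketch of a FLOTA-style segmentation, not the authors' released implementation: it greedily keeps the few longest vocabulary tokens found in a word, so that morphologically meaningful units survive tokenization. The toy vocabulary and the omission of subword continuation markers such as BERT's "##" are simplifications.

```python
def flota_like(word, vocab, k=3):
    """Sketch of a FLOTA-style segmentation: keep up to k longest vocab tokens.

    Simplifications: toy vocabulary, no '##' continuation markers, and
    characters not covered by any vocabulary token are silently dropped.
    """
    masked = word  # matched spans get overwritten with '#'
    found = []     # (start, token) pairs, positions in the original word
    for _ in range(k):
        best = None
        # scan remaining material for the longest vocabulary token
        for length in range(len(masked), 0, -1):
            for start in range(len(masked) - length + 1):
                cand = masked[start:start + length]
                if "#" not in cand and cand in vocab:
                    best = (start, cand)
                    break
            if best:
                break
        if best is None:
            break
        start, cand = best
        found.append(best)
        # mask the matched span and keep looking in what is left
        masked = masked[:start] + "#" * len(cand) + masked[start + len(cand):]
    return [tok for _, tok in sorted(found)]

# toy example; a real run would use a PLM tokenizer's vocabulary
print(flota_like("antitrumpism", {"anti", "trump", "ism"}))
# -> ['anti', 'trump', 'ism']
```

Because the longest tokens are preferred, stems and affixes that exist in the vocabulary tend to come out intact instead of being fragmented into short, meaningless pieces.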
1 code implementation • 14 Dec 2022 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
We propose a fully unsupervised method to detect bias in contextualized embeddings.
no code implementations • 24 Oct 2022 • Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics.
1 code implementation • ACL 2022 • Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze
We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages.
no code implementations • 16 Mar 2022 • Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze
Evaluation on three tasks, namely fine-tuned as well as zero-shot geolocation prediction and zero-shot prediction of dialect features, shows that geoadaptation is very effective: e.g., we obtain state-of-the-art performance in supervised geolocation prediction and report massive gains over geographically uninformed PLMs on zero-shot geolocation prediction.
1 code implementation • Findings (NAACL) 2022 • Valentin Hofmann, Xiaowen Dong, Janet B. Pierrehumbert, Hinrich Schütze
The increasing polarization of online political discourse calls for computational tools that automatically detect and monitor ideological divides in social media.
1 code implementation • ACL 2021 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words?
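The question is easy to probe directly. The snippet below, a small illustration assuming the Hugging Face `transformers` library and the standard bert-base-uncased checkpoint, prints how a WordPiece vocabulary segments derivationally complex words; the exact splits depend on the vocabulary and often do not line up with morpheme boundaries.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
for word in ["unhappiness", "detrumpify", "superbizarre"]:
    # tokenize() shows the raw subword segmentation the PLM receives as input
    print(word, "->", tok.tokenize(word))
```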
1 code implementation • ACL 2021 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts.
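The contrast can be demonstrated in a few lines. In the sketch below (assuming `transformers`, `torch`, and bert-base-uncased; the model choice is illustrative), the same word receives different vectors from a contextualized model depending on its sentence, whereas a static embedding would assign it a single vector in both cases.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Return the contextual embedding of `word` (assumed to be one token)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = word_vector("She sat on the bank of the river.", "bank")
v2 = word_vector("He deposited cash at the bank.", "bank")
# similarity below 1 shows that context shifts the word's representation
print(torch.cosine_similarity(v1, v2, dim=0).item())
```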
no code implementations • ACL 2020 • Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as "trump", "antitrumpism", and "detrumpify", in social media.
no code implementations • ACL 2020 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
The auto-encoder models morphological well-formedness (MWF) in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.
1 code implementation • EMNLP 2020 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
Can pretrained language models (PLMs) generate derivationally complex words?