no code implementations • ACL (ISA, IWCS) 2021 • Ekaterina Saveleva, Volha Petukhova, Marius Mosbach, Dietrich Klakow
We tested the widely used Penn Discourse Treebank full parser (Lin et al., 2010) and the state-of-the-art neural models NeuralEDUSeg (Wang et al., 2018) and XLNet (Yang et al., 2019) on the two-stage task of discourse segmentation and discourse relation recognition.
no code implementations • RANLP 2021 • Ekaterina Saveleva, Volha Petukhova, Marius Mosbach, Dietrich Klakow
The paper presents a novel discourse-based approach to argument quality assessment, defined as a graph classification task where the depth of reasoning (argumentation) is evident from the number and types of detected discourse units and the relations between them.
no code implementations • RANLP 2021 • Marius Mosbach, Irina Stenger, Tania Avgustinova, Bernd Möbius, Dietrich Klakow
We present an extended version of a tool developed for calculating linguistic distances and asymmetries in auditory perception of closely related languages.
no code implementations • WS (NoDaLiDa) 2019 • Yuri Bizzoni, Marius Mosbach, Dietrich Klakow, Stefania Degaetano-Ortlieb
We apply hyperbolic embeddings to trace the dynamics of change of conceptual-semantic relationships in a large diachronic scientific corpus (200 years).
1 code implementation • 23 Jul 2024 • Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow
Identifying beneficial tasks to transfer from is a critical step toward successful intermediate-task transfer learning.
1 code implementation • 18 Jun 2024 • Marius Mosbach, Vagrant Gautam, Tomás Vergara-Browne, Dietrich Klakow, Mor Geva
Despite growing interest in the subfield, a criticism of this work is that it lacks actionable insights and therefore has little impact on NLP.
1 code implementation • 9 Apr 2024 • Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy
We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB).
no code implementations • 20 Mar 2024 • Paloma García-de-Herreros, Vagrant Gautam, Philipp Slusallek, Dietrich Klakow, Marius Mosbach
ORCA (Shen et al., 2023) is a recent technique for cross-modal fine-tuning, i.e., applying pre-trained transformer models to modalities beyond their training data.
1 code implementation • 20 Feb 2024 • Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba O. Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach
In-context learning is a popular inference strategy where large language models solve a task using only a few labeled demonstrations without needing any parameter updates.
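A minimal sketch of the in-context learning setup described above: a frozen model receives a handful of labeled demonstrations in its prompt and is asked to continue with a label for a new input, with no parameter updates. The task, label set, and helper function below are hypothetical placeholders, not the paper's benchmark setup.

```python
# Minimal sketch of in-context learning: a few labeled demonstrations are
# placed in the prompt and the model predicts the label of a new input
# without any parameter updates. Task and examples are hypothetical.

def build_icl_prompt(demonstrations, query, instruction="Classify the sentiment."):
    """Concatenate labeled demonstrations and the query into a single prompt."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Input: {text}\nLabel: {label}\n")
    lines.append(f"Input: {query}\nLabel:")
    return "\n".join(lines)

demos = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this product.", "negative"),
]
prompt = build_icl_prompt(demos, "The service was painfully slow.")
# `prompt` is then passed to a frozen language model; the predicted label is
# read off the model's next-token continuation (e.g. "negative").
print(prompt)
```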
1 code implementation • 20 Feb 2024 • Jesujoba O. Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva
We analyze the operation of transformer language adapters, which are small modules trained on top of a frozen language model to adapt its predictions to new target languages.
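As a rough illustration of what such a language adapter looks like, the sketch below implements a standard bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) that would be trained while the underlying language model stays frozen. The hidden and bottleneck sizes are assumptions for illustration, not the exact configuration analyzed in the paper.

```python
# Sketch of a standard bottleneck adapter: a small down-project / up-project
# module with a residual connection, trained on top of a frozen language
# model. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # The residual connection keeps the frozen model's representation
        # intact; the adapter learns a small language-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
x = torch.randn(2, 10, 768)   # (batch, sequence, hidden)
print(adapter(x).shape)       # torch.Size([2, 10, 768])
```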
no code implementations • 8 Nov 2023 • Julius Steuer, Marius Mosbach, Dietrich Klakow
Research on the cognitive plausibility of language models (LMs) has so far mostly concentrated on modelling psycholinguistic response variables such as reading times, gaze durations and N400/P600 EEG signals, while largely leaving out what Mahowald et al. (2023) describe as formal and functional linguistic competence, as well as developmental plausibility.
1 code implementation • 27 May 2023 • Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow
In this paper, we revisit the setup of these approaches and find that the benefits brought by these approaches are significantly overestimated.
1 code implementation • 26 May 2023 • Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, Dietrich Klakow, Yanai Elazar
In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B.
1 code implementation • 4 Aug 2022 • Vilém Zouhar, Marius Mosbach, Dietrich Klakow
We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g., concatenation) to obtain a richer context representation for language modelling.
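A minimal sketch of the concatenation-style fusion mentioned above: a fixed prefix embedding (e.g. produced by a pretrained masked language model) is concatenated with the token embedding at every LSTM step. Dimensions, class names, and the exact fusion point are assumptions for illustration, not the paper's architecture.

```python
# Sketch of fusion by concatenation: a fixed prefix embedding is concatenated
# with the token embedding at every step of an LSTM language model.
import torch
import torch.nn as nn

class FusionLSTMLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, prefix_dim=768, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + prefix_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, prefix_embedding):
        # tokens: (batch, seq_len); prefix_embedding: (batch, prefix_dim)
        emb = self.embed(tokens)
        prefix = prefix_embedding.unsqueeze(1).expand(-1, tokens.size(1), -1)
        fused = torch.cat([emb, prefix], dim=-1)   # concatenation fusion
        hidden, _ = self.lstm(fused)
        return self.out(hidden)                    # next-token logits

model = FusionLSTMLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 12)), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 12, 10000])
```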
no code implementations • 28 Jul 2022 • Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg
Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.
1 code implementation • NAACL (WOAH) 2022 • Awantee Deshpande, Dana Ruiter, Marius Mosbach, Dietrich Klakow
Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models.
1 code implementation • NAACL 2022 • Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow
Learning semantically meaningful sentence embeddings is an open problem in natural language processing.
1 code implementation • COLING 2022 • Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, Dietrich Klakow
Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages.
1 code implementation • SpaNLP (ACL) 2022 • Vilém Zouhar, Marius Mosbach, Miaoran Zhang, Dietrich Klakow
Finally, we show that it is possible to combine PCA with reducing precision to 1 bit per dimension.
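A minimal sketch of the kind of pipeline this describes: project the embeddings onto their top principal components with PCA, then keep only the sign of each component, i.e. 1 bit per dimension. The data, dimensions, and the Hamming-distance comparison are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of combining PCA with 1-bit-per-dimension precision reduction.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))    # stand-in for real embeddings

pca = PCA(n_components=128)
reduced = pca.fit_transform(embeddings)      # (1000, 128) float values

binarized = (reduced > 0).astype(np.uint8)   # keep only the sign: 1 bit per dimension

# Similarity between binarized vectors can be compared via Hamming distance.
hamming = np.count_nonzero(binarized[0] != binarized[1])
print(binarized.shape, hamming)
```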
no code implementations • AKBC Workshop CSKB 2021 • Vilém Zouhar, Marius Mosbach, Debanjali Biswas, Dietrich Klakow
Many NLP models gain performance by having access to a knowledge base.
1 code implementation • 16 Jun 2021 • Badr M. Abdullah, Marius Mosbach, Iuliia Zaitova, Bernd Möbius, Dietrich Klakow
Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity.
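One way to picture the analysis behind finding (1) is to correlate pairwise distances in the embedding space with a phonological distance. The sketch below uses cosine distance, a normalized Levenshtein distance over phone strings, and Spearman correlation on placeholder data; the paper's actual distance measures and word pairs may differ.

```python
# Sketch: correlate distances in an embedding space with a phonological
# distance (here a normalized edit distance over phone strings).
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

def edit_distance(a, b):
    # Dynamic-programming Levenshtein distance, normalized by word length.
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i, j] = min(dp[i - 1, j] + 1, dp[i, j - 1] + 1,
                           dp[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return dp[len(a), len(b)] / max(len(a), len(b))

# Hypothetical word pairs: (embedding_1, embedding_2, phones_1, phones_2)
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=64), rng.normal(size=64), "kat", "kats"),
         (rng.normal(size=64), rng.normal(size=64), "hunt", "hond"),
         (rng.normal(size=64), rng.normal(size=64), "vas", "bos")]

emb_dists = [cosine(e1, e2) for e1, e2, _, _ in pairs]
phon_dists = [edit_distance(p1, p2) for _, _, p1, p2 in pairs]
rho, _ = spearmanr(emb_dists, phon_dists)
print(f"Spearman correlation: {rho:.2f}")
```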
1 code implementation • COLING 2020 • Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr M. Abdullah, Dietrich Klakow
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on.
no code implementations • 28 Oct 2020 • Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow
Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work is two-fold: first, we propose a generic multimodal model fusion framework for caption generation as well as emendation, in which different fusion strategies integrate a pretrained Auxiliary Language Model (AuxLM) into traditional encoder-decoder visual captioning frameworks.
no code implementations • EMNLP (BlackboxNLP) 2020 • Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow
Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model, and these changes are typically larger for higher layers, only in very few cases does fine-tuning improve probing accuracy beyond what is achieved by simply using the pre-trained model with a strong pooling method.
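A minimal sketch of the probing setup this refers to: pool token-level representations from a frozen (pre-trained or fine-tuned) encoder, here with mean pooling as one example of a strong pooling method, and train a lightweight classifier on top. The features and labels below are random placeholders standing in for real model representations and probing tasks.

```python
# Sketch of a probing setup: mean-pool token representations from a frozen
# encoder and train a lightweight classifier on top.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# token_reps: (num_sentences, num_tokens, hidden_size) from a frozen encoder
token_reps = rng.normal(size=(500, 20, 768))
labels = rng.integers(0, 2, size=500)        # binary probing task

pooled = token_reps.mean(axis=1)             # mean pooling over tokens

X_train, X_test, y_train, y_test = train_test_split(
    pooled, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probing accuracy: {probe.score(X_test, y_test):.2f}")
```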
no code implementations • 12 Jul 2020 • Aditya Mogadala, Marius Mosbach, Dietrich Klakow
Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore.
2 code implementations • ICLR 2021 • Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
no code implementations • RANLP 2019 • Marius Mosbach, Irina Stenger, Tania Avgustinova, Dietrich Klakow
Languages may be differently distant from each other and their mutual intelligibility may be asymmetric.
no code implementations • 8 Feb 2019 • Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow
Recently, a weight-based attack on stochastic gradient descent that induces overfitting has been proposed.
1 code implementation • 29 Oct 2018 • Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow
Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers.
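For context, the sketch below illustrates adversarial logit pairing, one of the logit regularization methods of Kannan et al. [2018]: alongside the classification loss, an L2 penalty pulls together the logits of a clean example and its adversarial counterpart. The model, data, and the way adversarial examples are generated are placeholders, not the setup evaluated in this paper.

```python
# Sketch of adversarial logit pairing: classification loss on adversarial
# examples plus an L2 penalty between clean and adversarial logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

def logit_pairing_loss(model, x_clean, x_adv, y, pairing_weight=0.5):
    logits_clean = model(x_clean)
    logits_adv = model(x_adv)
    classification = F.cross_entropy(logits_adv, y)   # train on adversarial examples
    pairing = F.mse_loss(logits_clean, logits_adv)     # pull the two logit sets together
    return classification + pairing_weight * pairing

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(8, 1, 28, 28)
x_adv = x + 0.1 * torch.randn_like(x)   # stand-in for a real PGD attack
y = torch.randint(0, 10, (8,))
print(logit_pairing_loss(model, x, x_adv, y))
```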