1 code implementation • COLING (WNUT) 2022 • Marcus Vielsted, Nikolaj Wallenius, Rob van der Goot
Automatically detecting the intent of an utterance is important for various downstream natural language processing tasks.
no code implementations • NAACL (CALCS) 2021 • Dana-Maria Iliescu, Rasmus Grand, Sara Qirko, Rob van der Goot
Existing models for language identification in code-switched data are all supervised, requiring annotated training data which is only available for a limited number of language pairs.
no code implementations • EMNLP 2020 • Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank
We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.
no code implementations • EMNLP (WNUT) 2020 • Anders Giovanni Møller, Rob van der Goot, Barbara Plank
With the COVID-19 pandemic raging worldwide since the beginning of the 2020s, monitoring systems that track relevant information on social media have become vitally important.
1 code implementation • EMNLP 2021 • Rob van der Goot
However, the introduction of neural networks in NLP has led to a different use of these standard splits; the development set is now often used for model selection during the training procedure.
no code implementations • LREC 2022 • Rob van der Goot, Max Müller-Eberstein, Barbara Plank
For low-resource syntactic tasks, we observe impacts of segment embedding and multilingual BERT choice.
1 code implementation • EACL (AdaptNLP) 2021 • Anouck Braggaar, Rob van der Goot
The best single source treebank (nl_alpino) resulted in an LAS of 54.7, whereas our data selection outperformed the single best transfer treebank and led to 55.6 LAS on the test data.
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
no code implementations • EMNLP (WNUT) 2021 • Rob van der Goot
In this paper, we are the first to propose a model for cross-lingual normalization, with which we participate in the WNUT 2021 shared task.
no code implementations • COLING 2022 • Sajawel Ahmed, Rob van der Goot, Misbahur Rehman, Carl Kruse, Ömer Özsoy, Alexander Mehler, Gemma Roig
Various historical languages, which were once the lingua francas of science and the arts, deserve the attention of current NLP research.
no code implementations • 21 Apr 2024 • Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank
Current language models require a lot of training data to obtain high performance.
1 code implementation • 2 Apr 2024 • Maria Barrett, Max Müller-Eberstein, Elisa Bassignana, Amalie Brogaard Pauli, Mike Zhang, Rob van der Goot
Textual domain is a crucial property within the Natural Language Processing (NLP) community due to its effects on downstream model performance.
1 code implementation • 12 Mar 2024 • Charlie Campanella, Rob van der Goot
Across all benchmarks, we observe negative correlations between metropolitan size and the performance of the LLMs, indicating that smaller regions are indeed underrepresented.
no code implementations • 8 Feb 2024 • Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank
Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis.
no code implementations • 5 Feb 2024 • Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot
Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets.
1 code implementation • 31 Jan 2024 • Mike Zhang, Rob van der Goot, Barbara Plank
In this work, we are the first to explore entity linking (EL) in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014).
1 code implementation • 30 Jan 2024 • Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank
The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text.
no code implementations • 25 Oct 2023 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
no code implementations • 9 Oct 2023 • Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank
Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades.
no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.
1 code implementation • 20 May 2023 • Mike Zhang, Rob van der Goot, Barbara Plank
The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multilingual resources.
1 code implementation • 27 Apr 2023 • Kia Kirstein Hansen, Rob van der Goot
The Wall Street Journal section of the Penn Treebank has been the de facto standard for evaluating POS taggers for a long time, and accuracies over 97% have been reported.
1 code implementation • 21 Oct 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Linguistic information is encoded at varying timescales (subwords, phrases, etc.).
1 code implementation • 16 Sep 2022 • Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank
Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching.
no code implementations • NAACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.
1 code implementation • 13 Apr 2022 • Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.
1 code implementation • ACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).
1 code implementation • ACL (TLT, SyntaxFest) 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
This work provides the first in-depth analysis of genre in Universal Dependencies (UD).
1 code implementation • ACL (TLT, SyntaxFest) 2021 • Rob van der Goot, Miryam de Lhoneux
With increasing dataset availability, the potential for learning from a variety of data sources has grown.
1 code implementation • EMNLP 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.
1 code implementation • COLING 2020 • Barbara Plank, Kristian Nørgaard Jensen, Rob van der Goot
We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER.
2 code implementations • NAACL 2021 • Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank
To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
1 code implementation • EACL 2021 • Rob van der Goot, Özlem Çetinoğlu
Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.
no code implementations • EACL (AdaptNLP) 2021 • Rob van der Goot, Ahmet Üstün, Barbara Plank
However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.
1 code implementation • 22 Feb 2021 • Anouck Braggaar, Rob van der Goot
This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian code-switched utterances into Universal Dependencies.
no code implementations • CL 2020 • Malvina Nissim, Rik van Noord, Rob van der Goot
Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings.
no code implementations • 1 Jun 2020 • Rob van der Goot, Özlem Çetinoğlu
Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.
2 code implementations • EACL 2021 • Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf, Barbara Plank
In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings.
no code implementations • LREC 2020 • Kelly Dekker, Rob van der Goot
With this system, we score 94.29 accuracy on the test data, compared to 95.22 when it is trained on human-annotated data.
no code implementations • LREC 2020 • Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna, Lorenzo De Mattei
However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data.
no code implementations • WS 2019 • Rob van der Goot
Existing natural language processing systems have often been designed with standard texts in mind.
no code implementations • WS 2019 • Ahmet Üstün, Rob van der Goot, Gosse Bouma, Gertjan van Noord
This paper describes our submission to SIGMORPHON 2019 Task 2: Morphological analysis and lemmatization in context.
1 code implementation • ACL 2019 • Rob van der Goot
In this paper, we introduce and demonstrate the online demo as well as the command line interface of a lexical normalization system (MoNoise) for a variety of languages.
no code implementations • SEMEVAL 2019 • Aria Nourbakhsh, Frida Vermeer, Gijs Wiltvank, Rob van der Goot
In this paper, we present our approach to detection of hate speech against women and immigrants in tweets for our participation in the SemEval-2019 Task 5.
1 code implementation • 23 May 2019 • Malvina Nissim, Rik van Noord, Rob van der Goot
However, besides the intrinsic problems with the analogy task as a bias detection tool, in this paper we show that a series of issues related to how analogies have been implemented and used might have yielded a distorted picture of bias in word embeddings.
1 code implementation • EMNLP 2018 • Rob van der Goot, Gertjan van Noord
Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.
1 code implementation • ACL 2018 • Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank
Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.
2 code implementations • 10 Oct 2017 • Rob van der Goot, Gertjan van Noord
We show that MoNoise beats the state of the art on different normalization benchmarks for English and Dutch, which each define the task of normalization slightly differently.
Ranked #1 on Lexical Normalization on LexNorm
1 code implementation • WS 2017 • Rob van der Goot, Barbara Plank, Malvina Nissim
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data?
no code implementations • ACL 2017 • Rob van der Goot, Gertjan van Noord
This work explores different approaches to using normalization for parser adaptation.
no code implementations • LREC 2016 • Joachim Daiber, Rob van der Goot
We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets.