Search Results for author: Rob van der Goot

Found 57 papers, 30 papers with code

Increasing Robustness for Cross-domain Dialogue Act Classification on Social Media Data

1 code implementation • COLING (WNUT) 2022 • Marcus Vielsted, Nikolaj Wallenius, Rob van der Goot

Automatically detecting the intent of an utterance is important for various downstream natural language processing tasks.

Dialogue Act Classification Lexical Normalization

Paper
Code

Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get?

no code implementations • NAACL (CALCS) 2021 • Dana-Maria Iliescu, Rasmus Grand, Sara Qirko, Rob van der Goot

Existing models for language identification in code-switched data are all supervised, requiring annotated training data which is only available for a limited number of language pairs.

Language Identification

Paper
Add Code

Biomedical Event Extraction as Sequence Labeling

no code implementations • EMNLP 2020 • Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.

Event Extraction Multi-Task Learning

Paper
Add Code

NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets

no code implementations • EMNLP (WNUT) 2020 • Anders Giovanni Møller, Rob van der Goot, Barbara Plank

With the COVID-19 pandemic raging world-wide since the beginning of the 2020 decade, the need for monitoring systems to track relevant information on social media is vitally important.

Task 2

Paper
Add Code

We Need to Talk About train-dev-test Splits

1 code implementation • EMNLP 2021 • Rob van der Goot

However, the introduction of neural networks in NLP has led to a different use of these standard splits; the development set is now often used for model selection during the training procedure.

Model Selection

Paper
Code

Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings

no code implementations • LREC 2022 • Rob van der Goot, Max Müller-Eberstein, Barbara Plank

For low-resource syntactic tasks, we observe impacts of segment embedding and multilingual BERT choice.

Dependency Parsing Position +1

Paper
Add Code

Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data

1 code implementation • EACL (AdaptNLP) 2021 • Anouck Braggaar, Rob van der Goot

The best single source treebank (nl_alpino) resulted in an LAS of 54.7 whereas our data selection outperformed the single best transfer treebank and led to 55.6 LAS on the test data.

XLM-R

Paper
Code

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization

1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko

This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.

Dependency Parsing Lexical Normalization +2

Paper
Code

CL-MoNoise: Cross-lingual Lexical Normalization

no code implementations • EMNLP (WNUT) 2021 • Rob van der Goot

In this paper, we are the first to propose a model for cross-lingual normalization, with which we participate in the WNUT 2021 shared task.

Lexical Normalization

Paper
Add Code

Tafsir Dataset: A Novel Multi-Task Benchmark for Named Entity Recognition and Topic Modeling in Classical Arabic Literature

no code implementations • COLING 2022 • Sajawel Ahmed, Rob van der Goot, Misbahur Rehman, Carl Kruse, Ömer Özsoy, Alexander Mehler, Gemma Roig

Various historical languages, which used to be lingua franca of science and arts, deserve the attention of current NLP research.

Named Entity Recognition +3

Paper
Add Code

How to Encode Domain Information in Relation Classification

no code implementations • 21 Apr 2024 • Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank

Current language models require a lot of training data to obtain high performance.

Classification Relation +1

Paper
Add Code

Can Humans Identify Domains?

1 code implementation • 2 Apr 2024 • Maria Barrett, Max Müller-Eberstein, Elisa Bassignana, Amalie Brogaard Pauli, Mike Zhang, Rob van der Goot

Textual domain is a crucial property within the Natural Language Processing (NLP) community due to its effects on downstream model performance.

Sentence

Paper
Code

Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models

1 code implementation • 12 Mar 2024 • Charlie Campanella, Rob van der Goot

Across all benchmarks, we observe negative correlations between the metropolitan size and the performance of the LLMs, indicating that smaller regions are indeed underrepresented.

Paper
Code

Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

no code implementations • 8 Feb 2024 • Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank

Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis.

Classification

Paper
Add Code

EEVEE: An Easy Annotation Tool for Natural Language Processing

no code implementations • 5 Feb 2024 • Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot

Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets.

Text Classification

Paper
Add Code

Entity Linking in the Job Market Domain

1 code implementation • 31 Jan 2024 • Mike Zhang, Rob van der Goot, Barbara Plank

In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014).

Entity Linking

Paper
Code

NNOSE: Nearest Neighbor Occupational Skill Extraction

1 code implementation • 30 Jan 2024 • Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text.

Retrieval

Paper
Code

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

no code implementations • 25 Oct 2023 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov

We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.

Language Modelling Multi-Task Learning

Paper
Add Code

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

no code implementations • 9 Oct 2023 • Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank

Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades.

Paper
Add Code

Findings of the VarDial Evaluation Campaign 2023

no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.

Intent Detection

Paper
Add Code

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

1 code implementation • 20 May 2023 • Mike Zhang, Rob van der Goot, Barbara Plank

The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.

De-identification Masked Language Modeling +1

Paper
Code

Silver Syntax Pre-training for Cross-Domain Relation Extraction

1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.

Relation Relation Extraction

Paper
Code

Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.

Relation Relation Extraction +1

Paper
Code

Cross-Domain Evaluation of POS Taggers: From Wall Street Journal to Fandom Wiki

1 code implementation • 27 Apr 2023 • Kia Kirstein Hansen, Rob van der Goot

The Wall Street Journal section of the Penn Treebank has been the de-facto standard for evaluating POS taggers for a long time, and accuracies over 97% have been reported.

POS

Paper
Code

Spectral Probing

1 code implementation • 21 Oct 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Linguistic information is encoded at varying timescales (subwords, phrases, etc.)

Informativeness

Paper
Code

Skill Extraction from Job Postings using Weak Supervision

1 code implementation • 16 Sep 2022 • Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank

Aggregated data obtained from job postings provide powerful insights into labor market demands, and emerging skills, and aid job matching.

Paper
Code

Sort by Structure: Language Model Ranking as Dependency Probing

no code implementations • NAACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.

Language Modelling Structured Prediction

Paper
Add Code

Experimental Standards for Deep Learning in Natural Language Processing Research

1 code implementation • 13 Apr 2022 • Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank

The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.

Paper
Code

Probing for Labeled Dependency Trees

1 code implementation • ACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).

Dependency Parsing Informativeness

Paper
Code

How Universal is Genre in Universal Dependencies?

1 code implementation • ACL (TLT, SyntaxFest) 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank

This work provides the first in-depth analysis of genre in Universal Dependencies (UD).

Specificity

Paper
Code

Parsing with Pretrained Language Models, Multiple Datasets, and Dataset Embeddings

1 code implementation • ACL (TLT, SyntaxFest) 2021 • Rob van der Goot, Miryam de Lhoneux

With an increase of dataset availability, the potential for learning from a variety of data sources has increased.

Paper
Code

Genre as Weak Supervision for Cross-lingual Dependency Parsing

1 code implementation • EMNLP 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.

Dependency Parsing Sentence

Paper
Code

DaN+: Danish Nested Named Entities and Lexical Normalization

1 code implementation • COLING 2020 • Barbara Plank, Kristian Nørgaard Jensen, Rob van der Goot

We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER.

Cross-Lingual Transfer Lexical Normalization +4

Paper
Code

From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding

2 code implementations • NAACL 2021 • Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank

To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.

Intent Classification +7

Paper
Code

Lexical Normalization for Code-switched Data and its Effect on POS Tagging

1 code implementation • EACL 2021 • Rob van der Goot, Özlem Çetinoğlu

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.

Lexical Normalization POS +2

Paper
Code

On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions

no code implementations • EACL (AdaptNLP) 2021 • Rob van der Goot, Ahmet Üstün, Barbara Plank

However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.

Dependency Parsing Lemmatization +1

Paper
Add Code

Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data

1 code implementation • 22 Feb 2021 • Anouck Braggaar, Rob van der Goot

This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian code-switch utterances into Universal Dependencies.

Sentence Sentence segmentation

Paper
Code

Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor

no code implementations • CL 2020 • Malvina Nissim, Rik van Noord, Rob van der Goot

Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings.

Bias Detection Word Embeddings

Paper
Add Code

Lexical Normalization for Code-switched Data and its Effect on POS-tagging

no code implementations • 1 Jun 2020 • Rob van der Goot, Özlem Çetinoğlu

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.

Language Identification Lexical Normalization +3

Paper
Add Code

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

2 code implementations • EACL 2021 • Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf, Barbara Plank

In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings.

Dependency Parsing Language Modelling +5

Paper
Code

Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data?

no code implementations • LREC 2020 • Kelly Dekker, Rob van der Goot

With this system, we score 94.29 accuracy on the test data, compared to 95.22 when it is trained on human-annotated data.

Lexical Normalization Sentence +1

Paper
Add Code

Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing

no code implementations • LREC 2020 • Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna, Lorenzo De Mattei

However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data.

Dependency Parsing Lexical Normalization

Paper
Add Code

An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media

no code implementations • WS 2019 • Rob van der Goot

Existing natural language processing systems have often been designed with standard texts in mind.

Dependency Parsing Lexical Normalization

Paper
Add Code

Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis.

no code implementations • WS 2019 • Ahmet Üstün, Rob van der Goot, Gosse Bouma, Gertjan van Noord

This paper describes our submission to SIGMORPHON 2019 Task 2: Morphological analysis and lemmatization in context.

Decoder LEMMA +4

Paper
Add Code

MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

1 code implementation • ACL 2019 • Rob van der Goot

In this paper, we introduce and demonstrate the online demo as well as the command line interface of a lexical normalization system (MoNoise) for a variety of languages.

Lexical Normalization

Paper
Code

sthruggle at SemEval-2019 Task 5: An Ensemble Approach to Hate Speech Detection

no code implementations • SEMEVAL 2019 • Aria Nourbakhsh, Frida Vermeer, Gijs Wiltvank, Rob van der Goot

In this paper, we present our approach to detection of hate speech against women and immigrants in tweets for our participation in the SemEval-2019 Task 5.

Hate Speech Detection Word Embeddings

Paper
Add Code

Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor

1 code implementation • 23 May 2019 • Malvina Nissim, Rik van Noord, Rob van der Goot

However, beside the intrinsic problems with the analogy task as a bias detection tool, in this paper we show that a series of issues related to how analogies have been implemented and used might have yielded a distorted picture of bias in word embeddings.

Bias Detection Word Embeddings

Paper
Code

Modeling Input Uncertainty in Neural Network Dependency Parsing

1 code implementation • EMNLP 2018 • Rob van der Goot, Gertjan van Noord

Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.

Dependency Parsing Lexical Normalization +1

Paper
Code

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

1 code implementation • ACL 2018 • Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.

Gender Prediction

Paper
Code

A Taxonomy for In-depth Evaluation of Normalization for User Generated Content

no code implementations • LREC 2018 • Rob van der Goot, Rik van Noord, Gertjan van Noord

Grammatical Error Correction Lexical Normalization +1

Paper
Add Code

Last Words: Sharing Is Caring: The Future of Shared Tasks

no code implementations • CL 2017 • Malvina Nissim, Lasha Abzianidze, Kilian Evang, Rob van der Goot, Hessel Haagsma, Barbara Plank, Martijn Wieling

Paper
Add Code

MoNoise: Modeling Noise Using a Modular Normalization System

2 code implementations • 10 Oct 2017 • Rob van der Goot, Gertjan van Noord

We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently.

Ranked #1 on Lexical Normalization on LexNorm

Lexical Normalization Spelling Correction +1

Paper
Code

To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging

1 code implementation • WS 2017 • Rob van der Goot, Barbara Plank, Malvina Nissim

Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data?

Part-Of-Speech Tagging POS +1

Paper
Code

Parser Adaptation for Social Media by Integrating Normalization

no code implementations • ACL 2017 • Rob van der Goot, Gertjan van Noord

This work explores different approaches of using normalization for parser adaptation.

Domain Adaptation Named Entity Recognition (NER) +1

Paper
Add Code

The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions

no code implementations • LREC 2016 • Joachim Daiber, Rob van der Goot

We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets.

Dependency Parsing Lexical Normalization +2

Paper
Add Code

ROB: Using Semantic Meaning to Recognize Paraphrases

no code implementations • SEMEVAL 2015 • Rob van der Goot, Gertjan van Noord

Semantic Textual Similarity

Paper
Add Code

The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity

no code implementations • SEMEVAL 2014 • Johannes Bjerva, Johan Bos, Rob van der Goot, Malvina Nissim

Natural Language Inference Semantic Similarity +1

Paper
Add Code
