Search Results for author: Rob van der Goot

Found 55 papers, 29 papers with code

Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.

Relation, Relation Extraction

Experimental Standards for Deep Learning in Natural Language Processing Research

1 code implementation 13 Apr 2022 Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank

The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.

Skill Extraction from Job Postings using Weak Supervision

1 code implementation 16 Sep 2022 Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank

Aggregated data obtained from job postings provide powerful insights into labor market demands, and emerging skills, and aid job matching.

Spectral Probing

1 code implementation 21 Oct 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Linguistic information is encoded at varying timescales (subwords, phrases, etc.)

Informativeness

Probing for Labeled Dependency Trees

1 code implementation ACL 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).

Dependency Parsing, Informativeness

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

1 code implementation 20 May 2023 Mike Zhang, Rob van der Goot, Barbara Plank

The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.

De-identification, Masked Language Modeling

Genre as Weak Supervision for Cross-lingual Dependency Parsing

1 code implementation EMNLP 2021 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.

Dependency Parsing, Sentence

Lexical Normalization for Code-switched Data and its Effect on POS Tagging

1 code implementation EACL 2021 Rob van der Goot, Özlem Çetinoğlu

Lexical normalization, the translation of non-canonical data to standard language, has shown to improve the performance of many natural language processing tasks on social media.

Lexical Normalization, POS

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

1 code implementation ACL 2018 Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.

Gender Prediction

Cross-Domain Evaluation of POS Taggers: From Wall Street Journal to Fandom Wiki

1 code implementation 27 Apr 2023 Kia Kirstein Hansen, Rob van der Goot

The Wall Street Journal section of the Penn Treebank has been the de facto standard for evaluating POS taggers for a long time, and accuracies over 97% have been reported.

POS

Silver Syntax Pre-training for Cross-Domain Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.

Relation, Relation Extraction

NNOSE: Nearest Neighbor Occupational Skill Extraction

1 code implementation 30 Jan 2024 Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text.

Retrieval

Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models

1 code implementation 12 Mar 2024 Charlie Campanella, Rob van der Goot

Across all benchmarks, we observe negative correlations between metropolitan size and the performance of the LLMs, indicating that smaller regions are indeed underrepresented.

MoNoise: Modeling Noise Using a Modular Normalization System

2 code implementations 10 Oct 2017 Rob van der Goot, Gertjan van Noord

We show that MoNoise beats the state of the art on several normalization benchmarks for English and Dutch, each of which defines the task of normalization slightly differently.

Lexical Normalization, Spelling Correction

Modeling Input Uncertainty in Neural Network Dependency Parsing

1 code implementation EMNLP 2018 Rob van der Goot, Gertjan van Noord

Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.

Dependency Parsing, Lexical Normalization

Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor

1 code implementation 23 May 2019 Malvina Nissim, Rik van Noord, Rob van der Goot

However, besides the intrinsic problems with the analogy task as a bias detection tool, in this paper we show that a series of issues related to how analogies have been implemented and used might have yielded a distorted picture of bias in word embeddings.

Bias Detection, Word Embeddings

sthruggle at SemEval-2019 Task 5: An Ensemble Approach to Hate Speech Detection

no code implementations SEMEVAL 2019 Aria Nourbakhsh, Frida Vermeer, Gijs Wiltvank, Rob van der Goot

In this paper, we present our approach to detection of hate speech against women and immigrants in tweets for our participation in the SemEval-2019 Task 5.

Hate Speech Detection, Word Embeddings

The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions

no code implementations LREC 2016 Joachim Daiber, Rob van der Goot

We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets.

Dependency Parsing, Lexical Normalization

MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

1 code implementation ACL 2019 Rob van der Goot

In this paper, we introduce and demonstrate the online demo as well as the command line interface of a lexical normalization system (MoNoise) for a variety of languages.

Lexical Normalization

Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing

no code implementations LREC 2020 Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna, Lorenzo De Mattei

However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data.

Dependency Parsing, Lexical Normalization

Lexical Normalization for Code-switched Data and its Effect on POS-tagging

no code implementations 1 Jun 2020 Rob van der Goot, Özlem Çetinoğlu

Lexical normalization, the translation of non-canonical data to standard language, has shown to improve the performance of many natural language processing tasks on social media.

Language Identification, Lexical Normalization

Biomedical Event Extraction as Sequence Labeling

no code implementations EMNLP 2020 Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.

Event Extraction, Multi-Task Learning

Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data

1 code implementation 22 Feb 2021 Anouck Braggaar, Rob van der Goot

This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian code-switched utterances in Universal Dependencies.

Sentence, Sentence segmentation

On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions

no code implementations EACL (AdaptNLP) 2021 Rob van der Goot, Ahmet Üstün, Barbara Plank

However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.

Dependency Parsing, Lemmatization

Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data

1 code implementation EACL (AdaptNLP) 2021 Anouck Braggaar, Rob van der Goot

The best single-source treebank (nl_alpino) resulted in an LAS of 54.7, whereas our data selection outperformed the single best transfer treebank and led to 55.6 LAS on the test data.

XLM-R

NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets

no code implementations EMNLP (WNUT) 2020 Anders Giovanni Møller, Rob van der Goot, Barbara Plank

With the COVID-19 pandemic raging worldwide since the beginning of the 2020s, monitoring systems that track relevant information on social media have become vitally important.

Task 2

We Need to Talk About train-dev-test Splits

1 code implementation EMNLP 2021 Rob van der Goot

However, the introduction of neural networks in NLP has led to a different use of these standard splits; the development set is now often used for model selection during the training procedure.

Model Selection

Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get?

no code implementations NAACL (CALCS) 2021 Dana-Maria Iliescu, Rasmus Grand, Sara Qirko, Rob van der Goot

Existing models for language identification in code-switched data are all supervised, requiring annotated training data which is only available for a limited number of language pairs.

Language Identification

CL-MoNoise: Cross-lingual Lexical Normalization

no code implementations EMNLP (WNUT) 2021 Rob van der Goot

In this paper, we are the first to propose a model for cross-lingual normalization, with which we participate in the WNUT 2021 shared task.

Lexical Normalization

Parsing with Pretrained Language Models, Multiple Datasets, and Dataset Embeddings

1 code implementation ACL (TLT, SyntaxFest) 2021 Rob van der Goot, Miryam de Lhoneux

With the increasing availability of datasets, the potential for learning from a variety of data sources has grown.

Sort by Structure: Language Model Ranking as Dependency Probing

no code implementations NAACL 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.

Language Modelling, Structured Prediction

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

no code implementations 9 Oct 2023 Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank

Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades.

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

no code implementations 25 Oct 2023 Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov

We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.

Language Modelling, Multi-Task Learning

Entity Linking in the Job Market Domain

1 code implementation 31 Jan 2024 Mike Zhang, Rob van der Goot, Barbara Plank

In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014).

Entity Linking

Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

no code implementations 8 Feb 2024 Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank

Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis.

Classification
