no code implementations • ACL (BPPF) 2021 • Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma
Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.
no code implementations • NAACL (TeachingNLP) 2021 • Barbara Plank
Deep neural networks have revolutionized many fields, including Natural Language Processing.
no code implementations • SemEval (NAACL) 2022 • Barbara Plank
Our submission of a single model for 11 languages on SemEval Task 11 MultiCoNER shows that a vanilla transformer-CRF with XLM-R-large outperforms the more recent RemBERT, ranking 9th of 26 submissions in the multilingual track.
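As an aside on the architecture: a transformer-CRF tagger of this kind pairs a pre-trained encoder with a CRF output layer. Below is a minimal sketch of that general setup, not the authors' exact submission; model name, label count, and hyperparameters are illustrative, and it assumes the Hugging Face transformers and pytorch-crf packages.

```python
# Sketch of a transformer + CRF tagger for NER (illustrative only).
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class TransformerCRFTagger(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-large", num_labels=13):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            # Negative log-likelihood of the gold label sequence under the CRF.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        # Viterbi decoding returns the best label sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```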
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
no code implementations • CODI 2021 • Cathrine Damgaard, Paulina Toborek, Trine Eriksen, Barbara Plank
In this paper, we introduce a new English corpus to study the problem of understanding indirect answers.
no code implementations • EMNLP 2020 • Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank
We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.
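The core idea is to encode each event's trigger and argument roles as per-token labels so that a standard sequence labeler applies. A hypothetical toy encoding to illustrate the idea (BeeSL's actual label scheme is richer and encodes several label parts per token):

```python
# Hypothetical toy encoding of a biomedical event as per-token labels
# (for illustration only; BeeSL's real label scheme is more fine-grained).
tokens = ["p53", "expression", "was", "observed"]
labels = [
    "B-Protein|Theme",            # entity serving as Theme of the event
    "B-Trigger:Gene_expression",  # event trigger with its event type
    "O",
    "O",
]
# A standard BIO-style sequence labeler can now be trained on (tokens, labels) pairs.
```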
no code implementations • LREC 2022 • Kristian Nørgaard Jensen, Barbara Plank
Fine-tuning general-purpose pre-trained models has become a de facto standard, also for Vision and Language tasks such as Visual Question Answering (VQA).
no code implementations • LREC 2022 • Rob van der Goot, Max Müller-Eberstein, Barbara Plank
For low-resource syntactic tasks, we observe impacts of segment embedding and multilingual BERT choice.
no code implementations • WNUT (ACL) 2021 • Benjamin Olsen, Barbara Plank
In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets for Danish.
no code implementations • CRAC (ACL) 2021 • Maria Barrett, Hieu Lam, Martin Wu, Ophélie Lacroix, Barbara Plank, Anders Søgaard
Automatic coreference resolution is understudied in Danish even though most of the Danish Dependency Treebank (Buch-Kromann, 2003) is annotated with coreference relations.
no code implementations • WS (NoDaLiDa) 2019 • Andreas Kirkedal, Barbara Plank, Leon Derczynski, Natalie Schluter
Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation.
no code implementations • WS (NoDaLiDa) 2019 • Barbara Plank, Sigrid Klerke
Mounting evidence suggests that integrating symbolic lexical knowledge into neural models aids learning.
no code implementations • EMNLP (WNUT) 2020 • Anders Giovanni Møller, Rob van der Goot, Barbara Plank
With the COVID-19 pandemic raging worldwide since early 2020, monitoring systems that track relevant information on social media are vitally important.
no code implementations • 4 Sep 2023 • Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank
To gain insights, we provide a first case-study to examine how the quality of the instruction-tuning datasets influences downstream performance.
no code implementations • 28 Jul 2023 • Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
Recent advances in powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications.
1 code implementation • 31 May 2023 • Leon Weber, Barbara Plank
This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation.
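A simple AED baseline (not necessarily the method proposed in the paper above) flags instances whose annotated label receives low probability from a model trained on the remaining data. A minimal cross-validated sketch, with all names illustrative:

```python
# Minimal loss-based annotation-error-detection baseline (illustrative only).
# Assumes X is a feature matrix and y holds integer-encoded labels 0..K-1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def flag_suspicious(X, y, top_k=50):
    probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )
    gold_prob = probs[np.arange(len(y)), y]   # probability of the annotated label
    ranking = np.argsort(gold_prob)           # lowest probability first
    return ranking[:top_k]                    # candidate annotation errors to re-check
```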
no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.
1 code implementation • 24 May 2023 • Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank
To the best of our knowledge, this is the first work comprehensively evaluating distillation objectives in both settings.
1 code implementation • 20 May 2023 • Mike Zhang, Rob van der Goot, Barbara Plank
The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.
1 code implementation • 19 May 2023 • Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank
In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.
1 code implementation • 9 May 2023 • Robert Litschko, Ekaterina Artemova, Barbara Plank
Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach.
no code implementations • 28 Apr 2023 • Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio
We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation.
5 code implementations • 20 Apr 2023 • Verena Blaschke, Hinrich Schütze, Barbara Plank
This can, for instance, be observed when fine-tuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography.
2 code implementations • 19 Apr 2023 • Verena Blaschke, Hinrich Schütze, Barbara Plank
In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages.
1 code implementation • 19 Apr 2023 • Ekaterina Artemova, Barbara Plank
Bilingual word lexicons are crucial tools for multilingual natural language understanding and machine translation tasks, as they facilitate the mapping of words in one language to their synonyms in another language.
Tasks: Bilingual Lexicon Induction, Natural Language Understanding, +3
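A standard baseline for inducing such lexicons (not necessarily the approach of the paper above) aligns two monolingual embedding spaces with an orthogonal Procrustes mapping learned from a small seed dictionary and then retrieves nearest neighbours. A rough sketch, assuming pre-computed, L2-normalised word vectors:

```python
# Procrustes-based bilingual lexicon induction baseline (illustrative sketch).
# src_emb, tgt_emb: dicts mapping words to L2-normalised numpy vectors.
import numpy as np

def learn_mapping(seed_pairs, src_emb, tgt_emb):
    X = np.stack([src_emb[s] for s, t in seed_pairs])
    Y = np.stack([tgt_emb[t] for s, t in seed_pairs])
    # Orthogonal Procrustes solution: W = U V^T from the SVD of Y^T X.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def translate(word, W, src_emb, tgt_emb, k=5):
    query = W @ src_emb[word]
    tgt_words = list(tgt_emb)
    scores = np.stack([tgt_emb[t] for t in tgt_words]) @ query  # cosine if normalised
    return [tgt_words[i] for i in np.argsort(-scores)[:k]]
```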
1 code implementation • 4 Nov 2022 • Barbara Plank
Human variation in labeling is often considered noise.
1 code implementation • 28 Oct 2022 • Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández
Calibration is a popular framework to evaluate whether a classifier knows when it does not know, i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct.
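A textbook way to quantify this is the expected calibration error (ECE), which bins predictions by confidence and compares average confidence to accuracy per bin. The paper studies calibration more broadly, so the sketch below only covers this standard metric:

```python
# Expected calibration error (ECE) over equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences)   # max predicted probability per example
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap      # weight by the fraction of examples in the bin
    return ece
```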
1 code implementation • 21 Oct 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Linguistic information is encoded at varying timescales (subwords, phrases, etc.).
1 code implementation • 20 Oct 2022 • Elisa Bassignana, Max Müller-Eberstein, Mike Zhang, Barbara Plank
With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori, as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable.
1 code implementation • 17 Oct 2022 • Elisa Bassignana, Barbara Plank
Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain evaluation setups.
no code implementations • 23 Sep 2022 • Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, Narges Mahyar
We identify four key groups of challenges for evaluating visual text analytics approaches (data ambiguity, experimental design, user trust, and "big picture" concerns) and provide suggestions for research opportunities from an interdisciplinary perspective.
1 code implementation • 16 Sep 2022 • Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank
Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching.
no code implementations • NAACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.
2 code implementations • LREC 2022 • Mike Zhang, Kristian Nørgaard Jensen, Barbara Plank
Skill Classification (SC) is the task of classifying job competences from job postings.
1 code implementation • ACL 2022 • Elisa Bassignana, Barbara Plank
Over the last five years, research on Relation Extraction (RE) has witnessed extensive progress with many new dataset releases.
2 code implementations • NAACL 2022 • Mike Zhang, Kristian Nørgaard Jensen, Sif Dam Sonniks, Barbara Plank
We introduce a BERT baseline (Devlin et al., 2019).
1 code implementation • 13 Apr 2022 • Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.
1 code implementation • ACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).
1 code implementation • ACL (TLT, SyntaxFest) 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
This work provides the first in-depth analysis of genre in Universal Dependencies (UD).
1 code implementation • EMNLP 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.
2 code implementations • Findings (EMNLP) 2021 • Mike Zhang, Barbara Plank
We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling.
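The underlying signal is the model's per-instance training dynamics. Below is a simplified sketch of computing dataset-cartography-style confidence and variability statistics and ranking instances by ambiguity; the actual CAL algorithm uses such signals inside an active learning loop and differs in detail:

```python
# Sketch: rank instances by training-dynamics statistics, in the spirit of
# dataset cartography; the real CAL algorithm differs in detail.
import numpy as np

def training_dynamics(prob_history):
    # prob_history: array of shape (n_epochs, n_instances) holding the probability
    # assigned to each instance's current label at the end of each epoch.
    confidence = prob_history.mean(axis=0)     # mean label probability over epochs
    variability = prob_history.std(axis=0)     # how much it fluctuates across epochs
    return confidence, variability

def rank_by_ambiguity(prob_history, budget=100):
    confidence, variability = training_dynamics(prob_history)
    # Ambiguous instances (low confidence, high variability) tend to be informative.
    score = variability - confidence
    return np.argsort(-score)[:budget]
```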
no code implementations • SEMEVAL 2021 • Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, Massimo Poesio
Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision.
no code implementations • 8 Jul 2021 • Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow
Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021.
no code implementations • NAACL 2021 • Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio
Supervised learning assumes that a ground truth label exists.
1 code implementation • COLING 2020 • Barbara Plank, Kristian Nørgaard Jensen, Rob van der Goot
We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER.
1 code implementation • NoDaLiDa 2021 • Kristian Nørgaard Jensen, Mike Zhang, Barbara Plank
We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow.
2 code implementations • NAACL 2021 • Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank
To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
no code implementations • EACL (AdaptNLP) 2021 • Rob van der Goot, Ahmet Üstün, Barbara Plank
However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.
no code implementations • 10 Dec 2020 • Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein
Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time.
no code implementations • SEMEVAL 2020 • Anders Kaas, Viktor Torp Thomsen, Barbara Plank
We present an ablation study which shows that even though BERT representations are very powerful for this task as well, BERT still benefits from being combined with carefully designed task-specific features.
no code implementations • SEMEVAL 2020 • Kristian Nørgaard Jensen, Nicolaj Filrup Rasmussen, Thai Wang, Marco Placenti, Barbara Plank
This paper describes a system that aims at assessing humour intensity in edited news headlines as part of the 7th task of SemEval-2020 on "Humor, Emphasis and Sentiment".
1 code implementation • COLING 2020 • Alan Ramponi, Barbara Plank
We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which have received the most attention.
Tasks: Out-of-Distribution Generalization, Unsupervised Domain Adaptation
2 code implementations • EACL 2021 • Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf, Barbara Plank
In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings.
no code implementations • 25 May 2020 • Andreas Kirkedal, Marija Stepanović, Barbara Plank
A combination of FT Speech with in-domain language data provides comparable results to models trained specifically on Språkbanken, showing that FT Speech transfers well to this data set.
Tasks: Automatic Speech Recognition (ASR), +1
no code implementations • LREC 2020 • Alan Ramponi, Barbara Plank, Rosario Lombardo
Biomedical event extraction is a crucial task for automatically extracting information from the rapidly growing body of biomedical literature.
1 code implementation • WS (NoDaLiDa) 2019 • Barbara Plank
Named Entity Recognition (NER) has greatly advanced by the introduction of deep neural architectures.
no code implementations • WS 2019 • Sigrid Klerke, Barbara Plank
Hence, caution is warranted when using gaze data as signal for NLP, as no single view is robust over tasks, modeling choice and gaze corpus.
1 code implementation • WS 2019 • Nils Rethmeier, Barbara Plank
Word embeddings have undoubtedly revolutionized NLP.
no code implementations • ACL 2019 • Claudio Greco, Barbara Plank, Raquel Fernández, Raffaella Bernardi
We study the issue of catastrophic forgetting in the context of neural multimodal approaches to Visual Question Answering (VQA).
no code implementations • 21 Nov 2018 • Barbara Plank, Sigrid Klerke, Zeljko Agic
In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora.
3 code implementations • NAACL 2019 • Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, Raquel Fernández
We compare our approach to an alternative system which extends the baseline with reinforcement learning.
1 code implementation • EMNLP 2018 • Barbara Plank, Željko Agić
We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages.
no code implementations • COLING 2018 • Martin Kroon, Masha Medvedeva, Barbara Plank
In this paper we present the results of our participation in the Discriminating between Dutch and Flemish in Subtitles VarDial 2018 shared task.
no code implementations • WS 2018 • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard
Neural part-of-speech (POS) taggers are known to not perform well with little training data.
no code implementations • WS 2018 • Sigrid Klerke, Héctor Martínez Alonso, Barbara Plank
We present our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM).
no code implementations • WS 2018 • Barbara Plank
Written text transmits a good deal of nonverbal information related to the author's identity and social factors, such as age, gender and personality.
1 code implementation • ACL 2018 • Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank
Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.
2 code implementations • ACL 2018 • Sebastian Ruder, Barbara Plank
In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training.
Ranked #3 on Sentiment Analysis on Multi-Domain Sentiment Dataset
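Classic tri-training, which the paper re-evaluates, trains three classifiers on bootstrap samples and lets each pair pseudo-label unlabeled data for the third. A compact sketch of the vanilla algorithm (not the multi-task variant proposed in the paper), with classifiers and stopping criteria simplified:

```python
# Vanilla tri-training sketch (not the paper's multi-task variant);
# classifiers, rounds, and the stopping rule are deliberately simplified.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def tri_train(X_lab, y_lab, X_unlab, rounds=5):
    models = [clone(LogisticRegression(max_iter=1000)) for _ in range(3)]
    # Initialise each model on a different bootstrap sample of the labeled data.
    for m in models:
        Xb, yb = resample(X_lab, y_lab)
        m.fit(Xb, yb)
    for _ in range(rounds):
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            pred_j = models[j].predict(X_unlab)
            pred_k = models[k].predict(X_unlab)
            agree = pred_j == pred_k
            if not agree.any():
                continue
            # Add points where the other two models agree as pseudo-labels for model i.
            X_aug = np.concatenate([X_lab, X_unlab[agree]])
            y_aug = np.concatenate([y_lab, pred_j[agree]])
            models[i].fit(X_aug, y_aug)
    return models
```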
no code implementations • IJCNLP 2017 • Barbara Plank
We present All-In-1, a simple model for multilingual text classification that does not require any parallel data.
1 code implementation • 26 Oct 2017 • Barbara Plank
We present ALL-IN-1, a simple model for multilingual text classification that does not require any parallel data.
no code implementations • WS 2017 • Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling
In this paper, we explore the performance of a linear SVM trained on language-independent character features for the NLI Shared Task 2017.
no code implementations • WS 2017 • Johannes Bjerva, Gintarė Grigonytė, Robert Östling, Barbara Plank
We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017.
1 code implementation • EMNLP 2017 • Sebastian Ruder, Barbara Plank
Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks.
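One widely used ad hoc measure of this kind is the Jensen-Shannon divergence between unigram term distributions. A small sketch for scoring a candidate source domain against a target domain (illustrative only; the paper instead learns the data selection measure with Bayesian optimisation):

```python
# Jensen-Shannon divergence between unigram term distributions, a common
# ad-hoc domain similarity measure (illustrative sketch).
from collections import Counter
import numpy as np

def term_distribution(texts, vocab):
    counts = Counter(w for t in texts for w in t.split())
    freq = np.array([counts[w] for w in vocab], dtype=float) + 1.0  # add-one smoothing
    return freq / freq.sum()

def js_divergence(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Usage: build a shared vocabulary over both domains, then rank source domains
# by 1 - js_divergence(term_distribution(source_texts, vocab),
#                      term_distribution(target_texts, vocab)).
```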
1 code implementation • WS 2017 • Rob van der Goot, Barbara Plank, Malvina Nissim
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data?
no code implementations • EACL 2017 • Željko Agić, Barbara Plank, Anders Søgaard
We address the challenge of cross-lingual POS tagger evaluation in the absence of manually annotated test data.
no code implementations • WS 2017 • Maria Medvedeva, Martin Kroon, Barbara Plank
We present the results of our participation in the VarDial 4 shared task on discriminating closely related languages.
1 code implementation • EACL 2017 • Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard
We propose UDP, the first training-free parser for Universal Dependencies (UD).
no code implementations • EACL 2017 • Héctor Martínez Alonso, Barbara Plank
Multitask learning has been applied successfully to a range of tasks, mostly morphosyntactic.
1 code implementation • COLING 2016 • Chloé Braud, Barbara Plank, Anders Søgaard
We experiment with different ways of training LSTM networks to predict RST discourse trees.
Ranked #10 on Discourse Parsing on RST-DT
no code implementations • WS 2016 • Barbara Plank
on which texts can differ from the standard.
no code implementations • 9 Nov 2016 • Barbara Plank, Malvina Nissim
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task.
1 code implementation • COLING 2016 • Barbara Plank
Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing.
1 code implementation • COLING 2016 • Johannes Bjerva, Barbara Plank, Johan Bos
We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets).
no code implementations • 28 Aug 2016 • Barbara Plank
The solution is not obvious: we cannot control for all factors, and it is not clear how to best go beyond the current practice of training on homogeneous data from a single domain and language.
no code implementations • LREC 2016 • Ben Verhoeven, Walter Daelemans, Barbara Plank
Personality profiling is the task of detecting personality traits of authors based on writing style.
3 code implementations • ACL 2016 • Barbara Plank, Anders Søgaard, Yoav Goldberg
Bidirectional long short-term memory (bi-LSTM) networks have recently proven successful for various NLP sequence modeling tasks, but little is known about their reliance on input representations, target languages, data set size, and label noise.
Ranked #4 on Part-Of-Speech Tagging on UD
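A bare-bones word-level bi-LSTM tagger in the spirit of the models studied above (no character or byte inputs and no auxiliary loss) might look like this in PyTorch:

```python
# Minimal word-level bi-LSTM POS tagger (illustrative; the paper additionally
# studies character/byte representations and an auxiliary frequency loss).
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)   # per-token tag scores; train with cross-entropy
```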
no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
no code implementations • TACL 2016 • Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages.
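Approaches in this space commonly project annotations from resource-rich languages across word alignments in parallel text. A toy sketch of single-source POS projection (the paper itself aggregates projections from many source languages and translations):

```python
# Toy single-source POS projection over word alignments (illustrative only).
def project_tags(src_tags, alignments, tgt_len, unk="X"):
    """src_tags: POS tags of the source sentence.
    alignments: list of (src_index, tgt_index) word-alignment pairs.
    Returns one projected tag per target token, 'X' where nothing aligns."""
    projected = [unk] * tgt_len
    for s, t in alignments:
        projected[t] = src_tags[s]
    return projected

# project_tags(["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)], 3)
# -> ["DET", "NOUN", "VERB"]
```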
no code implementations • LREC 2014 • Dirk Hovy, Barbara Plank, Anders Søgaard
We present a systematic study of several Twitter POS data sets and the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving a relative error reduction of up to 21%.
no code implementations • LREC 2014 • Olga Uryupina, Barbara Plank, Aliaksei Severyn, Agata Rotondi, Alessandro Moschitti
In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity.