no code implementations • WS (NoDaLiDa) 2019 • Andreas Kirkedal, Barbara Plank, Leon Derczynski, Natalie Schluter
Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation.
no code implementations • EMNLP 2020 • Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank
We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.
no code implementations • WS (NoDaLiDa) 2019 • Barbara Plank, Sigrid Klerke
More and more evidence is appearing that integrating symbolic lexical knowledge into neural models aids learning.
no code implementations • EMNLP (WNUT) 2020 • Anders Giovanni Møller, Rob van der Goot, Barbara Plank
With the COVID-19 pandemic raging world-wide since the beginning of the 2020 decade, the need for monitoring systems to track relevant information on social media is vitally important.
no code implementations • SemEval (NAACL) 2022 • Barbara Plank
Our submission of a single model for 11 languages on the SemEval Task 11 MultiCoNER shows that a vanilla transformer-CRF with XLM-R large outperforms the more recent RemBERT, ranking 9th out of 26 submissions in the multilingual track.
no code implementations • WNUT (ACL) 2021 • Benjamin Olsen, Barbara Plank
In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets for Danish.
no code implementations • CRAC (ACL) 2021 • Maria Barrett, Hieu Lam, Martin Wu, Ophélie Lacroix, Barbara Plank, Anders Søgaard
Automatic coreference resolution is understudied in Danish even though most of the Danish Dependency Treebank (Buch-Kromann, 2003) is annotated with coreference relations.
no code implementations • ACL (BPPF) 2021 • Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma
Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.
no code implementations • CODI 2021 • Cathrine Damgaard, Paulina Toborek, Trine Eriksen, Barbara Plank
In this paper, we introduce a new English corpus to study the problem of understanding indirect answers.
no code implementations • LREC 2022 • Kristian Nørgaard Jensen, Barbara Plank
Fine-tuning general-purpose pre-trained models has become a de-facto standard, also for Vision and Language tasks such as Visual Question Answering (VQA).
no code implementations • LREC 2022 • Rob van der Goot, Max Müller-Eberstein, Barbara Plank
For low-resource syntactic tasks, we observe impacts of segment embedding and multilingual BERT choice.
no code implementations • NAACL (TeachingNLP) 2021 • Barbara Plank
Deep neural networks have revolutionized many fields, including Natural Language Processing.
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
1 code implementation • 13 Mar 2025 • Florian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich
Discourse understanding is essential for many NLP tasks, yet most existing work remains constrained by framework-dependent discourse representations.
no code implementations • 25 Feb 2025 • Shanshan Xu, T. Y. S. S Santosh, Yanai Elazar, Quirin Vogel, Barbara Plank, Matthias Grabmair
The increased adoption of Large Language Models (LLMs) and their potential to shape public opinion have sparked interest in assessing these models' political leanings.
1 code implementation • 17 Feb 2025 • Leonardo Bertolazzi, Philipp Mondorf, Barbara Plank, Raffaella Bernardi
The ability of large language models (LLMs) to validate their output and identify potential errors is crucial for ensuring robustness and reliability.
1 code implementation • 17 Feb 2025 • Chengyan Wu, Bolei Ma, Yihong Liu, Zheyu Zhang, Ningyuan Deng, Yanshu Li, Baolan Chen, Yi Zhang, Barbara Plank, Yun Xue
Aspect-based sentiment analysis (ABSA) is a crucial task in information extraction and sentiment analysis, aiming to identify aspects with associated sentiment elements in text.
1 code implementation • 12 Jan 2025 • Stephanie Eckman, Bolei Ma, Christoph Kern, Rob Chew, Barbara Plank, Frauke Kreuter
Models trained on crowdsourced labels may not reflect broader population views, because those who work as annotators do not represent the population.
no code implementations • 7 Jan 2025 • Verena Blaschke, Felicia Körner, Barbara Plank
We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties, and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a model.
1 code implementation • 7 Jan 2025 • Xaver Maria Krückl, Verena Blaschke, Barbara Plank
Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants.
1 code implementation • 19 Dec 2024 • Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank
However, publicly available data and tools for career path prediction are scarce.
1 code implementation • 18 Dec 2024 • Alberto Testoni, Barbara Plank, Raquel Fernández
Ambiguity resolution is key to effective communication.
1 code implementation • 18 Dec 2024 • Beiduo Chen, Siyao Peng, Anna Korhonen, Barbara Plank
Disagreement in human labeling is ubiquitous, and can be captured in human judgment distributions (HJDs).
1 code implementation • 17 Dec 2024 • Bolei Ma, Berk Yoztyurk, Anna-Carolina Haensch, Xinpeng Wang, Markus Herklotz, Frauke Kreuter, Barbara Plank, Matthias Assenmacher
In recent research, large language models (LLMs) have been increasingly used to investigate public opinions.
1 code implementation • 17 Dec 2024 • Robert Litschko, Oliver Kraus, Verena Blaschke, Barbara Plank
A large amount of local and culture-specific knowledge (e.g., people, traditions, food) can only be found in documents written in dialects.
no code implementations • 12 Dec 2024 • Alberto Muñoz-Ortiz, Verena Blaschke, Barbara Plank
We explore the potential of pixel-based models for transfer learning from standard languages to dialects.
no code implementations • 12 Dec 2024 • Anne-Marie Lutgen, Alistair Plum, Christoph Purschke, Barbara Plank
Orthographic variation is very common in Luxembourgish texts due to the absence of a fully-fledged standard variety.
no code implementations • WS 2019 • Reyhaneh Hashempour, Barbara Plank, Aline Villavicencio, Renato Cordeiro de Amorim
Logistic regression (LR), and feed-forward neural networks (FFNN) with back-propagation were used to build models in two different settings: Inter-Lingual (IL) and Cross-Lingual (CL).
no code implementations • 21 Nov 2024 • Lovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, Dieuwke Hupkes
In the recent past, a popular way of evaluating natural language understanding (NLU) was to consider a model's ability to perform natural language inference (NLI) tasks.
1 code implementation • 23 Oct 2024 • Qiqi Chen, Xinpeng Wang, Philipp Mondorf, Michael A. Hedderich, Barbara Plank
In this paper, we analyze the roles of the generator and discriminator separately to better understand the conditions when ToT is beneficial.
no code implementations • 18 Oct 2024 • Ryan Soh-Eun Shim, Barbara Plank
We cross-examine our results against dialectometry methods, and interpret the performance disparity to be due to a bias towards dialects that are more similar to the standard variety in the speech-to-text model examined.
no code implementations • 4 Oct 2024 • Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank
We also show that our approach can be used for fine-grained calibration of model safety.
no code implementations • 2 Oct 2024 • Philipp Mondorf, Sondre Wold, Barbara Plank
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions via subnetworks that can be composed to perform more complex tasks.
1 code implementation • 26 Sep 2024 • Jiawen Wang, Longfei Zuo, Siyao Peng, Barbara Plank
Our 100M-sized fusion models also beat CLIP and BLIP, as well as the much larger 9B-sized multimodal IDEFICS and text-only Llama3 and Gemma2, indicating that multimodal stance detection remains challenging for large language models.
no code implementations • 19 Sep 2024 • Kassem Sabeh, Mouna Kacimi, Johann Gamper, Robert Litschko, Barbara Plank
Product attribute value identification (PAVI) involves automatically identifying attributes and their values from product information, enabling features like product search, recommendation, and comparison.
1 code implementation • 24 Jul 2024 • Anastasiia Sedova, Robert Litschko, Diego Frassinelli, Benjamin Roth, Barbara Plank
This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities.
no code implementations • 1 Jul 2024 • Kassem Sabeh, Robert Litschko, Mouna Kacimi, Barbara Plank, Johann Gamper
The task of Product Attribute and Value Identification (PAVI) involves identifying both attributes and their values from product information.
Ranked #1 on Attribute Mining on MAVE
1 code implementation • 26 Jun 2024 • Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni
There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case of proprietary models.
1 code implementation • 25 Jun 2024 • Beiduo Chen, Xinpeng Wang, Siyao Peng, Robert Litschko, Anna Korhonen, Barbara Plank
This study proposes to exploit LLMs to approximate HJDs using a small number of expert labels and explanations.
no code implementations • 24 Jun 2024 • Shijia Zhou, Siyao Peng, Barbara Plank
We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated climate change (CC) dataset that links 3,087 entity spans to Wikipedia.
1 code implementation • 18 Jun 2024 • Philipp Mondorf, Barbara Plank
Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie.
no code implementations • 16 Jun 2024 • Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter
This paper aims to bridge this gap by providing a comprehensive overview of recent works on the evaluation of AOVs in LLMs.
no code implementations • 28 May 2024 • Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi
Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs).
no code implementations • 3 May 2024 • Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank
While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users.
1 code implementation • 21 Apr 2024 • Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank
Current language models require a lot of training data to obtain high performance.
1 code implementation • 12 Apr 2024 • Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank
We show that the text answers are more robust to question perturbations than the first token probabilities, when the first token answers mismatch the text answers.
no code implementations • 3 Apr 2024 • Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual.
no code implementations • 2 Apr 2024 • Philipp Mondorf, Barbara Plank
Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans.
1 code implementation • 19 Mar 2024 • Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank
Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects.
1 code implementation • 15 Mar 2024 • Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze, Barbara Plank
Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack of 'within-language breadth': most treebanks focus on standard languages.
1 code implementation • 9 Mar 2024 • Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank
This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas.
no code implementations • 4 Mar 2024 • Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank
To fill this gap, we introduce a systematic methodology and a new dataset, VariErr (variation versus error), focusing on the NLI task in English.
no code implementations • 25 Feb 2024 • Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz
With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes.
1 code implementation • 22 Feb 2024 • Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank
The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging.
1 code implementation • 20 Feb 2024 • Philipp Mondorf, Barbara Plank
Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments.
no code implementations • 19 Feb 2024 • Verena Blaschke, Christoph Purschke, Hinrich Schütze, Barbara Plank
Natural language processing (NLP) has largely focused on modelling standardized languages.
no code implementations • 11 Feb 2024 • Shanshan Xu, T. Y. S. S Santosh, Oana Ichim, Barbara Plank, Matthias Grabmair
We observe limited alignment with the judge vote distribution.
no code implementations • 8 Feb 2024 • Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank
Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis.
no code implementations • 5 Feb 2024 • Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot
Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets.
1 code implementation • 3 Feb 2024 • Ekaterina Artemova, Verena Blaschke, Barbara Plank
Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets.
1 code implementation • 2 Feb 2024 • Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank
Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition.
1 code implementation • 31 Jan 2024 • Mike Zhang, Rob van der Goot, Barbara Plank
In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014).
1 code implementation • 30 Jan 2024 • Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank
The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text.
2 code implementations • arXiv 2023 • Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages.
Ranked #1 on Named Entity Recognition (NER) on UNER v1 (Danish)
no code implementations • 25 Oct 2023 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
no code implementations • 23 Oct 2023 • Xinpeng Wang, Barbara Plank
We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation.
no code implementations • 18 Oct 2023 • Shengqiang Zhang, Philipp Wicke, Lütfi Kerem Şenel, Luis Figueredo, Abdeldjallil Naceri, Sami Haddadin, Barbara Plank, Hinrich Schütze
The convergence of embodied agents and large language models (LLMs) has brought significant advancements to embodied instruction following.
no code implementations • 18 Oct 2023 • Shanshan Xu, T. Y. S. S Santosh, Oana Ichim, Isabella Risini, Barbara Plank, Matthias Grabmair
Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.
no code implementations • 9 Oct 2023 • Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank
Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades.
1 code implementation • 4 Sep 2023 • Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank
Our results show that the choice of the right AED method and model size is indeed crucial and derive practical recommendations for how to use AED methods to clean instruction-tuning data.
no code implementations • 28 Jul 2023 • Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications.
1 code implementation • 31 May 2023 • Leon Weber, Barbara Plank
This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation.
no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.
1 code implementation • 24 May 2023 • Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank
To the best of our knowledge, this is the first work comprehensively evaluating distillation objectives in both settings.
1 code implementation • 20 May 2023 • Mike Zhang, Rob van der Goot, Barbara Plank
The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.
1 code implementation • 19 May 2023 • Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank
In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.
1 code implementation • 18 May 2023 • Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank
One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.
1 code implementation • 9 May 2023 • Robert Litschko, Ekaterina Artemova, Barbara Plank
Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach.
no code implementations • 28 Apr 2023 • Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio
We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation.
6 code implementations • 20 Apr 2023 • Verena Blaschke, Hinrich Schütze, Barbara Plank
This can for instance be observed when finetuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography.
2 code implementations • 19 Apr 2023 • Verena Blaschke, Hinrich Schütze, Barbara Plank
In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages.
1 code implementation • 19 Apr 2023 • Ekaterina Artemova, Barbara Plank
Bilingual word lexicons are crucial tools for multilingual natural language understanding and machine translation tasks, as they facilitate the mapping of words in one language to their synonyms in another language.
1 code implementation • 4 Nov 2022 • Barbara Plank
Human variation in labeling is often considered noise.
1 code implementation • 28 Oct 2022 • Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández
Calibration is a popular framework to evaluate whether a classifier knows when it does not know, i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct.
1 code implementation • 21 Oct 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Linguistic information is encoded at varying timescales (subwords, phrases, etc.)
1 code implementation • 20 Oct 2022 • Elisa Bassignana, Max Müller-Eberstein, Mike Zhang, Barbara Plank
With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori, as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable.
1 code implementation • 17 Oct 2022 • Elisa Bassignana, Barbara Plank
Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain evaluation setups.
no code implementations • 23 Sep 2022 • Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, Narges Mahyar
We identify four key groups of challenges for evaluating visual text analytics approaches (data ambiguity, experimental design, user trust, and "big picture" concerns) and provide suggestions for research opportunities from an interdisciplinary perspective.
1 code implementation • 16 Sep 2022 • Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank
Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching.
no code implementations • NAACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.
1 code implementation • LREC 2022 • Mike Zhang, Kristian Nørgaard Jensen, Barbara Plank
Skill Classification (SC) is the task of classifying job competences from job postings.
1 code implementation • ACL 2022 • Elisa Bassignana, Barbara Plank
Over the last five years, research on Relation Extraction (RE) witnessed extensive progress with many new dataset releases.
1 code implementation • NAACL 2022 • Mike Zhang, Kristian Nørgaard Jensen, Sif Dam Sonniks, Barbara Plank
We introduce a BERT baseline (Devlin et al., 2019).
1 code implementation • 13 Apr 2022 • Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.
1 code implementation • ACL 2022 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).
1 code implementation • ACL (TLT, SyntaxFest) 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
This work provides the first in-depth analysis of genre in Universal Dependencies (UD).
1 code implementation • EMNLP 2021 • Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.
2 code implementations • Findings (EMNLP) 2021 • Mike Zhang, Barbara Plank
We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling.
no code implementations • SEMEVAL 2021 • Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, Massimo Poesio
Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision.
no code implementations • 8 Jul 2021 • Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow
Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021.
no code implementations • NAACL 2021 • Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio
Supervised learning assumes that a ground truth label exists.
1 code implementation • COLING 2020 • Barbara Plank, Kristian Nørgaard Jensen, Rob van der Goot
We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER.
1 code implementation • NoDaLiDa 2021 • Kristian Nørgaard Jensen, Mike Zhang, Barbara Plank
We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow.
2 code implementations • NAACL 2021 • Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank
To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
no code implementations • EACL (AdaptNLP) 2021 • Rob van der Goot, Ahmet Üstün, Barbara Plank
However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.
no code implementations • 10 Dec 2020 • Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein
Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time.
no code implementations • SEMEVAL 2020 • Anders Kaas, Viktor Torp Thomsen, Barbara Plank
We present an ablation study which shows that even though BERT representations are very powerful also for this task, BERT still benefits from being combined with carefully designed task-specific features.
no code implementations • SEMEVAL 2020 • Kristian Nørgaard Jensen, Nicolaj Filrup Rasmussen, Thai Wang, Marco Placenti, Barbara Plank
This paper describes a system that aims at assessing humour intensity in edited news headlines as part of the 7th task of SemEval-2020 on "Humor, Emphasis and Sentiment".
1 code implementation • COLING 2020 • Alan Ramponi, Barbara Plank
We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention.
2 code implementations • EACL 2021 • Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf, Barbara Plank
In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings.
no code implementations • 25 May 2020 • Andreas Kirkedal, Marija Stepanović, Barbara Plank
A combination of FT Speech with in-domain language data provides comparable results to models trained specifically on Språkbanken, showing that FT Speech transfers well to this data set.
no code implementations • LREC 2020 • Alan Ramponi, Barbara Plank, Rosario Lombardo
Biomedical event extraction is a crucial task in order to automatically extract information from the increasingly growing body of biomedical literature.
1 code implementation • WS (NoDaLiDa) 2019 • Barbara Plank
Named Entity Recognition (NER) has advanced greatly with the introduction of deep neural architectures.
no code implementations • WS 2019 • Sigrid Klerke, Barbara Plank
Hence, caution is warranted when using gaze data as signal for NLP, as no single view is robust over tasks, modeling choice and gaze corpus.
1 code implementation • WS 2019 • Nils Rethmeier, Barbara Plank
Word embeddings have undoubtedly revolutionized NLP.
no code implementations • ACL 2019 • Claudio Greco, Barbara Plank, Raquel Fernández, Raffaella Bernardi
We study the issue of catastrophic forgetting in the context of neural multimodal approaches to Visual Question Answering (VQA).
no code implementations • 21 Nov 2018 • Barbara Plank, Sigrid Klerke, Zeljko Agic
In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora.
3 code implementations • NAACL 2019 • Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, Raquel Fernández
We compare our approach to an alternative system which extends the baseline with reinforcement learning.
1 code implementation • EMNLP 2018 • Barbara Plank, Željko Agić
We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages.
no code implementations • COLING 2018 • Martin Kroon, Masha Medvedeva, Barbara Plank
In this paper we present the results of our participation in the Discriminating between Dutch and Flemish in Subtitles VarDial 2018 shared task.
no code implementations • WS 2018 • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard
Neural part-of-speech (POS) taggers are known to not perform well with little training data.
no code implementations • WS 2018 • Sigrid Klerke, Héctor Martínez Alonso, Barbara Plank
We present our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM).
no code implementations • WS 2018 • Barbara Plank
Written text transmits a good deal of nonverbal information related to the author's identity and social factors, such as age, gender and personality.
1 code implementation • ACL 2018 • Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank
Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.
2 code implementations • ACL 2018 • Sebastian Ruder, Barbara Plank
In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training.
Ranked #3 on
Sentiment Analysis
on Multi-Domain Sentiment Dataset
no code implementations • IJCNLP 2017 • Barbara Plank
We present All-In-1, a simple model for multilingual text classification that does not require any parallel data.
1 code implementation • 26 Oct 2017 • Barbara Plank
We present ALL-IN-1, a simple model for multilingual text classification that does not require any parallel data.
no code implementations • WS 2017 • Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling
In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017.
no code implementations • WS 2017 • Johannes Bjerva, Gintarė Grigonytė, Robert Östling, Barbara Plank
We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017.
1 code implementation • EMNLP 2017 • Sebastian Ruder, Barbara Plank
Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks.
1 code implementation • WS 2017 • Rob van der Goot, Barbara Plank, Malvina Nissim
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data?
no code implementations • EACL 2017 • Željko Agić, Barbara Plank, Anders Søgaard
We address the challenge of cross-lingual POS tagger evaluation in the absence of manually annotated test data.
no code implementations • WS 2017 • Maria Medvedeva, Martin Kroon, Barbara Plank
We present the results of our participation in the VarDial 4 shared task on discriminating closely related languages.
1 code implementation • EACL 2017 • Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard
We propose UDP, the first training-free parser for Universal Dependencies (UD).
no code implementations • EACL 2017 • Héctor Martínez Alonso, Barbara Plank
Multitask learning has been applied successfully to a range of tasks, mostly morphosyntactic.
1 code implementation • COLING 2016 • Chloé Braud, Barbara Plank, Anders Søgaard
We experiment with different ways of training LSTM networks to predict RST discourse trees.
Ranked #5 on
Discourse Parsing
on RST-DT
(RST-Parseval (Full) metric)
no code implementations • WS 2016 • Barbara Plank
on which texts can differ from the standard.
no code implementations • 9 Nov 2016 • Barbara Plank, Malvina Nissim
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task.
1 code implementation • COLING 2016 • Barbara Plank
Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing.
1 code implementation • COLING 2016 • Johannes Bjerva, Barbara Plank, Johan Bos
We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets).
no code implementations • 28 Aug 2016 • Barbara Plank
The solution is not obvious: we cannot control for all factors, and it is not clear how to best go beyond the current practice of training on homogeneous data from a single domain and language.
no code implementations • LREC 2016 • Ben Verhoeven, Walter Daelemans, Barbara Plank
Personality profiling is the task of detecting personality traits of authors based on writing style.
3 code implementations • ACL 2016 • Barbara Plank, Anders Søgaard, Yoav Goldberg
Bidirectional long short-term memory (bi-LSTM) networks have recently proven successful for various NLP sequence modeling tasks, but little is known about their reliance on input representations, target languages, data set size, and label noise.
Ranked #4 on
Part-Of-Speech Tagging
on UD
no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
no code implementations • TACL 2016 • Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages.
no code implementations • LREC 2014 • Dirk Hovy, Barbara Plank, Anders Søgaard
We present a systematic study of several Twitter POS data sets and the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving a relative error reduction of up to 21%.
no code implementations • LREC 2014 • Olga Uryupina, Barbara Plank, Aliaksei Severyn, Agata Rotondi, Alessandro Moschitti
In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity.