Search Results for author: Barbara Plank

Found 118 papers, 45 papers with code

We Need to Consider Disagreement in Evaluation

no code implementations ACL (BPPF) 2021 Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma

Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.

From back to the roots into the gated woods: Deep learning for NLP

no code implementations NAACL (TeachingNLP) 2021 Barbara Plank

Deep neural networks have revolutionized many fields, including Natural Language Processing.

Sliced at SemEval-2022 Task 11: Bigger, Better? Massively Multilingual LMs for Multilingual Complex NER on an Academic GPU Budget

no code implementations SemEval (NAACL) 2022 Barbara Plank

Our submission of a single model for 11 languages on the SemEval Task 11 MultiCoNER shows that a vanilla transformer-CRF with XLM-R large outperforms the more recent RemBERT, ranking 9th of 26 submissions in the multilingual track.


Biomedical Event Extraction as Sequence Labeling

no code implementations EMNLP 2020 Alan Ramponi, Rob van der Goot, Rosario Lombardo, Barbara Plank

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model.

Event Extraction Multi-Task Learning

Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets

no code implementations WNUT (ACL) 2021 Benjamin Olsen, Barbara Plank

In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets for Danish.

Resources and Evaluations for Danish Entity Resolution

no code implementations CRAC (ACL) 2021 Maria Barrett, Hieu Lam, Martin Wu, Ophélie Lacroix, Barbara Plank, Anders Søgaard

Automatic coreference resolution is understudied in Danish even though most of the Danish Dependency Treebank (Buch-Kromann, 2003) is annotated with coreference relations.

Coreference Resolution Entity Disambiguation +2

The Lacunae of Danish Natural Language Processing

no code implementations WS (NoDaLiDa) 2019 Andreas Kirkedal, Barbara Plank, Leon Derczynski, Natalie Schluter

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation.

Lexical Resources for Low-Resource PoS Tagging in Neural Times

no code implementations WS (NoDaLiDa) 2019 Barbara Plank, Sigrid Klerke

More and more evidence is appearing that integrating symbolic lexical knowledge into neural models aids learning.

Cross-Lingual POS Tagging POS

NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets

no code implementations EMNLP (WNUT) 2020 Anders Giovanni Møller, Rob van der Goot, Barbara Plank

With the COVID-19 pandemic raging world-wide since the beginning of the 2020 decade, the need for monitoring systems to track relevant information on social media is vitally important.

Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

no code implementations 4 Sep 2023 Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank

To gain insights, we provide a first case-study to examine how the quality of the instruction-tuning datasets influences downstream performance.

Uncertainty in Natural Language Generation: From Theory to Applications

no code implementations 28 Jul 2023 Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

Recent advances in powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications.

Active Learning Text Generation

ActiveAED: A Human in the Loop Improves Annotation Error Detection

1 code implementation 31 May 2023 Leon Weber, Barbara Plank

This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation.

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

1 code implementation 20 May 2023 Mike Zhang, Rob van der Goot, Barbara Plank

The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification.

De-identification Masked Language Modeling +1

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

1 code implementation 19 May 2023 Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways.

Text Generation

Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.

Relation Extraction Translation

Silver Syntax Pre-training for Cross-Domain Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.

Relation Extraction

Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

1 code implementation 9 May 2023 Robert Litschko, Ekaterina Artemova, Barbara Plank

Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach.

Cross-Lingual Word Embeddings Information Retrieval +2

SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

no code implementations 28 Apr 2023 Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio

We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks as in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation.

Sentiment Analysis

Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages

5 code implementations 20 Apr 2023 Verena Blaschke, Hinrich Schütze, Barbara Plank

This can for instance be observed when finetuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography.

Cross-Lingual Transfer Part-Of-Speech Tagging +1

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

2 code implementations 19 Apr 2023 Verena Blaschke, Hinrich Schütze, Barbara Plank

In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages.

Low-resource Bilingual Dialect Lexicon Induction with Large Language Models

1 code implementation 19 Apr 2023 Ekaterina Artemova, Barbara Plank

Bilingual word lexicons are crucial tools for multilingual natural language understanding and machine translation tasks, as they facilitate the mapping of words in one language to their synonyms in another language.

Bilingual Lexicon Induction Natural Language Understanding +3

Stop Measuring Calibration When Humans Disagree

1 code implementation 28 Oct 2022 Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández

Calibration is a popular framework to evaluate whether a classifier knows when it does not know, i.e., whether its predictive probabilities are a good indication of how likely a prediction is to be correct.

Spectral Probing

1 code implementation 21 Oct 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Linguistic information is encoded at varying timescales (subwords, phrases, etc.)


Evidence > Intuition: Transferability Estimation for Encoder Selection

1 code implementation 20 Oct 2022 Elisa Bassignana, Max Müller-Eberstein, Mike Zhang, Barbara Plank

With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori, as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable.

Structured Prediction

CrossRE: A Cross-Domain Dataset for Relation Extraction

1 code implementation 17 Oct 2022 Elisa Bassignana, Barbara Plank

Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain evaluation setups.

Relation Classification

An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper

no code implementations 23 Sep 2022 Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, Narges Mahyar

We identify four key groups of challenges for evaluating visual text analytics approaches (data ambiguity, experimental design, user trust, and "big picture" concerns) and provide suggestions for research opportunities from an interdisciplinary perspective.

Experimental Design

Skill Extraction from Job Postings using Weak Supervision

1 code implementation 16 Sep 2022 Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank

Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching.

Sort by Structure: Language Model Ranking as Dependency Probing

no code implementations NAACL 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.

Language Modelling Structured Prediction

Experimental Standards for Deep Learning in Natural Language Processing Research

1 code implementation 13 Apr 2022 Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank

The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well.

Probing for Labeled Dependency Trees

1 code implementation ACL 2022 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Probing has become an important tool for analyzing representations in Natural Language Processing (NLP).

Dependency Parsing Informativeness

Genre as Weak Supervision for Cross-lingual Dependency Parsing

1 code implementation EMNLP 2021 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection.

Dependency Parsing

Cartography Active Learning

2 code implementations Findings (EMNLP) 2021 Mike Zhang, Barbara Plank

We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling.

Active Learning Text Classification +1

SemEval-2021 Task 12: Learning with Disagreements

no code implementations SEMEVAL 2021 Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, Massimo Poesio

Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision.

On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions

no code implementations EACL (AdaptNLP) 2021 Rob van der Goot, Ahmet Üstün, Barbara Plank

However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks.

Dependency Parsing Lemmatization +1

Longitudinal Citation Prediction using Temporal Graph Neural Networks

no code implementations 10 Dec 2020 Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein

Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time.

Citation Prediction

Team DiSaster at SemEval-2020 Task 11: Combining BERT and Hand-crafted Features for Identifying Propaganda Techniques in News

no code implementations SEMEVAL 2020 Anders Kaas, Viktor Torp Thomsen, Barbara Plank

We present an ablation study which shows that even though BERT representations are very powerful for this task as well, BERT still benefits from being combined with carefully designed task-specific features.

Neural Unsupervised Domain Adaptation in NLP: A Survey

1 code implementation COLING 2020 Alan Ramponi, Barbara Plank

We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention.

Out-of-Distribution Generalization Unsupervised Domain Adaptation

FT Speech: Danish Parliament Speech Corpus

no code implementations 25 May 2020 Andreas Kirkedal, Marija Stepanović, Barbara Plank

A combination of FT Speech with in-domain language data provides comparable results to models trained specifically on Språkbanken, showing that FT Speech transfers well to this data set.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

no code implementations LREC 2020 Alan Ramponi, Barbara Plank, Rosario Lombardo

Biomedical event extraction is a crucial task in order to automatically extract information from the increasingly growing body of biomedical literature.

Domain Adaptation Edge Detection +1

At a Glance: The Impact of Gaze Aggregation Views on Syntactic Tagging

no code implementations WS 2019 Sigrid Klerke, Barbara Plank

Hence, caution is warranted when using gaze data as signal for NLP, as no single view is robust over tasks, modeling choice and gaze corpus.

Chunking Part-Of-Speech Tagging +1

The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging

no code implementations 21 Nov 2018 Barbara Plank, Sigrid Klerke, Željko Agić

In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora.

Cross-Lingual POS Tagging Part-Of-Speech Tagging +1

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

1 code implementation EMNLP 2018 Barbara Plank, Željko Agić

We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages.

Part-Of-Speech Tagging TAG

When Simple n-gram Models Outperform Syntactic Approaches: Discriminating between Dutch and Flemish

no code implementations COLING 2018 Martin Kroon, Masha Medvedeva, Barbara Plank

In this paper we present the results of our participation in the Discriminating between Dutch and Flemish in Subtitles VarDial 2018 shared task.

Predicting Authorship and Author Traits from Keystroke Dynamics

no code implementations WS 2018 Barbara Plank

Written text transmits a good deal of nonverbal information related to the author's identity and social factors, such as age, gender and personality.

Machine Translation

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

1 code implementation ACL 2018 Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.

Gender Prediction

Strong Baselines for Neural Semi-supervised Learning under Domain Shift

2 code implementations ACL 2018 Sebastian Ruder, Barbara Plank

In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training.

Domain Adaptation Multi-Task Learning +2

ALL-IN-1: Short Text Classification with One Model for All Languages

1 code implementation 26 Oct 2017 Barbara Plank

We present ALL-IN-1, a simple model for multilingual text classification that does not require any parallel data.

General Classification Multilingual text classification +3

The Power of Character N-grams in Native Language Identification

no code implementations WS 2017 Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling

In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017.

Native Language Identification Text Classification

Learning to select data for transfer learning with Bayesian Optimization

1 code implementation EMNLP 2017 Sebastian Ruder, Barbara Plank

Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks.

Bayesian Optimization Part-Of-Speech Tagging +2

Cross-lingual tagger evaluation without test data

no code implementations EACL 2017 Željko Agić, Barbara Plank, Anders Søgaard

We address the challenge of cross-lingual POS tagger evaluation in the absence of manually annotated test data.


When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

no code implementations 9 Nov 2016 Barbara Plank, Malvina Nissim

We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task.


Keystroke dynamics as signal for shallow syntactic parsing

1 code implementation COLING 2016 Barbara Plank

Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing.

CCG Supertagging Chunking

Semantic Tagging with Deep Residual Networks

1 code implementation COLING 2016 Johannes Bjerva, Barbara Plank, Johan Bos

We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets).

Part-Of-Speech Tagging POS +1

What to do about non-standard (or non-canonical) language in NLP

no code implementations 28 Aug 2016 Barbara Plank

The solution is not obvious: we cannot control for all factors, and it is not clear how to best go beyond the current practice of training on homogeneous data from a single domain and language.

Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss

3 code implementations ACL 2016 Barbara Plank, Anders Søgaard, Yoav Goldberg

Bidirectional long short-term memory (bi-LSTM) networks have recently proven successful for various NLP sequence modeling tasks, but little is known about their reliance on input representations, target languages, data set size, and label noise.

Part-Of-Speech Tagging POS

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.


When POS data sets don't add up: Combatting sample bias

no code implementations LREC 2014 Dirk Hovy, Barbara Plank, Anders Søgaard

We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.


SenTube: A Corpus for Sentiment Analysis on YouTube Social Media

no code implementations LREC 2014 Olga Uryupina, Barbara Plank, Aliaksei Severyn, Agata Rotondi, Alessandro Moschitti

In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity.

Document Classification Informativeness +3
