no code implementations • WS 2015 • Leon Derczynski, Isabelle Augenstein, Kalina Bontcheva
This paper describes a pilot NER system for Twitter, comprising the USFD system entry to the W-NUT 2015 NER shared task.
no code implementations • LREC 2016 • Piroska Lendvai, Isabelle Augenstein, Kalina Bontcheva, Thierry Declerck
Entailment recognition approaches are useful for application domains such as information extraction, question answering or summarisation, for which evidence from multiple sentences needs to be combined.
1 code implementation • EMNLP 2016 • Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva
Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton as "positive", "negative" or "neutral".
no code implementations • EMNLP 2016 • Georgios P. Spithourakis, Isabelle Augenstein, Sebastian Riedel
Semantic error detection and correction is an important task for applications such as fact checking, speech-to-text or grammatical error correction.
7 code implementations • WS 2016 • Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel
Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings.
no code implementations • 11 Jan 2017 • Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva
Unseen NEs in particular play an important role: they have a higher incidence in diverse genres such as social media than in more regular genres such as newswire.
no code implementations • ACL 2017 • Isabelle Augenstein, Anders Søgaard
Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types.
1 code implementation • SEMEVAL 2017 • Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials.
1 code implementation • SEMEVAL 2017 • Elena Kochkina, Maria Liakata, Isabelle Augenstein
This paper describes team Turing's submission to SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A).
Ranked #1 on Stance Detection on RumourEval
2 code implementations • 23 May 2017 • Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses.
2 code implementations • CONLL 2017 • Ed Collins, Isabelle Augenstein, Sebastian Riedel
Automatic summarisation is a popular approach to reduce a document to its main arguments.
9 code implementations • 11 Jul 2017 • Benjamin Riedel, Isabelle Augenstein, Georgios P. Spithourakis, Sebastian Riedel
Identifying public misinformation is a complicated and challenging task.
Ranked #5 on Fake News Detection on FNC-1
no code implementations • WS 2018 • Johannes Bjerva, Isabelle Augenstein
Although linguistic typology has a long history, computational approaches have only recently gained popularity.
no code implementations • 6 Dec 2017 • Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein
We show that sequential classifiers that exploit discourse properties in social media conversations, while using only local features, outperform non-sequential classifiers.
no code implementations • NAACL 2018 • Johannes Bjerva, Isabelle Augenstein
A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structure (WALS).
1 code implementation • NAACL 2018 • Isabelle Augenstein, Sebastian Ruder, Anders Søgaard
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets.
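The idea of a transfer function between disparate label spaces can be sketched minimally: embed each label as a vector in a shared space and map a label from one task's space to its nearest neighbour in another's. The label names and 2-d vectors below are invented for illustration and are not the paper's learned embeddings.

```python
import math

# Toy label embeddings for two tasks with different label spaces.
# In the paper these would be learned jointly; here they are hand-picked.
src_labels = {"positive": (1.0, 0.1), "negative": (-1.0, 0.0)}
tgt_labels = {"favor": (0.9, 0.2), "against": (-0.8, 0.1), "neutral": (0.0, 1.0)}

def nearest(vec, space):
    """Return the label in `space` whose embedding is closest to `vec`."""
    return min(space, key=lambda lab: math.dist(vec, space[lab]))

def transfer(src_label):
    """Map a source-task label to the nearest target-task label."""
    return nearest(src_labels[src_label], tgt_labels)

print(transfer("positive"))  # "favor" under these toy embeddings
```

With learned embeddings, the same nearest-neighbour (or a parametrised) mapping lets annotations in one label space supervise a task with another.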
no code implementations • SEMEVAL 2018 • Thomas Nyegaard-Signori, Casper Veistrup Helms, Johannes Bjerva, Isabelle Augenstein
We take a multi-task learning approach to the shared Task 1 at SemEval-2018.
2 code implementations • 20 Jun 2018 • Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel
For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.
no code implementations • WS 2018 • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard
Neural part-of-speech (POS) taggers are known to not perform well with little training data.
1 code implementation • ACL 2018 • Dirk Weissenborn, Pasquale Minervini, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Tim Dettmers, Pontus Stenetorp, Sebastian Riedel
For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.
no code implementations • EMNLP 2018 • Ana V. González-Garduño, Isabelle Augenstein, Anders Søgaard
The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks -- a task that amounts to question relevancy ranking -- involve complex pipelines and manual feature engineering.
1 code implementation • EMNLP 2018 • Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard
We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies.
no code implementations • WS 2018 • Anders Søgaard, Miryam de Lhoneux, Isabelle Augenstein
Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal.
no code implementations • CONLL 2018 • Yova Kementchedjhieva, Johannes Bjerva, Isabelle Augenstein
This paper documents the Team Copenhagen system, which placed first in the CoNLL-SIGMORPHON 2018 shared task on universal morphological reinflection (Task 2) with an overall accuracy of 49.87.
no code implementations • CL 2019 • Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein
If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations.
1 code implementation • NAACL 2019 • Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features.
1 code implementation • NAACL 2019 • Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein
When assigning quantitative labels to a dataset, different methodologies may rely on different scales.
no code implementations • NAACL 2019 • Mareike Hartmann, Tallulah Jansen, Isabelle Augenstein, Anders Søgaard
In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects of the topic.
no code implementations • ACL 2019 • Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, Ryan Cotterell
Studying the ways in which language is gendered has long been an area of interest in sociolinguistics.
no code implementations • ACL 2019 • Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein
The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages with object-verb word ordering tend to have post-positions.
1 code implementation • WS 2019 • Mostafa Abdou, Cezar Sas, Rahul Aralikatte, Isabelle Augenstein, Anders Søgaard
Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in different languages.
no code implementations • WS 2019 • Johannes Bjerva, Katharina Kann, Isabelle Augenstein
Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data.
no code implementations • IJCNLP 2019 • Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen
We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification.
1 code implementation • 8 Sep 2019 • Johannes Bjerva, Wouter Kouw, Isabelle Augenstein
In particular, language evolution causes data drift between time-steps in sequential decision-making tasks.
1 code implementation • 16 Sep 2019 • Joachim Bingel, Victor Petrén Bach Hansen, Ana Valeria Gonzalez, Paweł Budzianowski, Isabelle Augenstein, Anders Søgaard
Task oriented dialogue systems rely heavily on specialized dialogue state tracking (DST) modules for dynamically predicting user intent throughout the conversation.
no code implementations • 30 Sep 2019 • Ana Valeria Gonzalez, Isabelle Augenstein, Anders Søgaard
Most research on dialogue has focused either on dialogue generation for open-ended chit-chat or on state tracking for goal-directed dialogue.
1 code implementation • WS 2019 • Mareike Hartmann, Yevgeniy Golovchenko, Isabelle Augenstein
In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis, in particular we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash.
no code implementations • 20 Nov 2019 • Luna De Bruyne, Pepa Atanasova, Isabelle Augenstein
Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection.
2 code implementations • 2 Dec 2019 • Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein
While state-of-the-art NLP explainability (XAI) methods focus on explaining per-sample decisions in supervised end or probing tasks, this is insufficient to explain and quantify model knowledge transfer during (un-)supervised training.
Explainable Artificial Intelligence (XAI) Model Compression +1
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Dustin Wright, Isabelle Augenstein
In applying this, we out-perform the state of the art in two of the three tasks studied for claim check-worthiness detection in English.
1 code implementation • EMNLP 2020 • Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, Isabelle Augenstein
We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial to the first.
no code implementations • ACL 2020 • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims.
1 code implementation • EMNLP 2020 • Johannes Bjerva, Nikita Bhutani, Behzad Golshan, Wang-Chiew Tan, Isabelle Augenstein
We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance.
1 code implementation • ACL 2020 • Pranav A, Isabelle Augenstein
Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Wei Zhao, Steffen Eger, Johannes Bjerva, Isabelle Augenstein
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
1 code implementation • 10 Sep 2020 • Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein
We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop.
no code implementations • 10 Sep 2020 • Liesbeth Allein, Isabelle Augenstein, Marie-Francine Moens
Truth can vary over time.
1 code implementation • EMNLP 2020 • Dustin Wright, Isabelle Augenstein
Here, we investigate the problem of unsupervised multi-source domain adaptation, where a model is trained on labelled data from multiple source domains and must make predictions on a domain for which no labelled data has been seen.
1 code implementation • EMNLP 2020 • Pepa Atanasova, Dustin Wright, Isabelle Augenstein
However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of instances they are inserted in.
1 code implementation • EMNLP 2020 • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity.
no code implementations • 28 Sep 2020 • Nils Rethmeier, Isabelle Augenstein
We thus approach pretraining from a miniaturisation perspective, so as not to require massive external data sources and models, or learned translations from continuous input embeddings to discrete labels.
no code implementations • 2 Oct 2020 • Nils Rethmeier, Isabelle Augenstein
For natural language processing "text-to-text" tasks, the prevailing approaches heavily rely on pretraining large self-supervised models on increasingly larger "task-external" data.
no code implementations • EMNLP (BlackboxNLP) 2020 • Lukas Muttenthaler, Isabelle Augenstein, Johannes Bjerva
We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer span is correct.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Anna Rogers, Isabelle Augenstein
Peer review is our best tool for judging the quality of conference submissions, but it is becoming increasingly spurious.
no code implementations • EMNLP (SIGTYP) 2020 • Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein
Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages.
no code implementations • NAACL (DistCurate) 2022 • Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein
The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy).
no code implementations • 10 Dec 2020 • Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein
Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time.
no code implementations • 28 Jan 2021 • Zeerak Waseem, Smarika Lulz, Joachim Bingel, Isabelle Augenstein
In this paper, we contextualise this discourse of bias in the ML community against the subjective choices in the development process.
no code implementations • EACL 2021 • Johannes Bjerva, Isabelle Augenstein
Our hypothesis is that a model trained in a cross-lingual setting will pick up on typological cues from the input data, thus overshadowing the utility of explicitly using such features.
no code implementations • 23 Feb 2021 • Thamar Solorio, Mahsa Shafaei, Christos Smailis, Isabelle Augenstein, Margaret Mitchell, Ingrid Stapf, Ioannis Kakadiaris
This white paper summarizes the authors' structured brainstorming regarding ethical considerations for creating an extensive repository of online content labeled with tags that describe potentially questionable content for young viewers.
no code implementations • 25 Feb 2021 • Nils Rethmeier, Isabelle Augenstein
Contrastive self-supervised training objectives enabled recent successes in image representation pretraining by learning to contrast input-input pairs of augmented images as either similar or dissimilar.
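The contrastive objective described above can be sketched in its common InfoNCE-style form: pull an anchor and its augmented view together while pushing the anchor away from other inputs. The vectors and temperature below are toy values for illustration, not the paper's setup.

```python
import math

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """-log of the softmax weight on the positive pair (InfoNCE form)."""
    sims = [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

anchor = [1.0, 0.0]
positive = [0.9, 0.1]      # augmented view of the anchor: similar pair
negatives = [[0.0, 1.0]]   # unrelated input: dissimilar pair
```

The loss is small when the positive pair is more similar than the negatives, which is exactly the similar/dissimilar contrast the objective encodes.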
no code implementations • Findings (NAACL) 2022 • Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein
Understanding attitudes expressed in texts, also known as stance detection, plays an important role in systems for detecting false information online, be it misinformation (unintentionally false) or disinformation (intentionally false information).
no code implementations • 27 Feb 2021 • Arnav Arora, Preslav Nakov, Momchil Hardalov, Sheikh Muhammad Sarwar, Vibha Nayak, Yoan Dinkov, Dimitrina Zlatkova, Kyle Dent, Ameya Bhatawdekar, Guillaume Bouchard, Isabelle Augenstein
The proliferation of harmful content on online platforms is a major societal problem, which comes in many different forms including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self harm, and many others.
no code implementations • 3 Mar 2021 • Lucas Chaves Lima, Dustin Brandon Wright, Isabelle Augenstein, Maria Maistro
Our approach consists of 3 steps: (1) we create an initial run with BM25 and RM3; (2) we estimate credibility and misinformation scores for the documents in the initial run; (3) we merge the relevance, credibility and misinformation scores to re-rank documents in the initial run.
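Step (3) above is a score-fusion re-ranking. A minimal sketch, assuming a simple weighted linear combination in which misinformation lowers the score; the weights and the exact fusion formula are illustrative assumptions, not the paper's method.

```python
def fuse_scores(relevance, credibility, misinformation,
                w_rel=1.0, w_cred=0.5, w_mis=0.5):
    """Higher relevance/credibility raise the score; misinformation lowers it."""
    return w_rel * relevance + w_cred * credibility - w_mis * misinformation

def rerank(run):
    """run: list of (doc_id, relevance, credibility, misinformation) tuples.
    Returns (doc_id, fused_score) pairs sorted best-first."""
    scored = [(doc_id, fuse_scores(rel, cred, mis))
              for doc_id, rel, cred, mis in run]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

initial_run = [
    ("d1", 2.0, 0.9, 0.1),   # relevant and credible
    ("d2", 2.5, 0.2, 0.8),   # more relevant, but likely misinformation
]
ranking = rerank(initial_run)
```

Under these toy weights, the credible document overtakes the slightly more relevant but misinformation-laden one.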
no code implementations • 31 Mar 2021 • Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov
The framework is based on a nearest-neighbour architecture.
1 code implementation • 15 Apr 2021 • Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein
Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language.
1 code implementation • EMNLP 2021 • Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein
In this paper, we perform an in-depth analysis of 16 stance detection datasets, and we explore the possibility for cross-domain learning from them.
1 code implementation • Findings (ACL) 2021 • Dustin Wright, Isabelle Augenstein
Scientific document understanding is challenging as the data is highly domain specific and diverse.
no code implementations • NAACL (sdp) 2021 • Isabelle Augenstein
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct.
no code implementations • ACL 2021 • Clara Meister, Stefan Lazov, Isabelle Augenstein, Ryan Cotterell
Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs.
no code implementations • 27 Jul 2021 • Anna Rogers, Matt Gardner, Isabelle Augenstein
Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been also much work on benchmark datasets needed to track modeling progress.
no code implementations • 23 Aug 2021 • Isabelle Augenstein
This development has spurred research in the area of automatic fact checking, from approaches to detect check-worthy claims and determining the stance of tweets towards claims, to methods to determine the veracity of claims given evidence documents.
1 code implementation • EMNLP 2021 • Dustin Wright, Isabelle Augenstein
Given this, we present a formalization of and study into the problem of exaggeration detection in science communication.
no code implementations • 8 Sep 2021 • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model.
1 code implementation • 13 Sep 2021 • Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein
Most research in stance detection, however, has been limited to working with a single language and on a few limited targets, with little work on cross-lingual stance detection.
no code implementations • 15 Sep 2021 • Sagnik Ray Choudhury, Nikita Bhutani, Isabelle Augenstein
We find that EP test results do not change significantly when the fine-tuned model performs well or in adversarial situations where the model is forced to learn wrong correlations.
no code implementations • 13 Dec 2021 • Shailza Jolly, Pepa Atanasova, Isabelle Augenstein
In addition, we show the applicability of our approach in a completely unsupervised setting.
1 code implementation • 22 Dec 2021 • Sara Marjanovic, Karolina Stańczak, Isabelle Augenstein
Rather than overt hostile or benevolent sexism, the results of the nominal and lexical analyses suggest this interest is not as professional or respectful as that expressed about male politicians.
no code implementations • 28 Dec 2021 • Karolina Stanczak, Isabelle Augenstein
3) Despite a myriad of papers on gender bias in NLP methods, we find that most of the newly developed algorithms do not test their models for bias and disregard possible ethical considerations of their work.
2 code implementations • 20 Jan 2022 • Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein
The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information.
1 code implementation • 14 Feb 2022 • Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm
Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics.
Ranked #1 on Document Classification on SciDocs (MeSH)
1 code implementation • ACL 2022 • Dustin Wright, David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Isabelle Augenstein, Lucy Lu Wang
To address this challenge, we propose scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences, and demonstrate its usefulness in zero-shot fact checking for biomedical claims.
1 code implementation • 25 Mar 2022 • Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein
In this paper, we introduce probes to study which values across cultures are embedded in these models, and whether they align with existing theories and cross-cultural value surveys.
no code implementations • 5 Apr 2022 • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions.
1 code implementation • NAACL 2022 • Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein
The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages even in absence of any explicit supervision.
no code implementations • NAACL 2022 • Indira Sen, Mattia Samory, Claudia Wagner, Isabelle Augenstein
In particular, construct-driven CAD -- perturbations of core features -- may induce models to ignore the context in which core features are used.
no code implementations • 15 Sep 2022 • Sagnik Ray Choudhury, Anna Rogers, Isabelle Augenstein
Two of the most fundamental challenges in Natural Language Understanding (NLU) at present are: (a) how to establish whether deep learning-based models score highly on NLU benchmarks for the 'right' reasons; and (b) to understand what those reasons would even be.
no code implementations • 24 Oct 2022 • Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein
Whether the media faithfully communicate scientific information has long been a core issue to the science community.
no code implementations • 25 Oct 2022 • Andreas Nugaard Holm, Dustin Wright, Isabelle Augenstein
A cheaper alternative is to simply use the softmax based on a single forward pass without dropout to estimate model uncertainty.
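The cheaper alternative mentioned here is commonly realised as the maximum softmax probability from a single deterministic forward pass. A minimal sketch with toy logits; the specific values are invented for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_confidence(logits):
    """Max softmax probability: higher value ~ lower estimated uncertainty."""
    return max(softmax(logits))

confident = softmax_confidence([4.0, 0.5, 0.2])   # peaked distribution
uncertain = softmax_confidence([1.0, 0.9, 1.1])   # near-uniform distribution
```

Unlike Monte Carlo dropout, this needs no extra forward passes, which is the cost saving the abstract refers to.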
no code implementations • 19 Dec 2022 • Dustin Wright, Isabelle Augenstein
Selecting an effective training signal for tasks in natural language processing is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable.
1 code implementation • 5 Feb 2023 • Klim Zaporojets, Lucie-Aimee Kaffee, Johannes Deleu, Thomas Demeester, Chris Develder, Isabelle Augenstein
For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities, and these target entities' descriptions.
no code implementations • 12 Apr 2023 • Sandra Martinková, Karolina Stańczak, Isabelle Augenstein
Perhaps surprisingly, Czech, Slovak, and Polish language models produce more hurtful completions with men as subjects, which, upon inspection, we find is due to completions being related to violence, death, and sickness.
1 code implementation • 17 Apr 2023 • Lucie-Aimée Kaffee, Arnav Arora, Zeerak Talat, Isabelle Augenstein
Dual use, the intentional, harmful reuse of technology and scientific artefacts, is a problem yet to be well-defined within the context of Natural Language Processing (NLP).
1 code implementation • 18 May 2023 • Nadav Borenstein, Natalia da Silva Perez, Isabelle Augenstein
We find that: 1) even with scarce annotated data, it is possible to achieve surprisingly good results by formulating the problem as an extractive QA task and leveraging existing datasets and models for modern languages; and 2) cross-lingual low-resource learning for historical languages is highly challenging, and machine translation of the historical datasets to the considered target languages is, in practice, often the best-performing solution.
1 code implementation • 21 May 2023 • Nadav Borenstein, Karolina Stańczak, Thea Rolskov, Natália da Silva Perez, Natacha Klein Käfer, Isabelle Augenstein
We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset.
Optical Character Recognition (OCR) +1
1 code implementation • 29 May 2023 • Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein
Explanations of neural models aim to reveal a model's decision-making process for its predictions.
2 code implementations • 1 Jun 2023 • Erik Arakelyan, Arnav Arora, Isabelle Augenstein
The results show that our method outperforms the state of the art with an average of $3.5$ F1 points increase in-domain, and is more generalizable with an average increase of $10.2$ F1 on out-of-domain evaluation while using $\leq10\%$ of the training data.
Ranked #1 on Stance Detection on mtsd
no code implementations • 8 Oct 2023 • Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni
The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention.
1 code implementation • 9 Oct 2023 • Lucie-Aimée Kaffee, Arnav Arora, Isabelle Augenstein
The moderation of content on online platforms is usually non-transparent.
1 code implementation • 20 Oct 2023 • Sagnik Ray Choudhury, Pepa Atanasova, Isabelle Augenstein
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC) or natural language inference (NLI).
1 code implementation • 22 Oct 2023 • Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
1 code implementation • 2 Nov 2023 • Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, Claudia Wagner
CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features.
no code implementations • 15 Nov 2023 • Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein
While the impact of these biases has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, offering a constrained view of the nature of societal biases within language models.
1 code implementation • 15 Nov 2023 • Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs.
no code implementations • 30 Nov 2023 • Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell
However, when we control for the meaning of the noun, we find that grammatical gender has a near-zero effect on adjective choice, thereby calling the neo-Whorfian hypothesis into question.
no code implementations • 25 Jan 2024 • Erik Arakelyan, Zhaoqi Liu, Isabelle Augenstein
We systematically study the effects of the phenomenon across NLI models for $\textbf{in-}$ and $\textbf{out-of-}$ domain settings.
no code implementations • 19 Feb 2024 • Amelie Wührl, Dustin Wright, Roman Klinger, Isabelle Augenstein
Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions.
no code implementations • 20 Feb 2024 • Sara Vera Marjanović, Isabelle Augenstein, Christina Lioma
In this large-scale empirical study, we insert different levels of noise perturbations and measure the effect on the output of pre-trained language models and different uncertainty metrics.
no code implementations • COLING 2022 • Sagnik Ray Choudhury, Nikita Bhutani, Isabelle Augenstein
We find that EP test results do not change significantly when the fine-tuned model performs well or in adversarial situations where the model is forced to learn wrong correlations.
no code implementations • COLING 2022 • Sagnik Ray Choudhury, Anna Rogers, Isabelle Augenstein
Two of the most fundamental issues in Natural Language Understanding (NLU) at present are: (a) how it can be established whether deep learning-based models score highly on NLU benchmarks for the "right" reasons; and (b) what those reasons would even be.
no code implementations • EAMT 2022 • Anabela Barreiro, José GC de Souza, Albert Gatt, Mehul Bhatt, Elena Lloret, Aykut Erdem, Dimitra Gkatzia, Helena Moniz, Irene Russo, Fabio Kepler, Iacer Calixto, Marcin Paprzycki, François Portet, Isabelle Augenstein, Mirela Alhasani
This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation.