no code implementations • RDSM (COLING) 2020 • Yue Li, Carolina Scarton
Correctly classifying stances of replies can be significantly helpful for the automatic detection and classification of online rumours.
no code implementations • EAMT 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton
We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario.
1 code implementation • CL (ACL) 2021 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation.
1 code implementation • 28 May 2025 • Iknoor Singh, Carolina Scarton, Kalina Bontcheva
The proliferation of online news and the increasing spread of misinformation necessitate robust methods for automatic data analysis.
no code implementations • 25 May 2025 • Yue Li, Jake Vasilakes, Zhixue Zhao, Carolina Scarton
We introduce SCRum-9, a multilingual dataset for Rumour Stance Classification, containing 7,516 tweet-reply pairs from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages (9), linking examples to more fact-checked claims (2.1k), and including complex annotations from multiple annotators to account for intra- and inter-annotator variability.
no code implementations • 8 May 2025 • Fatima Haouari, Carolina Scarton, Nicolò Faggiani, Nikolaos Nikolaidis, Bonka Kotseva, Ibrahim Abu Farha, Jens Linge, Kalina Bontcheva
Misleading narratives play a crucial role in shaping public opinion during elections, as they can influence how voters perceive candidates and political parties.
no code implementations • 29 Jan 2025 • Jake Vasilakes, Carolina Scarton, Zhixue Zhao
Our results show that VLMs generally rely more on text than images for stance detection and this trend persists across languages.
no code implementations • 9 Jan 2025 • Tomas Goldsack, Carolina Scarton, Chenghua Lin
In this work, we explore the application of Large Language Models to zero-shot Lay Summarisation.
no code implementations • 19 Dec 2024 • João A. Leite, Olesya Razuvayevskaya, Carolina Scarton, Kalina Bontcheva
Disinformation, irrespective of domain or language, aims to deceive or manipulate public opinion, typically through employing advanced persuasion techniques.
1 code implementation • 4 Nov 2024 • Wei He, Tiago Kramer Vieira, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio
Idiomatic expressions are an integral part of human languages, often used to express complex ideas in compressed or conventional ways (e.g. eager beaver as a keen and enthusiastic person).
no code implementations • 28 Oct 2024 • Ivan Srba, Olesya Razuvayevskaya, João A. Leite, Robert Moro, Ipek Baris Schlicht, Sara Tonelli, Francisco Moreno García, Santiago Barrio Lottmann, Denis Teyssou, Valentin Porcellini, Carolina Scarton, Kalina Bontcheva, Maria Bielikova
In the current era of social media and generative AI, an ability to automatically assess the credibility of online social media content is of tremendous importance.
no code implementations • 24 Oct 2024 • Yue Li, Zhixue Zhao, Carolina Scarton
In-context learning (ICL) performance is known to be sensitive to the prompt design, yet the impact of class label options in zero-shot classification has been largely overlooked.
no code implementations • 16 Aug 2024 • Tomas Goldsack, Carolina Scarton, Matthew Shardlow, Chenghua Lin
This paper presents the setup and results of the second edition of the BioLaySumm shared task on the Lay Summarisation of Biomedical Research Articles, hosted at the BioNLP Workshop at ACL 2024.
no code implementations • 27 Jun 2024 • Sebastian Vincent, Charlotte Prescott, Chris Bayliss, Chris Oakley, Carolina Scarton
Incorporating extra-textual context such as film metadata into the machine translation (MT) pipeline can enhance translation quality, as indicated by automatic evaluation in recent work.
no code implementations • 21 Jun 2024 • Wei He, Marco Idiart, Carolina Scarton, Aline Villavicencio
Accurately modeling idiomatic or non-compositional language has been a longstanding challenge in Natural Language Processing (NLP).
1 code implementation • 18 Jun 2024 • João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources.
no code implementations • 9 Jun 2024 • Zhihao Zhang, Tomas Goldsack, Carolina Scarton, Chenghua Lin
Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences.
no code implementations • 30 May 2024 • Jake Vasilakes, Zhixue Zhao, Ivan Vykopal, Michal Gregor, Martin Hyben, Carolina Scarton
We describe the ExU project proposal and summarise the results of a user requirements survey regarding the design of tools to support fact-checking.
no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.
no code implementations • 9 Nov 2023 • Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song
In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks.
1 code implementation • 24 Oct 2023 • Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin
Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience.
1 code implementation • 21 Oct 2023 • Freddy Heppell, Kalina Bontcheva, Carolina Scarton
This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish.
no code implementations • 29 Sep 2023 • Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, Chenghua Lin
This paper presents the results of the shared task on Lay Summarisation of Biomedical Research Articles (BioLaySumm), hosted at the BioNLP Workshop at ACL 2023.
no code implementations • 14 Sep 2023 • João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton
This paper introduces Pastel (Prompted weAk Supervision wiTh crEdibility signaLs), a weakly supervised approach that leverages large language models (LLMs) to extract credibility signals from web content, and subsequently combines them to predict the veracity of content without relying on human supervision.
1 code implementation • 14 Aug 2023 • Olesya Razuvayevskaya, Ben Wu, João A. Leite, Freddy Heppell, Ivan Srba, Carolina Scarton, Kalina Bontcheva, Xingyi Song
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.
Multilingual text classification • parameter-efficient fine-tuning
no code implementations • 10 Aug 2023 • Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva
Finding previously debunked narratives involves identifying claims that have already undergone fact-checking.
1 code implementation • 31 Jul 2023 • João A. Leite, Carolina Scarton, Diego F. Silva
Online social media is rife with offensive and hateful comments, prompting the need for their automatic detection given the sheer amount of posts created every second.
Ranked #1 on Hate Speech Detection on OLID (using extra training data)
1 code implementation • 25 May 2023 • Sebastian Vincent, Robert Flynn, Carolina Scarton
This work introduces MTCue, a novel neural machine translation (NMT) framework that interprets all context (including discrete variables) as text.
no code implementations • 23 May 2023 • Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song
Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts.
no code implementations • 10 Apr 2023 • Yida Mu, Ye Jiang, Freddy Heppell, Iknoor Singh, Carolina Scarton, Kalina Bontcheva, Xingyi Song
This motivated us to carry out a comparative study of the characteristics of COVID-19 misinformation versus those of accurate COVID-19 information through a large-scale computational analysis of over 242 million tweets.
1 code implementation • 29 Mar 2023 • Sebastian Vincent, Alice Dowek, Rowanne Sumner, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton
Our results suggest that the degree to which professional translations in our domain are context-specific can be preserved to a better extent by a contextual machine translation model than a non-contextual model, which is also reflected in the contextual model's superior reference-based scores.
no code implementations • 22 Mar 2023 • Yue Li, Carolina Scarton
Considering a conversation thread, rumour stance classification aims to identify the opinion (e.g. agree or disagree) of replies towards a target (rumour story).
1 code implementation • 16 Mar 2023 • Ben Wu, Olesya Razuvayevskaya, Freddy Heppell, João A. Leite, Carolina Scarton, Kalina Bontcheva, Xingyi Song
For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all the languages, by using two separate ensembles: a monolingual RoBERTa-MUPPET-LARGE and an ensemble of XLM-RoBERTa-LARGE with adapters and task adaptive pretraining.
1 code implementation • 17 Jan 2023 • Yida Mu, Mali Jin, Charlie Grimshaw, Carolina Scarton, Kalina Bontcheva, Xingyi Song
Annotated data is also necessary for training data-driven models for more nuanced analysis of attitudes towards vaccination.
1 code implementation • 18 Oct 2022 • Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton
Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts.
Ranked #1 on Lay Summarization on PLOS
no code implementations • 18 Jul 2022 • Yue Li, Carolina Scarton, Xingyi Song, Kalina Bontcheva
This paper addresses the need for monitoring and analysing vaccine narratives online by introducing a novel vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories.
1 code implementation • SemEval (NAACL) 2022 • Iknoor Singh, Yue Li, Melissa Thong, Carolina Scarton
This paper describes the second-placed system on the leaderboard of SemEval-2022 Task 8: Multilingual News Article Similarity.
no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.
no code implementations • IWSLT (ACL) 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton
This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign.
no code implementations • 10 May 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton
We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario.
1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.
1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.
1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio
Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms.
1 code implementation • ACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.
no code implementations • 22 Jun 2021 • Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva
In this paper, we introduce a fine-grained annotated misinformation tweets dataset including social behaviours annotation (e.g. comment or question to the misinformation).
1 code implementation • EACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio
Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.
1 code implementation • 8 Jan 2021 • Iknoor Singh, Carolina Scarton, Kalina Bontcheva
The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Carolina Scarton, Diego F. Silva, Kalina Bontcheva
This paper specifically questions the evaluation metrics used in these shared tasks.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • João A. Leite, Diego F. Silva, Kalina Bontcheva, Carolina Scarton
Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media.
Ranked #1 on Hate Speech Detection on ToLD-Br
1 code implementation • ACL 2020 • Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia
Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.
no code implementations • LREC 2020 • Roney Santos, Gabriela Pedro, Sidney Leal, Oto Vale, Thiago Pardo, Kalina Bontcheva, Carolina Scarton
The proliferation of fake news is a current issue that influences a number of important areas of society, such as politics, economy and health.
no code implementations • CL 2020 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand.
1 code implementation • EMNLP (IWSLT) 2019 • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia
To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.
1 code implementation • IJCNLP 2019 • Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.
no code implementations • WS 2019 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Current approaches to Text Simplification focus on simplifying sentences individually.
no code implementations • WS 2018 • Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton, Lucia Specia
For task 1b, we explore three approaches: (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.
no code implementations • WS 2018 • Julia Ive, Carolina Scarton, Frédéric Blain, Lucia Specia
In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task.
no code implementations • WS 2018 • Mikel L. Forcada, Carolina Scarton, Lucia Specia, Barry Haddow, Alexandra Birch
A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language.
no code implementations • ACL 2018 • Carolina Scarton, Lucia Specia
Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text.
no code implementations • IJCNLP 2017 • Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton, Lucia Specia
Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications).
1 code implementation • IJCNLP 2017 • Fernando Alva-Manchego, Joachim Bingel, Gustavo Paetzold, Carolina Scarton, Lucia Specia
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data.
Ranked #8 on Text Simplification on PWKP / WikiSmall (SARI metric)
no code implementations • EACL 2017 • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton
Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.
no code implementations • COLING 2016 • Carolina Scarton, Gustavo Paetzold, Lucia Specia
The goal of QE is to estimate the quality of language output applications without the need of human references.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
1 code implementation • LREC 2016 • Carolina Scarton, Lucia Specia
Effectively assessing Natural Language Processing output tasks is a challenge for research in the area.
no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi