Search Results for author: Carolina Scarton

Found 62 papers, 22 papers with code

Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation

no code implementations • EAMT 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario.

Attribute Machine Translation +2

Revisiting Rumour Stance Classification: Dealing with Imbalanced Data

no code implementations • RDSM (COLING) 2020 • Yue Li, Carolina Scarton

Correctly classifying stances of replies can be significantly helpful for the automatic detection and classification of online rumours.

Classification Rumour Detection +1

The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification

1 code implementation • CL (ACL) 2021 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation.

Sentence Text Simplification

Word Boundary Information Isn't Useful for Encoder Language Models

no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.

NER Sentence

Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

no code implementations • 9 Nov 2023 • Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song

In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks.

Enhancing Biomedical Lay Summarisation with External Knowledge Graphs

1 code implementation • 24 Oct 2023 • Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin

Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience.

Knowledge Graphs

Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study

1 code implementation • 21 Oct 2023 • Freddy Heppell, Kalina Bontcheva, Carolina Scarton

This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish.

Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles

no code implementations • 29 Sep 2023 • Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, Chenghua Lin

This paper presents the results of the shared task on Lay Summarisation of Biomedical Research Articles (BioLaySumm), hosted at the BioNLP Workshop at ACL 2023.

Lay Summarization

Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision

no code implementations • 14 Sep 2023 • João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content.

Misinformation

Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification

1 code implementation • 14 Aug 2023 • Olesya Razuvayevskaya, Ben Wu, Joao A. Leite, Freddy Heppell, Ivan Srba, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.

Multilingual text classification text-classification +1

Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning

no code implementations • 10 Aug 2023 • Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva

The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked.

Fact Checking Misinformation +3

Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks

1 code implementation • 31 Jul 2023 • João A. Leite, Carolina Scarton, Diego F. Silva

Online social media is rife with offensive and hateful comments, prompting the need for their automatic detection given the sheer amount of posts created every second.

Ranked #1 on Hate Speech Detection on OLID (using extra training data)

Data Augmentation Hate Speech Detection

MTCue: Learning Zero-Shot Control of Extra-Textual Attributes by Leveraging Unstructured Context in Neural Machine Translation

1 code implementation • 25 May 2023 • Sebastian Vincent, Robert Flynn, Carolina Scarton

This work introduces MTCue, a novel neural machine translation (NMT) framework that interprets all context (including discrete variables) as text.

Machine Translation NMT +1

Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science

no code implementations • 23 May 2023 • Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts.

Zero-Shot Learning

A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation

no code implementations • 10 Apr 2023 • Yida Mu, Ye Jiang, Freddy Heppell, Iknoor Singh, Carolina Scarton, Kalina Bontcheva, Xingyi Song

This motivated us to carry out a comparative study of the characteristics of COVID-19 misinformation versus those of accurate COVID-19 information through a large-scale computational analysis of over 242 million tweets.

Misinformation

Reference-less Analysis of Context Specificity in Translation with Personalised Language Models

no code implementations • 29 Mar 2023 • Sebastian Vincent, Alice Dowek, Rowanne Sumner, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton

Our results suggest that the degree to which professional translations in our domain are context-specific can be preserved to a better extent by a contextual machine translation model than a non-contextual model, which is also reflected in the contextual model's superior reference-based scores.

Language Modelling Machine Translation +2

Can We Identify Stance Without Target Arguments? A Study for Rumour Stance Classification

no code implementations • 22 Mar 2023 • Yue Li, Carolina Scarton

Considering a conversation thread, rumour stance classification aims to identify the opinion (e.g., agree or disagree) of replies towards a target (rumour story).

Classification Sentiment Analysis +1

SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification

1 code implementation • 16 Mar 2023 • Ben Wu, Olesya Razuvayevskaya, Freddy Heppell, João A. Leite, Carolina Scarton, Kalina Bontcheva, Xingyi Song

For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all the languages, by using two separate ensembles: a monolingual RoBERTa-MUPPET-LARGE and an ensemble of XLM-RoBERTa-LARGE with adapters and task adaptive pretraining.

VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter

1 code implementation • 17 Jan 2023 • Yida Mu, Mali Jin, Charlie Grimshaw, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Annotated data is also necessary for training data-driven models for more nuanced analysis of attitudes towards vaccination.

Language Modelling

Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

1 code implementation • 18 Oct 2022 • Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts.

Ranked #1 on Lay Summarization on PLOS

Lay Summarization

Classifying COVID-19 vaccine narratives

no code implementations • 18 Jul 2022 • Yue Li, Carolina Scarton, Xingyi Song, Kalina Bontcheva

This paper addresses the need for monitoring and analysing vaccine narratives online by introducing a novel vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories.

Data Augmentation

GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity

1 code implementation • SemEval (NAACL) 2022 • Iknoor Singh, Yue Li, Melissa Thong, Carolina Scarton

This paper describes the second-placed system on the leaderboard of SemEval-2022 Task 8: Multilingual News Article Similarity.

Sample Efficient Approaches for Idiomaticity Detection

no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

In particular, we study the impact of Pattern-Exploiting Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.

Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

no code implementations • IWSLT (ACL) 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign.

Domain Adaptation NMT +3

Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation

no code implementations • 10 May 2022 • Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Attribute Machine Translation +2

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification Sentence +4

Improving Tokenisation by Alternative Treatment of Spaces

1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.

Natural Language Understanding

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail to effectively capture the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels

1 code implementation • ACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.

Vocal Bursts Type Prediction

Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic

no code implementations • 22 Jun 2021 • Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva

In this paper, we introduce a fine-grained annotated misinformation tweets dataset including social behaviours annotation (e.g., comment or question to the misinformation).

Misinformation

Probing for idiomaticity in vector space models

1 code implementation • EACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.

Multistage BiCross encoder for multilingual access to COVID-19 health information

1 code implementation • 8 Jan 2021 • Iknoor Singh, Carolina Scarton, Kalina Bontcheva

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online.

Retrieval

Measuring What Counts: The case of Rumour Stance Classification

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Carolina Scarton, Diego F. Silva, Kalina Bontcheva

This paper specifically questions the evaluation metrics used in these shared tasks.

Classification General Classification +3

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • João A. Leite, Diego F. Silva, Kalina Bontcheva, Carolina Scarton

Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media.

Ranked #1 on Hate Speech Detection on ToLD-Br

Hate Speech Detection Multi-Label Classification +1

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

1 code implementation • ACL 2020 • Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia

Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.

Sentence

Measuring the Impact of Readability Features in Fake News Detection

no code implementations • LREC 2020 • Roney Santos, Gabriela Pedro, Sidney Leal, Oto Vale, Thiago Pardo, Kalina Bontcheva, Carolina Scarton

The proliferation of fake news is a current issue that influences a number of important areas of society, such as politics, economy and health.

Classification Fake News Detection +1

Data-Driven Sentence Simplification: Survey and Benchmark

no code implementations • CL 2020 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand.

Sentence

Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality

1 code implementation • EMNLP (IWSLT) 2019 • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia

To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.

Machine Translation Translation

EASSE: Easier Automatic Sentence Simplification Evaluation

1 code implementation • IJCNLP 2019 • Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia

We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.

Sentence

Cross-Sentence Transformations in Text Simplification

no code implementations • WS 2019 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Current approaches to Text Simplification focus on simplifying sentences individually.

Sentence Text Simplification

Sheffield Submissions for WMT18 Multimodal Translation Shared Task

no code implementations • WS 2018 • Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton, Lucia Specia

For task 1b, we explore three approaches: (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.

Data Augmentation Multimodal Machine Translation +4

Sheffield Submissions for the WMT18 Quality Estimation Shared Task

no code implementations • WS 2018 • Julia Ive, Carolina Scarton, Frédéric Blain, Lucia Specia

In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task.

Machine Translation

Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting

no code implementations • WS 2018 • Mikel L. Forcada, Carolina Scarton, Lucia Specia, Barry Haddow, Alexandra Birch

A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language.

Machine Translation Reading Comprehension +2

Learning Simplifications for Specific Target Audiences

no code implementations • ACL 2018 • Carolina Scarton, Lucia Specia

Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text.

Lexical Simplification Machine Translation +4

Text Simplification from Professionally Produced Corpora

no code implementations • LREC 2018 • Carolina Scarton, Gustavo Paetzold, Lucia Specia

Lexical Simplification Machine Translation +1

SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain

no code implementations • LREC 2018 • Carolina Scarton, Gustavo Paetzold, Lucia Specia

Lexical Simplification Sentence +2

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

1 code implementation • IJCNLP 2017 • Fernando Alva-Manchego, Joachim Bingel, Gustavo Paetzold, Carolina Scarton, Lucia Specia

Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data.

Ranked #8 on Text Simplification on PWKP / WikiSmall (SARI metric)

Machine Translation Sentence +2

MUSST: A Multilingual Syntactic Simplification Tool

no code implementations • IJCNLP 2017 • Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton, Lucia Specia

Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications).

Lexical Simplification Sentence +1

Bilexical Embeddings for Quality Estimation

no code implementations • WS 2017 • Frédéric Blain, Carolina Scarton, Lucia Specia

Language Modelling Machine Translation +1

Improving Evaluation of Document-level Machine Translation Quality Estimation

no code implementations • EACL 2017 • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.

Document Level Machine Translation Machine Translation +2