Search Results for author: Carolina Scarton

Found 35 papers, 9 papers with code

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation 9 Sep 2021 Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels

1 code implementation ACL 2021 Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.

Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic

no code implementations 22 Jun 2021 Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva

In this paper, we introduce a fine-grained annotated misinformation tweets dataset including social behaviours annotation (e.g. comment or question to the misinformation).


Probing for idiomaticity in vector space models

no code implementations EACL 2021 Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.

Multistage BiCross encoder for multilingual access to COVID-19 health information

1 code implementation 8 Jan 2021 Iknoor Singh, Carolina Scarton, Kalina Bontcheva

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online.

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

1 code implementation ACL 2020 Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia

Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.

Data-Driven Sentence Simplification: Survey and Benchmark

no code implementations CL 2020 Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand.

Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality

1 code implementation 14 Oct 2019 Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia

To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.

Machine Translation

EASSE: Easier Automatic Sentence Simplification Evaluation

1 code implementation IJCNLP 2019 Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia

We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.

Sheffield Submissions for WMT18 Multimodal Translation Shared Task

no code implementations WS 2018 Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton, Lucia Specia

For task 1b, we explore three approaches: (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.

Data Augmentation Multimodal Machine Translation +2

Learning Simplifications for Specific Target Audiences

no code implementations ACL 2018 Carolina Scarton, Lucia Specia

Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text.

Lexical Simplification Machine Translation +2

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

1 code implementation IJCNLP 2017 Fernando Alva-Manchego, Joachim Bingel, Gustavo Paetzold, Carolina Scarton, Lucia Specia

Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data.

Machine Translation Sentence Compression +1

MUSST: A Multilingual Syntactic Simplification Tool

no code implementations IJCNLP 2017 Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton, Lucia Specia

Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications).

Lexical Simplification Text Simplification

Improving Evaluation of Document-level Machine Translation Quality Estimation

no code implementations EACL 2017 Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.

Document-level Document Level Machine Translation +1
