no code implementations • WMT (EMNLP) 2021 • Lucia Specia, Frédéric Blain, Marina Fomicheva, Chrysoula Zerva, Zhenhao Li, Vishrav Chaudhary, André F. T. Martins
We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels.
no code implementations • WMT (EMNLP) 2021 • Chrysoula Zerva, Daan van Stigt, Ricardo Rei, Ana C Farinha, Pedro Ramos, José G. C. de Souza, Taisiya Glushkova, Miguel Vera, Fabio Kepler, André F. T. Martins
We present the joint contribution of IST and Unbabel to the WMT 2021 Shared Task on Quality Estimation.
1 code implementation • WMT (EMNLP) 2021 • Ricardo Rei, Ana C Farinha, Chrysoula Zerva, Daan van Stigt, Craig Stewart, Pedro Ramos, Taisiya Glushkova, André F. T. Martins, Alon Lavie
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task.
1 code implementation • 18 Feb 2025 • Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, Maarten Sap
Preference alignment via reward models helps build safe, helpful, and reliable large language models (LLMs).
1 code implementation • 10 Feb 2025 • Gonçalo Gomes, Chrysoula Zerva, Bruno Martins
The evaluation of image captions, looking at both linguistic fluency and semantic correspondence to visual content, has been the focus of significant effort.
1 code implementation • 20 Sep 2024 • Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou
Equivocation and ambiguity in public speech are well-studied discourse phenomena, especially in political science and analysis of political interviews.
no code implementations • 3 May 2024 • Margarida M. Campos, António Farinhas, Chrysoula Zerva, Mário A. T. Figueiredo, André F. T. Martins
The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications.
1 code implementation • 1 Feb 2024 • Dennis Ulmer, Chrysoula Zerva, André F. T. Martins
Conformal prediction is an attractive framework to provide predictions imbued with statistical guarantees; however, its application to text generation is challenging since any i.i.d. …
1 code implementation • 20 Nov 2023 • Sumire Honda, Patrick Fernandes, Chrysoula Zerva
We make use of Conditional Cross-Mutual Information (CXMI) to explore how much of the context the model uses and generalise CXMI to study the impact of the extra-sentential context.
1 code implementation • 2 Oct 2023 • António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins
Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth.
no code implementations • 28 Jul 2023 • Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
Recent advances in powerful language models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications.
no code implementations • 9 Jun 2023 • Chrysoula Zerva, André F. T. Martins
Several uncertainty estimation methods have been recently proposed for machine translation evaluation.
1 code implementation • 30 May 2023 • Taisiya Glushkova, Chrysoula Zerva, André F. T. Martins
Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered critical errors, such as deviations in entities and numbers.
1 code implementation • 26 May 2023 • Giorgos Filandrianos, Edmund Dervakos, Orfeas Menis-Mastromichalakis, Chrysoula Zerva, Giorgos Stamou
We propose a new back translation-inspired evaluation methodology that utilises earlier outputs of the explainer as ground truth proxies to investigate the consistency of explainers.
1 code implementation • 13 Sep 2022 • Ricardo Rei, Marcos Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, Christine Maroti, José G. C. de Souza, Taisiya Glushkova, Duarte M. Alves, Alon Lavie, Luisa Coheur, André F. T. Martins
We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE).
1 code implementation • 13 Apr 2022 • Chrysoula Zerva, Taisiya Glushkova, Ricardo Rei, André F. T. Martins
Trainable evaluation metrics for machine translation (MT) exhibit strong correlation with human judgements, but they are often hard to interpret and might produce unreliable scores under noisy or out-of-domain data.
1 code implementation • ACL 2022 • Jake Vasilakes, Chrysoula Zerva, Makoto Miwa, Sophia Ananiadou
Negation and uncertainty modeling are long-standing tasks in natural language processing.
2 code implementations • Findings (EMNLP) 2021 • Taisiya Glushkova, Chrysoula Zerva, Ricardo Rei, André F. T. Martins
Several neural-based metrics have been recently proposed to evaluate machine translation quality.
no code implementations • AAAI-MAKE 2021 • Edmund Dervakos, Giorgos Filandrianos, Konstantinos Thomas, Alexios Mandalios, Chrysoula Zerva, Giorgos Stamou
The rapid growth of scientific literature in the biomedical and clinical domain has significantly complicated the identification of information of interest by researchers as well as other practitioners.
Ranked #1 on Information Retrieval on Ohsumed
1 code implementation • LREC 2022 • Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE).
no code implementations • WS 2018 • Chrysoula Zerva, Sophia Ananiadou
We compare the differences in the definition and expression of uncertainty between a scientific domain, i.e., biomedicine, and newswire.