no code implementations • VarDial (COLING) 2020 • Fernando Benites, Manuela Hürlimann, Pius von Däniken, Mark Cieliebak
We describe our approaches for the Social Media Geolocation (SMG) task at the VarDial Evaluation Campaign 2020.
no code implementations • 4 Dec 2024 • Pius von Däniken, Jan Deriu, Mark Cieliebak
Automated metrics for Machine Translation have made significant progress, with the goal of replacing expensive and time-consuming human evaluations.
no code implementations • 3 Jun 2024 • Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak
We propose that preference-based metrics ought to be evaluated on both sign accuracy scores and favoritism.
no code implementations • 6 Jun 2023 • Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak
A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments.
no code implementations • 24 Oct 2022 • Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak
A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns.
1 code implementation • ACL 2022 • Jan Deriu, Don Tuggener, Pius von Däniken, Mark Cieliebak
This paper introduces an adversarial method to stress-test trained metrics for evaluating conversational dialogue systems.
1 code implementation • EMNLP 2020 • Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak
In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots.