no code implementations • EMNLP (newsum) 2021 • Don Tuggener, Margot Mieskes, Jan Deriu, Mark Cieliebak
Dialogue summarization is a long-standing task in the field of NLP, and several data sets with dialogues and associated human-written summaries of different styles exist.
no code implementations • 4 Dec 2024 • Pius von Däniken, Jan Deriu, Mark Cieliebak
Automated metrics for Machine Translation have made significant progress, with the goal of replacing expensive and time-consuming human evaluations.
1 code implementation • 5 Jun 2024 • Janick Michot, Manuela Hürlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak
In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors.
no code implementations • 3 Jun 2024 • Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak
Thus, we propose that preference-based metrics ought to be evaluated on both sign accuracy scores and favoritism.
no code implementations • 13 Oct 2023 • Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann, Manfred Vogel, Mark Cieliebak
This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German.
no code implementations • 7 Jun 2023 • Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, Kurt Stockinger
Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data.
no code implementations • 6 Jun 2023 • Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak
A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments.
no code implementations • 31 May 2023 • Tobias Bollinger, Jan Deriu, Manfred Vogel
In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models.
no code implementations • 30 May 2023 • Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak
We train an ASR model on the training set and achieve an average BLEU score of 74.7 on the test set.
no code implementations • 24 Oct 2022 • Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak
A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns.
1 code implementation • LREC 2022 • Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel
We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers.
no code implementations • 18 Mar 2022 • Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
1 code implementation • ACL 2022 • Jan Deriu, Don Tuggener, Pius von Däniken, Mark Cieliebak
This paper introduces an adversarial method to stress-test trained metrics for evaluating conversational dialogue systems.
1 code implementation • EMNLP 2020 • Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak
In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots.
no code implementations • ACL 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.
no code implementations • 4 May 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.
no code implementations • ACL 2020 • Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak
For this, we introduce an intermediate representation called Operation Trees (OT), which is based on the logical query plan in a database.
no code implementations • WS 2019 • Jan Deriu, Mark Cieliebak
We present "AutoJudge", an automated evaluation method for conversational dialogue systems.
no code implementations • 10 May 2019 • Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak
We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.
no code implementations • COLING 2018 • Fernando Benites, Ralf Grubenmann, Pius von Däniken, Dirk von Grünigen, Jan Deriu, Mark Cieliebak
We describe our approaches used in the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018.
no code implementations • SEMEVAL 2017 • Simon Müller, Tobias Huonder, Jan Deriu, Mark Cieliebak
In this paper, we propose a classifier for predicting topic-specific sentiments of English Twitter messages.
1 code implementation • 7 Mar 2017 • Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
This paper presents a novel approach for multi-lingual sentiment classification in short texts.
no code implementations • 11 Nov 2015 • Jan Deriu, Rolf Jagerman, Kai-En Tsay
The problem of inpainting involves reconstructing the missing areas of an image.