no code implementations • 20 Aug 2024 • John Mendonça, Isabel Trancoso, Alon Lavie
Although human evaluation remains the gold standard for open-domain dialogue evaluation, the growing popularity of automated evaluation using Large Language Models (LLMs) has also extended to dialogue.
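As a rough illustration of what LLM-based automated dialogue evaluation can look like in practice, the sketch below prompts a chat model to rate a single response for coherence. The model name, prompt wording, and 1-5 scale are assumptions made for the example, not the configuration used in the paper.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def judge_response(context: str, response: str, model: str = "gpt-4o-mini") -> int:
    """Ask an LLM to rate the coherence of a dialogue response on a 1-5 scale."""
    prompt = (
        "You are evaluating an open-domain dialogue.\n"
        f"Dialogue context:\n{context}\n\n"
        f"Candidate response:\n{response}\n\n"
        "Rate the coherence of the response on a scale of 1 (incoherent) "
        "to 5 (fully coherent). Reply with the number only."
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Parsing assumes the model follows the "number only" instruction.
    return int(completion.choices[0].message.content.strip())
```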
1 code implementation • 16 Jul 2024 • John Mendonça, Isabel Trancoso, Alon Lavie
Motivated by the need for lightweight, open source, and multilingual dialogue evaluators, this paper introduces GenResCoh (Generated Responses targeting Coherence).
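A hedged sketch of how coherence-targeted positive and negative response pairs might be produced with a small open model follows; the model checkpoint and prompts are placeholders chosen for illustration, not the actual GenResCoh generation pipeline.

```python
from transformers import pipeline  # pip install transformers

# Small instruction-tuned model used purely as a placeholder generator.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")


def make_pair(context: str) -> dict:
    """Generate a coherent and a deliberately incoherent response for one context."""
    pos = generator(
        f"Continue this dialogue coherently:\n{context}\nResponse:",
        max_new_tokens=40,
        return_full_text=False,  # keep only the newly generated response
    )[0]["generated_text"]
    neg = generator(
        f"Continue this dialogue with an off-topic, incoherent reply:\n{context}\nResponse:",
        max_new_tokens=40,
        return_full_text=False,
    )[0]["generated_text"]
    return {"context": context, "positive": pos, "negative": neg, "labels": (1, 0)}
```

Pairs of this form can then be used to train a lightweight coherence classifier rather than relying on a large proprietary evaluator.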
no code implementations • 4 Jul 2024 • John Mendonça, Alon Lavie, Isabel Trancoso
Large Language Models (LLMs) have showcased remarkable capabilities in various Natural Language Processing tasks.
1 code implementation • 23 Nov 2023 • John Mendonça, Patrícia Pereira, Miguel Menezes, Vera Cabarrão, Ana C. Farinha, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso
Task-oriented conversational datasets often lack topic variability and linguistic diversity.
1 code implementation • 31 Aug 2023 • John Mendonça, Patrícia Pereira, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso
Despite significant research effort in the development of automatic dialogue evaluation metrics, little attention has been paid to evaluating dialogues in languages other than English.
1 code implementation • 31 Aug 2023 • John Mendonça, Alon Lavie, Isabel Trancoso
The main factor limiting the development of robust multilingual dialogue evaluation metrics is the lack of multilingual data and the limited availability of open-source multilingual dialogue systems.
no code implementations • 30 Jun 2021 • John Mendonça, Rubén Solera-Ureña, Alberto Abad, Isabel Trancoso
Experimental results demonstrate that models trained on features extracted from self-supervised models perform on par with or outperform fully supervised models and models based on handcrafted features.
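As a minimal sketch of the self-supervised feature extraction idea, the snippet below mean-pools frame-level embeddings from a pretrained speech encoder into a fixed-size utterance vector; the checkpoint and pooling choice are illustrative assumptions, not the paper's configuration.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model  # pip install transformers

# Pretrained self-supervised speech encoder (checkpoint chosen for illustration).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
encoder.eval()


def ssl_features(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Return an utterance embedding by mean-pooling frame-level SSL features."""
    inputs = extractor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0)              # (dim,)
```

A downstream classifier (for example a linear model or SVM) can then be trained on these embeddings in place of handcrafted acoustic features.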