no code implementations • EMNLP 2020 • Yvette Graham, Barry Haddow, Philipp Koehn
In addition, we provide a re-evaluation of a past machine translation evaluation claiming human-parity of MT.
no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussà, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
1 code implementation • MSR (COLING) 2020 • Simon Mille, Anya Belz, Bernd Bohnet, Thiago Castro Ferreira, Yvette Graham, Leo Wanner
As in SR’18 and SR’19, the shared task comprised two tracks: (1) a Shallow Track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (2) a Deep Track where additionally, functional words and morphological information were removed.
no code implementations • 9 Feb 2024 • Yvette Graham, Mohammed Rameez Qureshi, Haider Khalid, Gerasimos Lampouras, Ignacio Iacobacci, Qun Liu
The aim of this workshop is to bring together experts working on open-domain dialogue research.
no code implementations • 6 Nov 2023 • Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, Qingsong Ma, Chenyang Lyu, Liting Zhou, Chao-Hong Liu, Yufeng Ma, WeiYu Chen, Yvette Graham, Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi
To foster progress in this domain, we hold a new shared task at WMT 2023, the first edition of the Discourse-Level Literary Translation.
1 code implementation • 24 Oct 2023 • Alan Cowap, Yvette Graham, Jennifer Foster
Recent developments in generative AI have shone a spotlight on high-performance synthetic text generation technologies.
no code implementations • 22 Jun 2023 • George Awad, Keith Curtis, Asad Butt, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Eliot Godard, Lukas Diduch, Jeffrey Liu, Yvette Graham, Georges Quenot
The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, tasks-based evaluation supported by metrology.
no code implementations • 16 May 2023 • Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
We show that by integrating our approach into VideoQA systems we can achieve comparable, even superior, performance with a significant speed up for training and inference.
no code implementations • 14 May 2023 • Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
Specifically, we explicitly use the Semantic Role Labeling (SRL) structure of the question in the dynamic reasoning process where we decide to move to the next frame based on which part of the SRL structure (agent, verb, patient, etc.)
no code implementations • 17 Dec 2022 • Chenyang Lyu, Linyi Yang, Yue Zhang, Yvette Graham, Jennifer Foster
User and product information associated with a review is useful for sentiment polarity prediction.
no code implementations • 9 Oct 2022 • Tianbo Ji, Chenyang Lyu, Gareth Jones, Liting Zhou, Yvette Graham
Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage.
1 code implementation • insights (ACL) 2022 • Chenyang Lyu, Jennifer Foster, Yvette Graham
Past works that investigate out-of-domain performance of QA systems have mainly focused on general domains (e.g. news domain, Wikipedia domain), underestimating the importance of subdomains defined by the internal characteristics of QA datasets.
1 code implementation • ACL 2022 • Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu
Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present the successful development of human evaluation that is highly reliable while still remaining feasible and low cost.
1 code implementation • LREC 2022 • Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas, Noel E. O'Connor
The model is based on BERT, which is a language model that has been shown to work well in multiple NLP tasks.
no code implementations • EMNLP 2021 • Chenyang Lyu, Lifeng Shang, Yvette Graham, Jennifer Foster, Xin Jiang, Qun Liu
Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer.
no code implementations • 27 Apr 2021 • George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, Georges Quenot
In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1.
no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri
In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.
1 code implementation • COLING 2020 • Chenyang Lyu, Jennifer Foster, Yvette Graham
We achieve this by explicitly storing representations of reviews written by the same user and about the same product, forcing the model to memorize all reviews for one particular user and product.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Yvette Graham, Christian Federmann, Maria Eskevich, Barry Haddow
Recent machine translation shared tasks have shown top-performing systems to tie or in some cases even outperform human translation.
no code implementations • 21 Sep 2020 • George Awad, Asad A. Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, Andrew Delgado, Jesse Zhang, Eliot Godard, Lukas Diduch, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, Georges Quenot
The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation.
no code implementations • WS 2019 • Simon Mille, Anja Belz, Bernd Bohnet, Yvette Graham, Leo Wanner
We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation.
no code implementations • WS 2019 • Qingsong Ma, Johnny Wei, Ondřej Bojar, Yvette Graham
This paper presents the results of the WMT19 Metrics Shared Task.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
no code implementations • 24 Jun 2019 • Yvette Graham, Barry Haddow, Philipp Koehn
Finally, we provide a comprehensive check-list for future machine translation evaluation.
no code implementations • WS 2018 • Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018.
no code implementations • WS 2018 • Qingsong Ma, Ondřej Bojar, Yvette Graham
We asked participants of this task to score the outputs of the MT systems involved in the WMT18 News Translation Task with automatic metrics.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
no code implementations • WS 2018 • Simon Mille, Anja Belz, Bernd Bohnet, Yvette Graham, Emily Pitler, Leo Wanner
We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation.
1 code implementation • 10 Jan 2018 • Long-Yue Wang, Zhaopeng Tu, Shuming Shi, Tong Zhang, Yvette Graham, Qun Liu
Next, the annotated source sentence is reconstructed from hidden representations in the NMT model.
no code implementations • 29 Oct 2017 • Yvette Graham, George Awad, Alan Smeaton
We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video.
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
1 code implementation • EMNLP 2017 • Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu
Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators.
no code implementations • EACL 2017 • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton
Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.
no code implementations • COLING 2016 • Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi
Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri