no code implementations • ACL 2022 • Emily Dinan, Gavin Abercrombie, A. Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
We then empirically assess the extent to which current tools can measure these effects and current systems display them.
no code implementations • INLG (ACL) 2020 • David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.
no code implementations • EMNLP 2021 • David M. Howcroft, Verena Rieser
Previous work has shown that human evaluations in NLP are notoriously under-powered.
no code implementations • EMNLP 2021 • Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser
We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.
no code implementations • GeBNLP (COLING) 2020 • Amanda Cercas Curry, Judy Robertson, Verena Rieser
We then outline a multi-disciplinary project of how we plan to address the complex question of gender and stereotyping in digital assistants.
no code implementations • 18 Mar 2022 • Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
1 code implementation • Findings (EMNLP) 2021 • Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas
We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in assisting documents than in the main source articles.
1 code implementation • 20 Sep 2021 • Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser
We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.
no code implementations • 7 Jul 2021 • Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans.
1 code implementation • ACL 2021 • Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas
We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.
1 code implementation • ACL (GeBNLP) 2021 • Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya, Verena Rieser
Technology companies have produced varied responses to concerns about the effects of the design of their conversational AI systems.
1 code implementation • ACL 2021 • Karin Sevegnani, David M. Howcroft, Ioannis Konstas, Verena Rieser
Mixed initiative in open-domain dialogue requires a system to proactively introduce new topics.
1 code implementation • EMNLP 2020 • Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications.
no code implementations • ACL 2020 • Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser, Ioannis Konstas
Abstractive summarisation is notoriously hard to evaluate since standard word-overlap-based metrics are insufficient.
2 code implementations • ACL 2020 • Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser
Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.
1 code implementation • WS 2019 • Ondřej Dušek, David M. Howcroft, Verena Rieser
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification.
Ranked #3 on Data-to-Text Generation on Cleaned E2E NLG Challenge
1 code implementation • WS 2019 • Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser
We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.
1 code implementation • WS 2019 • Amanda Cercas Curry, Verena Rieser
How should conversational agents respond to verbal abuse from the user?
1 code implementation • WS 2019 • Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser
We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.
5 code implementations • 13 Mar 2019 • Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.
no code implementations • 23 Jan 2019 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures -- with the majority implementing sequence-to-sequence models (seq2seq) -- as well as systems based on grammatical rules and templates.
1 code implementation • WS 2018 • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system.
1 code implementation • WS 2018 • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser
Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database.
1 code implementation • WS 2018 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.
Ranked #4 on Data-to-Text Generation on E2E NLG Challenge
1 code implementation • EMNLP 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
2 code implementations • 18 Sep 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
no code implementations • WS 2018 • Amanda Cercas Curry, Verena Rieser
In this article, we establish how current state-of-the-art conversational systems react to inappropriate requests, such as bullying and sexual harassment on the part of the user, by collecting and analysing the novel #MeTooAlexa corpus.
1 code implementation • 31 Mar 2018 • Simon Keizer, Verena Rieser
Recent statistical approaches have improved the robustness and scalability of spoken dialogue systems.
1 code implementation • NAACL 2018 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings.
no code implementations • 20 Dec 2017 • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon
Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence.
no code implementations • 13 Sep 2017 • Amanda Cercas Curry, Helen Hastie, Verena Rieser
In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success.
1 code implementation • 5 Aug 2017 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.
1 code implementation • EMNLP 2017 • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser
The majority of NLG evaluation relies on automatic metrics, such as BLEU.
1 code implementation • WS 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.
no code implementations • 28 Jun 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora.
no code implementations • 1 Aug 2016 • Jekaterina Novikova, Oliver Lemon, Verena Rieser
Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances.
no code implementations • 15 Jun 2016 • Verena Rieser, Oliver Lemon
We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser).
no code implementations • ACL 2016 • Dimitra Gkatzia, Oliver Lemon, Verena Rieser
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities.
no code implementations • LREC 2016 • Phil Bartie, William Mackaness, Dimitra Gkatzia, Verena Rieser
Our interest is in people's capacity to efficiently and effectively describe geographic objects in urban scenes.
no code implementations • WS 2014 • Helen Hastie, Marie-Aude Aufaure, Panos Alexopoulos, Hugues Bouchard, Catherine Breslin, Heriberto Cuayáhuitl, Nina Dethlefs, Milica Gašić, James Henderson, Oliver Lemon, Xingkun Liu, Peter Mika, Nesrine Ben Mustapha, Tim Potter, Verena Rieser, Blaise Thomson, Pirros Tsiakoulis, Yves Vanrompay, Boris Villazon-Terrazas, Majid Yazdani, Steve Young, Yanchao Yu
no code implementations • LREC 2014 • Eshrag Refaee, Verena Rieser
We present a newly collected data set of 8,868 gold-standard annotated Arabic feeds.
no code implementations • WS 2013 • Helen Hastie, Marie-Aude Aufaure, Panos Alexopoulos, Heriberto Cuayáhuitl, Nina Dethlefs, Milica Gasic, James Henderson, Oliver Lemon, Xingkun Liu, Peter Mika, Nesrine Ben Mustapha, Verena Rieser, Blaise Thomson, Pirros Tsiakoulis, Yves Vanrompay