no code implementations • SIGDIAL (ACL) 2022 • Alessandro Suglia, Bhathiya Hemanthage, Malvina Nikandrou, George Pantazopoulos, Amit Parekh, Arash Eshghi, Claudio Greco, Ioannis Konstas, Oliver Lemon, Verena Rieser
We demonstrate EMMA, an embodied multimodal agent which has been developed for the Alexa Prize SimBot challenge.
no code implementations • EMNLP 2021 • Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser
We find that the distribution of abuse differs vastly from that in other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.
no code implementations • SIGDIAL (ACL) 2022 • A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau, Verena Rieser
Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, open-domain conversations with humans.
no code implementations • GeBNLP (COLING) 2020 • Amanda Cercas Curry, Judy Robertson, Verena Rieser
We then outline a multi-disciplinary project of how we plan to address the complex question of gender and stereotyping in digital assistants.
no code implementations • EMNLP 2021 • David M. Howcroft, Verena Rieser
Previous work has shown that human evaluations in NLP are notoriously under-powered.
no code implementations • ACL 2022 • Emily Dinan, Gavin Abercrombie, A. Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
We then empirically assess the extent to which current tools can measure these effects and current systems display them.
no code implementations • INLG (ACL) 2020 • David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.
no code implementations • 7 Nov 2023 • Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia
Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models: 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation.
no code implementations • 18 Oct 2023 • Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac
First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
no code implementations • 29 Aug 2023 • Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson, Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan, Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, Oliver Lemon
We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conversation.
no code implementations • 1 Jul 2023 • Yi-Ling Chung, Gavin Abercrombie, Florence Enock, Jonathan Bright, Verena Rieser
Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse.
no code implementations • 25 May 2023 • Sabrina Chiesurin, Dimitris Dimakopoulos, Marco Antonio Sobrevilla Cabezudo, Arash Eshghi, Ioannis Papaioannou, Verena Rieser, Ioannis Konstas
Large language models are known to produce output which sounds fluent and convincing, but is also often wrong, e.g. "unfaithful" with respect to a rationale retrieved from a knowledge base.
Conversational Question Answering
Open-Domain Question Answering
1 code implementation • 16 May 2023 • Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, Verena Rieser, Zeerak Talat
In this paper, we discuss the linguistic factors that contribute to the anthropomorphism of dialogue systems and the harms that can arise, including reinforcing gender stereotypes and notions of acceptable language.
no code implementations • 10 May 2023 • Nikolas Vitsakis, Amit Parekh, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas, Verena Rieser
There are two competing approaches for modelling annotator disagreement: distributional soft-labelling approaches, which aim to capture the level of disagreement, and approaches that model the perspectives of individual annotators or groups thereof.
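To make the distributional option concrete, here is a minimal sketch (our own illustration, not code from the paper) of how per-item soft labels can be derived from raw annotator judgements; the label names are hypothetical.

```python
from collections import Counter

def soft_label(annotations, label_set):
    """Turn the labels one item received from several annotators into a distribution."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: counts.get(label, 0) / total for label in label_set}

# Three annotators disagree on whether an utterance is abusive; the soft label
# preserves that disagreement instead of collapsing it into a majority vote.
print(soft_label(["abusive", "not_abusive", "abusive"], ["abusive", "not_abusive"]))
# -> {'abusive': 0.666..., 'not_abusive': 0.333...}
```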
no code implementations • 6 May 2023 • Marco Casadio, Luca Arnaboldi, Matthew L. Daggitt, Omri Isac, Tanvi Dinkar, Daniel Kienitz, Verena Rieser, Ekaterina Komendantskaya
In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP.
no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.
no code implementations • 28 Apr 2023 • Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio
We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches to evaluation.
no code implementations • 28 Apr 2023 • Lu Yu, Malvina Nikandrou, Jiali Jin, Verena Rieser
In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people.
no code implementations • 25 Jan 2023 • Gavin Abercrombie, Verena Rieser, Dirk Hovy
We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks.
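As a reminder of what such agreement measures compute, here is a minimal sketch of one common choice, Cohen's kappa for two annotators; the example labels are hypothetical and not taken from the paper.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Two annotators label four utterances; kappa corrects raw agreement for chance.
print(cohens_kappa(["hate", "ok", "hate", "ok"], ["hate", "hate", "hate", "ok"]))  # 0.5
```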
3 code implementations • 9 Nov 2022 • BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. 
Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. 
Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.
1 code implementation • 8 Nov 2022 • Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser
As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding.
1 code implementation • 2 Oct 2022 • Gavin Abercrombie, Verena Rieser
While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses.
no code implementations • 30 Sep 2022 • Malvina Nikandrou, Lu Yu, Alessandro Suglia, Ioannis Konstas, Verena Rieser
We first propose three plausible task formulations and demonstrate their impact on the performance of continual learning algorithms.
no code implementations • 6 Jul 2022 • Lu Yu, Verena Rieser
This study is the first to investigate the robustness of visually grounded dialog models towards textual attacks.
no code implementations • 21 Jun 2022 • Marco Casadio, Ekaterina Komendantskaya, Verena Rieser, Matthew L. Daggitt, Daniel Kienitz, Luca Arnaboldi, Wen Kokke
With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness. Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations.
no code implementations • 18 Mar 2022 • Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
1 code implementation • Findings (EMNLP) 2021 • Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas
We show via data analysis that it is not only the models that are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in the assisting documents than in the main source articles.
1 code implementation • 20 Sep 2021 • Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser
We find that the distribution of abuse differs vastly from that in other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.
no code implementations • 7 Jul 2021 • Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans.
1 code implementation • ACL 2021 • Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas
We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.
1 code implementation • ACL (GeBNLP) 2021 • Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya, Verena Rieser
Technology companies have produced varied responses to concerns about the effects of the design of their conversational AI systems.
1 code implementation • ACL 2021 • Karin Sevegnani, David M. Howcroft, Ioannis Konstas, Verena Rieser
Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics.
1 code implementation • EMNLP 2020 • Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications.
Ranked #3 on Slot Filling on SLURP (using extra training data)
no code implementations • ACL 2020 • Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser, Ioannis Konstas
Abstractive summarisation is notoriously hard to evaluate since standard word-overlap-based metrics are insufficient.
2 code implementations • ACL 2020 • Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser
Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.
1 code implementation • WS 2019 • Ondřej Dušek, David M. Howcroft, Verena Rieser
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification.
Ranked #3 on Data-to-Text Generation on Cleaned E2E NLG Challenge
1 code implementation • WS 2019 • Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser
We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.
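A minimal PyTorch-style sketch of the general idea of jointly learning numerical ratings and pairwise rankings with a shared recurrent encoder is shown below; it illustrates the idea under our own simplifying assumptions (toy vocabulary, random toy batches) and is not the authors' architecture or training setup.

```python
import torch
import torch.nn as nn

class QualityEstimator(nn.Module):
    """Shared GRU encoder with a single scoring head used for both objectives."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.scorer = nn.Linear(hidden, 1)  # one quality score per output

    def forward(self, token_ids):
        _, h = self.encoder(self.embed(token_ids))
        return self.scorer(h[-1]).squeeze(-1)

model = QualityEstimator(vocab_size=10_000)
rating_loss = nn.MSELoss()                # numerical rating objective
ranking_loss = nn.MarginRankingLoss(1.0)  # pairwise ranking objective

# Toy batch: one rated output, plus one preference pair (better vs. worse output).
rated = torch.randint(0, 10_000, (1, 12))
better = torch.randint(0, 10_000, (1, 12))
worse = torch.randint(0, 10_000, (1, 12))
loss = rating_loss(model(rated), torch.tensor([4.0])) + \
       ranking_loss(model(better), model(worse), torch.ones(1))
loss.backward()  # both objectives update the same encoder
```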
1 code implementation • WS 2019 • Amanda Cercas Curry, Verena Rieser
How should conversational agents respond to verbal abuse from the user?
no code implementations • WS 2019 • Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser
We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.
8 code implementations • 13 Mar 2019 • Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.
no code implementations • 23 Jan 2019 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures (with the majority implementing sequence-to-sequence models, seq2seq) as well as systems based on grammatical rules and templates.
1 code implementation • WS 2018 • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system.
1 code implementation • WS 2018 • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser
Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database.
1 code implementation • WS 2018 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.
Ranked #4 on Data-to-Text Generation on E2E NLG Challenge
1 code implementation • EMNLP 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) we introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response; (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs; and (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
2 code implementations • 18 Sep 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) we introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response; (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs; and (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
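The coherence measure in (1) can be illustrated with a short sketch: cosine similarity between averaged GloVe vectors of the context and the response. This is our own reading of the abstract under simplified assumptions (whitespace tokenisation, a local GloVe file path given as a hypothetical example), not the authors' released code.

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a plain-text file into a dict: token -> np.array."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_embedding(text, vectors):
    """Average the GloVe vectors of all in-vocabulary tokens (zero vector if none)."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0)

def coherence(context, response, vectors):
    """Cosine similarity between context and response embeddings."""
    c = sentence_embedding(context, vectors)
    r = sentence_embedding(response, vectors)
    denom = np.linalg.norm(c) * np.linalg.norm(r)
    return float(np.dot(c, r) / denom) if denom > 0 else 0.0

# Example use: filter context-response pairs by coherence before training.
# glove = load_glove("glove.6B.300d.txt")  # hypothetical local path
# keep = [(c, r) for c, r in pairs if coherence(c, r, glove) > 0.6]
```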
no code implementations • WS 2018 • Amanda Cercas Curry, Verena Rieser
In this article, we establish how current state-of-the-art conversational systems react to inappropriate requests, such as bullying and sexual harassment on the part of the user, by collecting and analysing the novel #MeTooAlexa corpus.
1 code implementation • 31 Mar 2018 • Simon Keizer, Verena Rieser
Recent statistical approaches have improved the robustness and scalability of spoken dialogue systems.
1 code implementation • NAACL 2018 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings.
no code implementations • 20 Dec 2017 • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon
Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence.
no code implementations • 13 Sep 2017 • Amanda Cercas Curry, Helen Hastie, Verena Rieser
In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success.
1 code implementation • 5 Aug 2017 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.
1 code implementation • EMNLP 2017 • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser
The majority of NLG evaluation relies on automatic metrics, such as BLEU.
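For readers unfamiliar with word-overlap metrics of the kind discussed here, a minimal BLEU computation using NLTK is sketched below; the sentences are invented and this is not the paper's evaluation setup.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "restaurant", "serves", "cheap", "italian", "food"]]
hypothesis = ["the", "venue", "offers", "cheap", "italian", "food"]

smoothie = SmoothingFunction().method1
score = sentence_bleu(reference, hypothesis, smoothing_function=smoothie)
print(f"BLEU: {score:.3f}")  # word overlap only; semantic adequacy is not captured
```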
no code implementations • 28 Jun 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora.
2 code implementations • WS 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.
no code implementations • 1 Aug 2016 • Jekaterina Novikova, Oliver Lemon, Verena Rieser
Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances.
no code implementations • 15 Jun 2016 • Verena Rieser, Oliver Lemon
We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser).
no code implementations • ACL 2016 • Dimitra Gkatzia, Oliver Lemon, Verena Rieser
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities.
no code implementations • LREC 2016 • Phil Bartie, William Mackaness, Dimitra Gkatzia, Verena Rieser
Our interest is in people's capacity to efficiently and effectively describe geographic objects in urban scenes.
no code implementations • WS 2014 • Helen Hastie, Marie-Aude Aufaure, Panos Alexopoulos, Hugues Bouchard, Catherine Breslin, Heriberto Cuayáhuitl, Nina Dethlefs, Milica Gašić, James Henderson, Oliver Lemon, Xingkun Liu, Peter Mika, Nesrine Ben Mustapha, Tim Potter, Verena Rieser, Blaise Thomson, Pirros Tsiakoulis, Yves Vanrompay, Boris Villazon-Terrazas, Majid Yazdani, Steve Young, Yanchao Yu
no code implementations • LREC 2014 • Eshrag Refaee, Verena Rieser
We present a newly collected data set of 8,868 gold-standard annotated Arabic feeds.
no code implementations • WS 2013 • Helen Hastie, Marie-Aude Aufaure, Panos Alexopoulos, Heriberto Cuayáhuitl, Nina Dethlefs, Milica Gasic, James Henderson, Oliver Lemon, Xingkun Liu, Peter Mika, Nesrine Ben Mustapha, Verena Rieser, Blaise Thomson, Pirros Tsiakoulis, Yves Vanrompay